python / pythondotorg

Source code for python.org
https://www.python.org
Apache License 2.0

Python Website "downloads page" returns binary data #2411

Open: maltfield opened this issue 3 months ago

maltfield commented 3 months ago

Describe the bug

When fetching the downloads page with curl or wget, the web server returns binary data instead of HTML.

To Reproduce

Run either of the following commands on Debian Linux:

curl --location 'https://www.python.org/downloads/'
wget 'https://www.python.org/downloads/'

Example execution:

user@disp897:/tmp/tmp.aQ3uHh4PqB$ curl --location 'https://www.python.org/downloads'
Warning: Binary output can mess up your terminal. Use "--output -" to tell 
Warning: curl to output it to your terminal anyway, or consider "--output 
Warning: <FILE>" to save to a file.
user@disp897:/tmp/tmp.aQ3uHh4PqB$

user@disp897:/tmp/tmp.aQ3uHh4PqB$ wget 'https://www.python.org/downloads/'
--2024-03-15 19:17:59--  https://www.python.org/downloads/
Resolving www.python.org (www.python.org)... 199.232.16.223, 2a04:4e42:41::223
Connecting to www.python.org (www.python.org)|199.232.16.223|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19113 (19K) [text/html]
Saving to: ‘index.html’

index.html          100%[===================>]  18.67K  --.-KB/s    in 0.05s   

2024-03-15 19:18:00 (384 KB/s) - ‘index.html’ saved [19113/19113]

user@disp897:/tmp/tmp.aQ3uHh4PqB$ 

user@disp897:/tmp/tmp.aQ3uHh4PqB$ head -c256 index.html 
[binary output: the first 256 bytes of index.html are unreadable, non-HTML data]
user@disp897:/tmp/tmp.aQ3uHh4PqB$ 

Expected behavior

The python.org web server(s) should return HTML.
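The bytes printed by `head -c256` are not HTML, but they may simply be gzip-compressed HTML rather than a corrupt download: the wget run reports a 19,113-byte body, and the `wget --compression=gzip` workaround later in this thread saves 174,854 bytes for the same page. A minimal sketch for checking a saved file follows. It assumes gzip encoding and a UTF-8 page; `is_gzip` and `read_html` are hypothetical helper names, not part of any existing tool.

```python
import gzip
from pathlib import Path

# Every gzip stream begins with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def is_gzip(data: bytes) -> bool:
    """Return True if data starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


def read_html(path: str) -> str:
    """Read a saved page, decompressing it first if it is gzip data.

    Assumption: the page text is UTF-8 once decompressed.
    """
    raw = Path(path).read_bytes()
    if is_gzip(raw):
        raw = gzip.decompress(raw)
    return raw.decode("utf-8")
```

Called on the `index.html` produced by the plain `wget` command above, `read_html("index.html")` would be expected to return text beginning with `<!doctype html>` if the body is indeed gzip data.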

maltfield commented 3 months ago

As a workaround, adding the --compressed argument to curl fetches the HTML as desired:

user@disp897:/tmp/tmp.aQ3uHh4PqB$ curl --location --compressed 'https://www.python.org/downloads/' | head
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0<!doctype html>
<!--[if lt IE 7]>   <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9">   <![endif]-->
<!--[if IE 7]>      <html class="no-js ie7 lt-ie8 lt-ie9">          <![endif]-->
<!--[if IE 8]>      <html class="no-js ie8 lt-ie9">                 <![endif]-->
<!--[if gt IE 8]><!--><html class="no-js" lang="en" dir="ltr">  <!--<![endif]-->

<head>
    <!-- Google tag (gtag.js) -->
    <script async src="https://www.googletagmanager.com/gtag/js?id=G-TF35YF9CVH"></script>
    <script>
 41 19113   41  8007    0     0   3748      0  0:00:05  0:00:02  0:00:03  3748
curl: (23) Failure writing output to destination
user@disp897:/tmp/tmp.aQ3uHh4PqB$ 
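Python scripts that fetch this page are affected in the same way, because `urllib.request` does not decompress response bodies on its own. The sketch below does roughly what `curl --compressed` does: it advertises gzip support in the `Accept-Encoding` request header and then undoes whatever `Content-Encoding` the response reports. It assumes the server labels a compressed body with `Content-Encoding: gzip`; `decode_body` and `fetch_html` are hypothetical names, and encodings other than gzip (such as `br`) are rejected rather than guessed at.

```python
import gzip
import urllib.request
from typing import Optional

URL = "https://www.python.org/downloads/"


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Undo the Content-Encoding applied by the server (gzip or identity only)."""
    encoding = (content_encoding or "identity").strip().lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "identity":
        return body
    raise ValueError(f"unsupported Content-Encoding: {encoding}")


def fetch_html(url: str = URL) -> str:
    """Ask for gzip explicitly, then decode the body as the headers describe."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request, timeout=30) as response:
        body = response.read()
        # Assumption: the server sets Content-Encoding: gzip on compressed bodies.
        encoding = response.headers.get("Content-Encoding")
        charset = response.headers.get_content_charset() or "utf-8"
    return decode_body(body, encoding).decode(charset)
```

`fetch_html()` needs network access; if the server behaves as in the curl transcript above, the returned string should start with `<!doctype html>`. Keeping `decode_body` separate from the network call makes the decoding step easy to test offline.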

Setting --compression=gzip in wget also works around the problem:

user@disp897:/tmp/tmp.aQ3uHh4PqB$ wget --compression=gzip 'https://www.python.org/downloads/'
--2024-03-15 19:22:27--  https://www.python.org/downloads/
Resolving www.python.org (www.python.org)... 199.232.16.223, 2a04:4e42:41::223
Connecting to www.python.org (www.python.org)|199.232.16.223|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19113 (19K) [text/html]
Saving to: ‘index.html’

index.html          100%[===================>]  18.67K  80.3KB/s    in 0.2s    

2024-03-15 19:22:29 (80.3 KB/s) - ‘index.html’ saved [174854]

user@disp897:/tmp/tmp.aQ3uHh4PqB$ 

user@disp897:/tmp/tmp.aQ3uHh4PqB$ head index.html 
<!doctype html>
<!--[if lt IE 7]>   <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9">   <![endif]-->
<!--[if IE 7]>      <html class="no-js ie7 lt-ie8 lt-ie9">          <![endif]-->
<!--[if IE 8]>      <html class="no-js ie8 lt-ie9">                 <![endif]-->
<!--[if gt IE 8]><!--><html class="no-js" lang="en" dir="ltr">  <!--<![endif]-->

<head>
    <!-- Google tag (gtag.js) -->
    <script async src="https://www.googletagmanager.com/gtag/js?id=G-TF35YF9CVH"></script>
    <script>
user@disp897:/tmp/tmp.aQ3uHh4PqB$