sbstp / attohttpc

Rust lightweight HTTP 1.1 client
https://docs.rs/attohttpc/
Mozilla Public License 2.0

"InvalidResponse: invalid status code" error on some websites #95

Status: Open (opened by Shnatsel 3 years ago)

Shnatsel commented 3 years ago

On some websites, e.g. http://tfd.org.tw, attohttpc fails with the following error:

InvalidResponse: invalid status code

Firefox and curl work fine.

15 of the top one million websites in the February 3 Tranco list are affected.

Tested using this code. Test tool output from all affected websites: atto-invalid-status-code.tar.gz

adamreichold commented 3 years ago

At least http://tfd.org.tw is definitely sending us something invalid, and what it sends depends on the user agent. With our default user agent, we get:

GET / HTTP/1.1
connection: close
accept-encoding: gzip, deflate
accept: */*
user-agent: attohttpc/0.16.1
host: tfd.org.tw

<?xml version="1.0" encoding="ISO-8859-1"?>
Date: Sat, 20 Feb 2021 10:38:15 GMT
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>Bad request!</title>
<link rev="made" href="mailto:support@f5.com" />
<style type="text/css"><!--/*--><![CDATA[/*><!--*/ 
    body { color: #000000; background-color: #FFFFFF; }
    a:link { color: #0000CC; }
    p, address {margin-left: 3em;}
    span {font-size: smaller;}
/*]]>*/--></style>
<style type="text/css"><!--/*--><![CDATA[/*><!--*/ 
* { width: 400px; font-size: 100%; font-style: normal; }
html { text-align: center; }
body { background: #ffffff; text-align: left; font-family: sans-serif; font-size: 70%; color: #333333; }

a,span { width: auto; } 
h1,h2,h3 { margin: 20px 0px 20px 0px; font-weight: bold; }

h1 { padding: 5px; border: 1px solid #999999; background: #eeeeee; color: #000000; font-size: 125%;  }
hr { height: 1px; border: none; border-top: 1px solid #999999; }
img { border: 0px; }
p { width: 350px; margin: 15px 25px 15px 25px; line-height: 135%; }
/*]]>*/--></style>

</head>

<body>
<h1>Bad request!</h1>
<p>

    Your browser (or proxy) sent a request that
    this server could not understand.

</p>

<h2>Error 400</h2>
<address>
  <a href="/">localhost.localdomain</a><br />

  <span>Sat Feb 20 18:38:15 2021<br />
  </span>
</address>
</body>
</html>

and this is attohttpc sending curl/7.54.0 as the user agent instead:

GET / HTTP/1.1
connection: close
accept-encoding: gzip, deflate
accept: */*
user-agent: curl/7.54.0
host: tfd.org.tw

HTTP/1.1 403 Forbidden
Connection: close
Content-Type: text/html
Cache-Control: no-cache
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
Content-Security-Policy: frame-ancestors
Content-Length: 1443

<!-- IE friendly error message walkround.        
     if error message from server is less than   
     512 bytes IE v5+ will use its own error     
     message instead of the one returned by      
     server.                                 --> 

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><style type="text/css">html,body{height:100%;padding:0;margin:0;}.oc{display:table;width:100%;height:100%;}.ic{display:table-cell;vertical-align:middle;height:100%;}div.msg{display:block;border:1px solid #30c;padding:0;width:500px;font-family:helvetica,sans-serif;margin:10px auto;}h1{font-weight:bold;color:#fff;font-size:14px;margin:0;padding:2px;text-align:center;background: #30c;}p{font-size:12px;margin:15px auto;width:75%;font-family:helvetica,sans-serif;text-align:left;}</style><title>Web Application Firewall</title></head><body><div class="oc"><div class="ic"><div class="msg"><h1>Web Application Firewall</h1><p><p>The transfer has triggered a Web Application Firewall.</p>
<p>
     This transfer is blocked.
</p></p></div></div></div></body></html>
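In the first capture, the reply begins with `<?xml version="1.0" encoding="ISO-8859-1"?>` instead of a status line, while the second begins with `HTTP/1.1 403 Forbidden`. An HTTP/1.x status line has the form `HTTP-version SP status-code SP [ reason-phrase ]`, with a three-digit status code. The minimal sketch below illustrates why the first reply cannot be parsed. It is written for illustration only and is not attohttpc's parser: `parse_status_line` and `StatusLine` are hypothetical names, and their error messages differ from attohttpc's `InvalidResponse: invalid status code`.

```rust
/// A parsed HTTP/1.x status line (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
struct StatusLine<'a> {
    version: &'a str,
    code: u16,
    reason: &'a str,
}

/// Hypothetical parser, not attohttpc's implementation:
/// status-line = HTTP-version SP status-code SP [ reason-phrase ]
fn parse_status_line(line: &str) -> Result<StatusLine<'_>, String> {
    // Drop the trailing CRLF, if present.
    let line = line.trim_end_matches(|c: char| c == '\r' || c == '\n');
    let mut parts = line.splitn(3, ' ');

    // The first token must be an HTTP version such as "HTTP/1.1".
    let version = parts.next().unwrap_or("");
    if !version.starts_with("HTTP/") {
        return Err(format!("invalid HTTP version in {line:?}"));
    }

    // The second token must be exactly three ASCII digits.
    let code = parts.next().unwrap_or("");
    if code.len() != 3 || !code.bytes().all(|b| b.is_ascii_digit()) {
        return Err(format!("invalid status code in {line:?}"));
    }
    let code: u16 = code
        .parse()
        .map_err(|_| "invalid status code".to_string())?;

    // The reason phrase is optional and may itself contain spaces.
    let reason = parts.next().unwrap_or("");
    Ok(StatusLine { version, code, reason })
}
```

Applied to the first line of each capture, this sketch rejects `<?xml version="1.0" encoding="ISO-8859-1"?>` and accepts `HTTP/1.1 403 Forbidden` (code 403, reason `Forbidden`). The 403 is still an error for the caller, but it is a well-formed HTTP response; the first reply has no status line at all.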

Personally, I see nothing we can fix here, other than perhaps documenting that some servers may send invalid responses to unknown (and hence rarely tested) user agents such as our default.
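To check whether a particular site changes its reply based on the user agent, one option is to send the same request twice, varying only the `user-agent` header, and compare the first line of each response. The sketch below uses only the Rust standard library. It assumes plain HTTP on port 80 and reuses the header set from the captures above. `build_get_request` and `first_response_line` are hypothetical helpers, not part of attohttpc, and this is not how attohttpc builds its requests internally.

```rust
use std::io::{BufRead, BufReader, Write};
use std::net::TcpStream;
use std::time::Duration;

/// Hypothetical helper: builds the same GET request as in the captures,
/// with only the user-agent value left configurable.
fn build_get_request(host: &str, user_agent: &str) -> String {
    format!(
        "GET / HTTP/1.1\r\n\
         connection: close\r\n\
         accept-encoding: gzip, deflate\r\n\
         accept: */*\r\n\
         user-agent: {user_agent}\r\n\
         host: {host}\r\n\
         \r\n"
    )
}

/// Hypothetical helper: sends the request over plain TCP (port 80, no TLS)
/// and returns the first line of the reply, which should be a status line.
fn first_response_line(host: &str, user_agent: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect((host, 80))?;
    // Avoid hanging forever on a server that never answers.
    stream.set_read_timeout(Some(Duration::from_secs(10)))?;
    stream.write_all(build_get_request(host, user_agent).as_bytes())?;

    let mut line = String::new();
    BufReader::new(stream).read_line(&mut line)?;
    Ok(line.trim_end().to_string())
}
```

For example, comparing `first_response_line("tfd.org.tw", "attohttpc/0.16.1")` with `first_response_line("tfd.org.tw", "curl/7.54.0")` should have reproduced the difference between the two captures at the time of this report. The site's behavior may have changed since then.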