tornadoweb / tornado

Tornado is a Python web framework and asynchronous networking library, originally developed at FriendFeed.
http://www.tornadoweb.org/
Apache License 2.0

Timeout with HEAD method of AsyncHTTPClient on valid URL #2250

Open fabiopedrosa opened 6 years ago

fabiopedrosa commented 6 years ago

I can't get AsyncHTTPClient to do a simple HEAD request.

Not working:

#!/usr/bin/env python
import logging
from tornado import ioloop
from tornado import gen
from tornado.httpclient import AsyncHTTPClient
logging.basicConfig(level=logging.DEBUG)

@gen.coroutine
def test():
    url = "https://r4---sn-8vq54vox2u-apne.googlevideo.com/videoplayback?aitags=133%2C134%2C135%2C136%2C160&sparams=aitags%2Cclen%2Cdur%2Cei%2Cgir%2Cid%2Cinitcwndbps%2Cip%2Cipbits%2Citag%2Ckeepalive%2Clmt%2Cmime%2Cmm%2Cmn%2Cms%2Cmv%2Cpl%2Crequiressl%2Csource%2Cexpire&mime=video%2Fmp4&id=o-AK1r9zLt8iQEN-zyIbOXynTtGcj201h1Qivmb4INkx0Q&itag=135&dur=125.600&lmt=1380999388644463&ip=94.62.195.82&key=yt6&expire=1516640181&clen=2196699&signature=53D81AE57372BADC88F66B6B33EB220E42847E10.B3EAC4C02578FFBAA553A47652BE63312A8DFF7D&ms=au&ei=VcNlWq62DtilWOP7m8AB&mv=m&mt=1516618492&ipbits=0&mn=sn-8vq54vox2u-apne&mm=31&requiressl=yes&keepalive=yes&pl=16&source=youtube&gir=yes&initcwndbps=940000&ratebypass=yes"
    client = AsyncHTTPClient()
    try:
        response = yield client.fetch(url, method="HEAD", validate_cert=False)
        print(response.code)
        print(response.headers)
    except Exception:
        logging.exception("error")

    ioloop.IOLoop.current().stop()

if __name__ == '__main__':
    test()
    ioloop.IOLoop.current().start()

Working just fine:

import requests
url = "https://r4---sn-8vq54vox2u-apne.googlevideo.com/videoplayback?aitags=133%2C134%2C135%2C136%2C160&sparams=aitags%2Cclen%2Cdur%2Cei%2Cgir%2Cid%2Cinitcwndbps%2Cip%2Cipbits%2Citag%2Ckeepalive%2Clmt%2Cmime%2Cmm%2Cmn%2Cms%2Cmv%2Cpl%2Crequiressl%2Csource%2Cexpire&mime=video%2Fmp4&id=o-AK1r9zLt8iQEN-zyIbOXynTtGcj201h1Qivmb4INkx0Q&itag=135&dur=125.600&lmt=1380999388644463&ip=94.62.195.82&key=yt6&expire=1516640181&clen=2196699&signature=53D81AE57372BADC88F66B6B33EB220E42847E10.B3EAC4C02578FFBAA553A47652BE63312A8DFF7D&ms=au&ei=VcNlWq62DtilWOP7m8AB&mv=m&mt=1516618492&ipbits=0&mn=sn-8vq54vox2u-apne&mm=31&requiressl=yes&keepalive=yes&pl=16&source=youtube&gir=yes&initcwndbps=940000&ratebypass=yes"
r = requests.request('HEAD', url)
print(r.headers)

bdarnell commented 6 years ago

When I try that URL, I don't get a hang, but I get a 403 Forbidden. Maybe the link has expired, or it's locked to your IP (there's an IP parameter in there). HEAD requests to other URLs are working for me, so you'll need to help with debugging to figure out what's going on.

The output of curl -v -X HEAD $URL is probably the most useful thing here.

Why do you want to use HEAD? I've found it's generally poorly supported these days. Can If-Modified-Since, If-None-Match, or Range do what you want?
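
For anyone wondering what the Range alternative could look like: below is a minimal sketch, not something from this thread, that asks for only the first byte and reads the total size from the Content-Range response header. The names probe_size and total_size_from_content_range are made up for illustration, and it assumes the server honors Range requests.

```python
import re

# Matches e.g. "bytes 0-0/2196699"; the total may be "*" when unknown.
_CONTENT_RANGE = re.compile(r"^bytes\s+(\d+)-(\d+)/(\d+|\*)$")


def total_size_from_content_range(value):
    """Return the full resource size from a Content-Range value, or None."""
    match = _CONTENT_RANGE.match(value.strip()) if value else None
    if match is None or match.group(3) == "*":
        return None
    return int(match.group(3))


async def probe_size(url):
    """Ask for the first byte only and read the total size from the reply."""
    # Imported here so the parsing helper above works without Tornado installed.
    from tornado.httpclient import AsyncHTTPClient

    response = await AsyncHTTPClient().fetch(
        url, method="GET", headers={"Range": "bytes=0-0"}
    )
    if response.code == 206:
        return total_size_from_content_range(response.headers.get("Content-Range"))
    # The server ignored the Range header and answered with the whole body.
    length = response.headers.get("Content-Length")
    return int(length) if length is not None else None
```

Note the fallback branch: if the server ignores Range, the whole body has already been downloaded by the time Content-Length is read, so this only saves bandwidth when the server replies with 206.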

fabiopedrosa commented 6 years ago

@bdarnell I just updated the URL in the samples above to a valid one, can you check again please?

I also checked that the request works fine from other IP ranges.

As these are video streams, using HEAD to check their content size is quite important. The HEAD method works fine using curl:

curl -v --head "https://r4---sn-8vq54vox2u-apne.googlevideo.com/videoplayback?aitags=133%2C134%2C135%2C136%2C160&sparams=aitags%2Cclen%2Cdur%2Cei%2Cgir%2Cid%2Cinitcwndbps%2Cip%2Cipbits%2Citag%2Ckeepalive%2Clmt%2Cmime%2Cmm%2Cmn%2Cms%2Cmv%2Cpl%2Crequiressl%2Csource%2Cexpire&mime=video%2Fmp4&id=o-AK1r9zLt8iQEN-zyIbOXynTtGcj201h1Qivmb4INkx0Q&itag=135&dur=125.600&lmt=1380999388644463&ip=94.62.195.82&key=yt6&expire=1516640181&clen=2196699&signature=53D81AE57372BADC88F66B6B33EB220E42847E10.B3EAC4C02578FFBAA553A47652BE63312A8DFF7D&ms=au&ei=VcNlWq62DtilWOP7m8AB&mv=m&mt=1516618492&ipbits=0&mn=sn-8vq54vox2u-apne&mm=31&requiressl=yes&keepalive=yes&pl=16&source=youtube&gir=yes&initcwndbps=940000&ratebypass=yes"
* timeout on name lookup is not supported
*   Trying 213.30.5.15...
* TCP_NODELAY set
* Connected to r4---sn-8vq54vox2u-apne.googlevideo.com (213.30.5.15) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
*   CAfile: C:/Program Files/Git/mingw64/ssl/certs/ca-bundle.crt
  CApath: none
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Client hello (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: C=US; ST=California; L=Mountain View; O=Google Inc; CN=*.googlevideo.com
*  start date: Jan  9 08:46:00 2018 GMT
*  expire date: Apr  3 08:46:00 2018 GMT
*  subjectAltName: host "r4---sn-8vq54vox2u-apne.googlevideo.com" matched cert's "*.googlevideo.com"
*  issuer: C=US; O=Google Inc; CN=Google Internet Authority G2
*  SSL certificate verify ok.
> HEAD /videoplayback?aitags=133%2C134%2C135%2C136%2C160&sparams=aitags%2Cclen%2Cdur%2Cei%2Cgir%2Cid%2Cinitcwndbps%2Cip%2Cipbits%2Citag%2Ckeepalive%2Clmt%2Cmime%2Cmm%2Cmn%2Cms%2Cmv%2Cpl%2Crequiressl%2Csource%2Cexpire&mime=video%2Fmp4&id=o-AK1r9zLt8iQEN-zyIbOXynTtGcj201h1Qivmb4INkx0Q&itag=135&dur=125.600&lmt=1380999388644463&ip=94.62.195.82&key=yt6&expire=1516640181&clen=2196699&signature=53D81AE57372BADC88F66B6B33EB220E42847E10.B3EAC4C02578FFBAA553A47652BE63312A8DFF7D&ms=au&ei=VcNlWq62DtilWOP7m8AB&mv=m&mt=1516618492&ipbits=0&mn=sn-8vq54vox2u-apne&mm=31&requiressl=yes&keepalive=yes&pl=16&source=youtube&gir=yes&initcwndbps=940000&ratebypass=yes HTTP/1.1
> Host: r4---sn-8vq54vox2u-apne.googlevideo.com
> User-Agent: curl/7.54.1
> Accept: */*
>
< HTTP/1.1 302 Found
HTTP/1.1 302 Found
< Last-Modified: Wed, 02 May 2007 10:26:10 GMT
Last-Modified: Wed, 02 May 2007 10:26:10 GMT
< Date: Mon, 22 Jan 2018 11:02:49 GMT
Date: Mon, 22 Jan 2018 11:02:49 GMT
< Expires: Mon, 22 Jan 2018 11:02:49 GMT
Expires: Mon, 22 Jan 2018 11:02:49 GMT
< Cache-Control: private, max-age=900
Cache-Control: private, max-age=900
< Location: https://r3---sn-h5q7rn7s.googlevideo.com/videoplayback?aitags=133%2C134%2C135%2C136%2C160&sparams=aitags,clen,dur,ei,expire,gir,id,initcwndbps,ip,ipbits,itag,keepalive,lmt,mime,mm,mn,ms,mv,pl,requiressl,source&mime=video%2Fmp4&id=o-AK1r9zLt8iQEN-zyIbOXynTtGcj201h1Qivmb4INkx0Q&itag=135&dur=125.600&lmt=1380999388644463&ip=94.62.195.82&key=cms1&expire=1516640181&clen=2196699&signature=0DF10B083D76A60A6D6CB5E4A9717D663B2330C3.38D093E72A10867F66BEBDD9C912195AEA862C7E&ei=VcNlWq62DtilWOP7m8AB&ipbits=0&requiressl=yes&keepalive=yes&pl=16&source=youtube&gir=yes&ratebypass=yes&redirect_counter=1&cm2rm=sn-8vq54vox2u-apne7z&req_id=c3e374616400a3ee&cms_redirect=yes&mm=29&mn=sn-h5q7rn7s&ms=rdu&mt=1516618928&mv=m
Location: https://r3---sn-h5q7rn7s.googlevideo.com/videoplayback?aitags=133%2C134%2C135%2C136%2C160&sparams=aitags,clen,dur,ei,expire,gir,id,initcwndbps,ip,ipbits,itag,keepalive,lmt,mime,mm,mn,ms,mv,pl,requiressl,source&mime=video%2Fmp4&id=o-AK1r9zLt8iQEN-zyIbOXynTtGcj201h1Qivmb4INkx0Q&itag=135&dur=125.600&lmt=1380999388644463&ip=94.62.195.82&key=cms1&expire=1516640181&clen=2196699&signature=0DF10B083D76A60A6D6CB5E4A9717D663B2330C3.38D093E72A10867F66BEBDD9C912195AEA862C7E&ei=VcNlWq62DtilWOP7m8AB&ipbits=0&requiressl=yes&keepalive=yes&pl=16&source=youtube&gir=yes&ratebypass=yes&redirect_counter=1&cm2rm=sn-8vq54vox2u-apne7z&req_id=c3e374616400a3ee&cms_redirect=yes&mm=29&mn=sn-h5q7rn7s&ms=rdu&mt=1516618928&mv=m
< Content-Length: 0
Content-Length: 0
< Connection: close
Connection: close
< X-Content-Type-Options: nosniff
X-Content-Type-Options: nosniff
< Content-Type: text/html
Content-Type: text/html
< Server: gvs 1.0
Server: gvs 1.0

<
* Closing connection 0
* TLSv1.2 (OUT), TLS alert, Client hello (1):

Also, changing the client to CurlAsyncHTTPClient fixes this issue for me.
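
For reference, switching clients is a single configure call. This is a minimal sketch: the helper name use_curl_client is made up, and it assumes pycurl is installed alongside Tornado.

```python
# Dotted path of the libcurl-based client class that ships with Tornado.
CURL_CLIENT = "tornado.curl_httpclient.CurlAsyncHTTPClient"


def use_curl_client():
    """Make every later AsyncHTTPClient() use the libcurl implementation."""
    # Imported lazily: this needs both Tornado and pycurl to be installed.
    from tornado.httpclient import AsyncHTTPClient

    AsyncHTTPClient.configure(CURL_CLIENT)
```

Call it once at startup, before the first AsyncHTTPClient() is created; the rest of the script above stays unchanged.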

bdarnell commented 6 years ago

Note that that trace doesn't actually give you the size because of the redirect. But that points me in the right direction: there's a bug in redirect following with HEAD. We switch from HEAD to GET while following the redirect:

https://github.com/tornadoweb/tornado/blob/871358d4078889b374758ecc1a8174d4764651d2/tornado/simple_httpclient.py#L488
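
Until that is fixed, one possible workaround is to turn off automatic redirect following and walk the Location headers by hand, so every hop stays a HEAD. The following is a sketch under assumptions rather than a tested fix for this report: head_following_redirects and its fetch parameter are hypothetical names, and fetch is expected to behave like AsyncHTTPClient().fetch.

```python
from urllib.parse import urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)


async def head_following_redirects(fetch, url, max_redirects=5):
    """Issue HEAD requests, following redirects by hand so the method never changes.

    `fetch` is any coroutine function with the calling convention of
    AsyncHTTPClient.fetch; its responses need `.code` and `.headers`.
    """
    for _ in range(max_redirects + 1):
        # raise_error=False so a 3xx reply is returned instead of raised.
        response = await fetch(
            url, method="HEAD", follow_redirects=False, raise_error=False
        )
        location = response.headers.get("Location")
        if response.code not in REDIRECT_CODES or not location:
            return response
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    raise RuntimeError("too many redirects")
```

Usage inside a coroutine would be along the lines of response = await head_following_redirects(AsyncHTTPClient().fetch, url), after which response.headers.get("Content-Length") is the size reported by the final server.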

bdarnell commented 5 years ago

Part of the issue was just fixed in #2440: HEAD requests now stay HEAD when following redirects instead of changing to GET. I'm not sure whether that fixes the hang you were seeing, though.
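
To illustrate the behavior described here, the rule for choosing the method of the follow-up request can be sketched as a small function. This is not Tornado's actual code; method_after_redirect is a hypothetical name, and the rules follow the usual HTTP client conventions from RFC 7231.

```python
def method_after_redirect(method, status):
    """Pick the HTTP method for the request that follows a redirect."""
    method = method.upper()
    # HEAD must stay HEAD: switching to GET would download the body.
    if method == "HEAD":
        return "HEAD"
    # 303 See Other asks the client to retrieve the new location with GET.
    if status == 303:
        return "GET"
    # By long-standing convention, clients turn POST into GET on 301/302.
    if status in (301, 302) and method == "POST":
        return "GET"
    # 307/308 (and every other case) preserve the original method.
    return method
```

With a rule like this, the 302 in the curl trace above leads to a second HEAD against the Location URL rather than a GET of the full video stream.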