tornadoweb / tornado

Tornado is a Python web framework and asynchronous networking library, originally developed at FriendFeed.
http://www.tornadoweb.org/
Apache License 2.0
21.76k stars 5.51k forks source link

CurlAsyncHTTPClient doesn't seem to handle the requests in parallel #3131

Open ghevge opened 2 years ago

ghevge commented 2 years ago

Hi, I'm on tornado 6.1, python 3.9 running on an alpine container 4 CPU with 4 GB of Ram, and I'm seeing a weird behavior when calling a remote service using response = await AsyncHTTPClient().fetch(http_request)

What I'm seeing is that when I flooding my service with requests (~ 250 requests in about 2-3 seconds) that have to call this remote service, the requests seem to be queued, waiting a lot of time for getting the connection or waiting for the response from the remote server, although the remote server is not loaded at all, and should be able to handle all these requests at once.

Is there anything limiting me in processing all these requests in parallel, at the libcurl / pycurl libs levels? Is there any way to force CurlAsyncHTTPClient to execute these request in parallel ?

AsyncHTTPClient is initialized like this:

max_clients = 1000  # increasing this value doesn't seem to have any impact in my system performance
http_client_class = 'tornado.curl_httpclient.CurlAsyncHTTPClient'
AsyncHTTPClient.configure(http_client_class, max_clients=max_clients)

These are the details from some of the longest responses from the remote server:

http://dialog-tool-app-server-envoy:80/dialog-tool/v1/projects/512020/variable-expressions/5e1718fa-8d51-490f-910f-58345517b02c 44155.79ms {'queue': 1.430511474609375e-05, 'namelookup': 0.006512, 'connect': 10.055009, 'appconnect': 0.0, 'pretransfer': 10.055321, 'starttransfer': 42.601884, 'total': 42.601966, 'redirect': 0.0}

http://dialog-tool-app-server-envoy:80/dialog-tool/v1/projects/512020/variable-expressions/9b68dc03-ee31-41c8-b45e-be11b9c41e0a 25419.50ms {'queue': 1.239776611328125e-05, 'namelookup': 0.000131, 'connect': 0.000131, 'appconnect': 0.0, 'pretransfer': 0.000467, 'starttransfer': 10.867779, 'total': 10.867838, 'redirect': 0.0}

The very first processed requests take about 1.5 seconds to complete:

http://dialog-tool-app-server-envoy:80/dialog-tool/v1/projects/512020/variable-expressions/3ee1ecbd-6d73-4ca5-828a-8a8b8e1ce5e9 1543.81ms {'queue': 1.1444091796875e-05, 'namelookup': 5.6e-05, 'connect': 5.6e-05, 'appconnect': 0.0, 'pretransfer': 0.000432, 'starttransfer': 1.542342, 'total': 1.542444, 'redirect': 0.0}",,,,

ploxiln commented 2 years ago

waiting a lot of time for getting the connection or waiting for the response from the remote server

This means that the requests are starting in parallel on the client side, but the server is only handling a single request at a time.

ghevge commented 2 years ago

Thanks for your answer, although I'm not convinced that is the problem. I've checked the backed server ( which runs Java code that exposed a grpc API and has an envoy proxy in front to transcode the http requests coming from tornado) and there once the request hits Envoy the response goes back in less than 100ms, according to the logs.

BTW, just for the records, I've also tried to use the default AsyncHTTPClient (without any class configuration) and in that situation when I flood the system with those requests I get this error across the line. Not even one response back.

DWS GEThttp://dialog-tool-app-server-envoy:80/dialog-tool/v1/projects/512020/variable-expressions/84696ada-4ed9-4c93-84f6-f84a3a24ea23-> HTTPError with no response Timeout while connecting",,,,

bdarnell commented 2 years ago

The presence of a delay just over 10 seconds, appearing at multiple different points in the timing info, suggests to me that something in the client process is doing a blocking operation with a 10 second timeout. This could be something in your client-side code or maybe in curl itself. DNS is a likely culprit - is your version of curl built with asynchronous DNS? (I don't remember how to test for this, I just remember a while back that it wasn't always the default) I'd also look at your ipv4/ipv6 setup. It's possible that ipv6 is broken in such a way that it's failing slowly, and you should either fix it or disable it completely.

Enable asyncio's debug mode (https://docs.python.org/3/library/asyncio-dev.html#debug-mode) to help identify where blocking is occurring.

ghevge commented 2 years ago

I'm using this curl version. asyncDNS seems to be available:

curl 7.80.0 (x86_64-alpine-linux-musl) libcurl/7.80.0 OpenSSL/1.1.1n zlib/1.2.12 brotli/1.0.9 nghttp2/1.46.0
Release-Date: 2021-11-10
Protocols: dict file ftp ftps gopher gophers http https imap imaps mqtt pop3 pop3s rtsp smb smbs smtp smtps telnet tftp 
Features: alt-svc AsynchDNS brotli HSTS HTTP2 HTTPS-proxy IPv6 Largefile libz NTLM NTLM_WB SSL TLS-SRP UnixSockets

I've run some tests using the container IPs instead of the host names (though excluding DNS from the equation) but I was seeing the same behavior.

I've tried then running to load the upstream service by running curl commands from the container where the tornado app is running, but when doing that, I was not seeing the same behavior I was seeing when sending the requests from Tornado app. What I was seeing in this situation was that curl was able to process 100 requests send at once, in max 1+ second.

So it seems there is something off either in my code, tornado async http client or pylib. Curl and the upstream service are ok.

In my side I see the requests leaving my part of code, and getting into await AsyncHTTPClient().fetch(http_request), in less than a second, from the moment they hit the REST endpoint (judging on the logs that I'm generating). Then the wait begins.

The response is not that big either: json content about 100 chars long. Which content just gets returned to the client. No other operations done on it.