tornadoweb / tornado

Tornado is a Python web framework and asynchronous networking library, originally developed at FriendFeed.
http://www.tornadoweb.org/
Apache License 2.0

Tornado not closing keep-alive sockets #1200

Closed williame closed 10 years ago

williame commented 10 years ago

My webserver keeled over recently due to "too many open files". I poked around, and my limit was 1024. I restarted the server and upped the ulimit, but I still wondered how the webserver had used up 1K of descriptors; it's not a busy site.

I am running it as non-root and using an iptables rule to give it traffic from port 80. Standard stuff.

The server has been running a few days now, under very light load; a few browser visits per hour.

I have just run lsof on it and it has loads of TCP connections open! They are in sets of 6, which is how many concurrent sockets a browser normally opens against a site.

I will try and anonymize a bit of lsof output:

python  12448  wil   80u  IPv4 3570570303      0t0        TCP box:3456->ip-123.123.221.221:55550 (ESTABLISHED)
python  12448  wil   81u  IPv4 3570570701      0t0        TCP box:3456->ip-123.123.221.221:55552 (ESTABLISHED)
python  12448  wil   82u  IPv4 3570570713      0t0        TCP box:3456->ip-123.123.221.221:55554 (ESTABLISHED)
python  12448  wil   83u  IPv4 3570570739      0t0        TCP box:3456->ip-123.123.221.221:55556 (ESTABLISHED)
python  12448  wil   84u  IPv4 3570570740      0t0        TCP box:3456->ip-123.123.221.221:55558 (ESTABLISHED)
python  12448  wil   85u  IPv4 3570570741      0t0        TCP box:3456->ip-123.123.221.221:55560 (ESTABLISHED)

I can match up these ip-addresses with my normal webserver logs and see how long ago these were created.

And many of these sockets that are open go back days!
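The "sets of 6" grouping described above can be checked with a short script. This is a minimal sketch, assuming lsof output in the same column layout as the excerpt; the function name `count_established_by_remote` and the `SAMPLE` text are illustrative only.

```python
from collections import Counter

# Two lines in the same layout as the anonymized lsof excerpt above.
SAMPLE = """\
python  12448  wil   80u  IPv4 3570570303      0t0        TCP box:3456->ip-123.123.221.221:55550 (ESTABLISHED)
python  12448  wil   81u  IPv4 3570570701      0t0        TCP box:3456->ip-123.123.221.221:55552 (ESTABLISHED)
"""


def count_established_by_remote(lsof_text):
    """Count ESTABLISHED TCP connections per remote host in lsof output."""
    counts = Counter()
    for line in lsof_text.splitlines():
        fields = line.split()
        # Only lines ending in "(ESTABLISHED)" describe live connections.
        if len(fields) < 2 or fields[-1] != "(ESTABLISHED)":
            continue
        # The second-to-last field looks like "local:port->remote:port".
        _local, arrow, remote = fields[-2].partition("->")
        if not arrow:
            continue
        # Drop the remote port; keep only the host part.
        host, _, _port = remote.rpartition(":")
        counts[host] += 1
    return counts


print(count_established_by_remote(SAMPLE))  # Counter({'ip-123.123.221.221': 2})
```

Hosts with a count that stays at a multiple of 6 long after their last logged request would be the stale keep-alive connections in question.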

What gives? Why aren't they getting closed? How do you get tornado to close stale sockets?

bdarnell commented 10 years ago

Which version of Tornado are you using? We didn't start closing idle sockets until version 4.0 (with a 1-hour default timeout, settable with the idle_connection_timeout keyword argument to the HTTPServer constructor). Prior to that, it was recommended to run Tornado behind a proxy such as nginx or haproxy to provide better management of end-user connections.
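For reference, a minimal sketch of passing that keyword argument, assuming Tornado 4.0 or later is installed. `MainHandler`, `make_server`, and the 300-second value are illustrative and not taken from the original report.

```python
import tornado.httpserver
import tornado.web


class MainHandler(tornado.web.RequestHandler):
    """Trivial handler so the application has something to serve."""

    def get(self):
        self.write("ok")


def make_server(idle_timeout_seconds=300):
    """Build an HTTPServer that closes keep-alive connections idle for too long.

    300 seconds is an arbitrary example value; the default is 3600.
    """
    app = tornado.web.Application([(r"/", MainHandler)])
    return tornado.httpserver.HTTPServer(
        app, idle_connection_timeout=idle_timeout_seconds
    )
```

Calling `make_server().listen(port)` from within a running event loop would then serve with the shorter timeout.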

williame commented 10 years ago

Well nowhere close to 4. Oh well.

FWIW 1 hour seems a very strange default; Firefox is I think the browser with the longest timeout on a keep alive session at 5 minutes. Has that changed, or was the 1 hour just because nobody yet runs tornado with heavy direct traffic?

bdarnell commented 10 years ago

It was a fairly arbitrary choice; I don't know how long browsers keep connections open. If the client closes its connection then Tornado should notice immediately; the hour timeout is the longest Tornado will wait for any client who doesn't close its connection. It's conservative because there was no limit before and so we didn't want to cut off connections prematurely for anyone who may be relying on this.

Also note that the default limits for file descriptors are quite low and there's no harm in increasing them significantly for Tornado servers - I generally set the limit to 50K on the servers I'm responsible for.
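The descriptor limit is an operating-system resource limit (what `ulimit -n` reports) rather than a Tornado setting. Besides raising it in the shell or service configuration, a process can raise its own soft limit at startup. A minimal sketch using the standard-library `resource` module on Unix; the helper name `raise_fd_limit` is hypothetical, and 50,000 only mirrors the figure mentioned above.

```python
import resource


def raise_fd_limit(target=50_000):
    """Raise the soft RLIMIT_NOFILE toward `target`, never above the hard limit.

    Returns the (soft, hard) limits in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited, nothing to do
    # An unprivileged process may raise its soft limit only up to the hard limit.
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if new_soft > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        except (ValueError, OSError):
            pass  # some platforms cap the value; keep the current limit
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

Raising the hard limit itself requires privileges and is normally done outside the process, for example in the service manager or login configuration.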

lxgithub24 commented 5 years ago

I have two questions: 1. How do I set the fd (file descriptor) limit? Is it a Tornado parameter? 2. The idle_connection_timeout default is 3600 seconds. In my Tornado server, I set it to 31536000000 (1000 years).

Regarding the second one, it seems that I don't understand this parameter. As I understand it, if the client doesn't send any request for idle_connection_timeout seconds, the server closes the connection directly. In that case, does the client know that the server has already closed the connection, if the connection is a long-lived (keep-alive) one?

I ask because my company runs this application in production, and something unexpected is happening. My application runs behind a gateway service (Zuul, a Spring Cloud Dalston.SR2 service), and Zuul sends requests to me over a keep-alive connection. But sometimes the Tornado server doesn't get the Zuul request. When I inspected the TCP traffic (using tcpdump and Wireshark), I found that the client had sent the request on an old connection that the server had already, or almost, closed, so the Tornado server responded with tcp.flags.reset==1, and Zuul treated this as a Tornado server timeout. I have been confused by this for as long as a month. If I am doing something wrong, please tell me. Below is some code from my application:

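import asyncio
import os

import tornado.httpserver
import tornado.web

# eureka, prerequest, rest_apis_handler and port are defined elsewhere in the application.
async def main():
    await eureka.register()
    app = tornado.web.Application(handlers=rest_apis_handler, template_path=os.path.join(os.path.dirname(__file__), "cangjie_gateway/utils/templates"))
    # 31536000000 seconds is about 1000 years, so idle connections are effectively never closed.
    server = tornado.httpserver.HTTPServer(app, idle_connection_timeout=31536000000)
    server.listen(port)
    prerequest.smooth_reload(server)
    while True:
        await asyncio.sleep(60)
        await eureka.renew()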

This confuses me. Hoping for your reply! @bdarnell