taoensso / sente

Realtime web comms library for Clojure/Script
https://www.taoensso.com/sente
Eclipse Public License 1.0

H12 Timeouts on Heroku with Sente #56

Closed: sritchie closed this issue 10 years ago

sritchie commented 10 years ago

Hey @ptaoussanis,

I recently upgraded to Sente 0.14.1 (from 0.12.0), and, suspiciously, started seeing a bunch of H12 - Application Timeouts on my Heroku app.

I've also noticed a number of Idle Connection warnings.

Could something have changed between those versions wrt the default keep-alive settings? Also, is there some way to tune these settings? I believe that Heroku has a 30 second timeout on connections, and keep-alives need to be sent within 55 seconds. I'm not sure if these errors were coming from websocket connections or long-polling connections.

Anyway, this is out of my wheelhouse, but it's certainly something that just started after a recent batch of upgrades. Would love your advice! Let me know if I can get you any further info.

sritchie commented 10 years ago

Seems like this websockets library ran into the same issue:

http://stackoverflow.com/questions/21265795/rails-faye-works-for-me-but-still-gives-some-js-error-in-the-console-for-some

sritchie commented 10 years ago

Upgraded from 0.13.0, actually, and upgraded http-kit from "2.1.13" to "2.1.18".

ptaoussanis commented 10 years ago

Hey Sam,

About to head to sleep so will need to look at this properly tomorrow. In the meantime: no timeout changes between v0.13.0 and v0.14.1 that I can remember off-hand, but you could try adjusting the default timeouts (down) and see if that makes a difference? The client-side make-channel-socket! fn can take opts:

:ws-kalive-ms ; WebSocket keep-alive interval (defaults to 38000)
:lp-kalive-ms ; Ajax (long-polling) keep-alive interval (defaults to 38000)

Cheers!

EDIT: Just to clarify: the WebSocket interval will send a small PING iff no other activity has taken place in the window (it's cheap); the Ajax interval will close + re-establish a new long-polling connection (can be a significant cost, but not bad with http-kit).

sritchie commented 10 years ago

okay, nice. Looks like it's actually lp-timeout. I'm going to bring them both down to 40 seconds; that should kill the timeout issue.

Do you know if http-kit has some limit on concurrent connections that's getting saturated by these websocket and long-polling connections? I'm worried that by enabling this feature without properly tuning http-kit I'm hosing my application.

That may be a Heroku thing too. I wonder if Heroku, or NGinx, simply cuts me off when enough concurrent users are on the site.

Anyway, this is a good start! Would love any advice you have when you're up in the AM :)

Thanks @ptaoussanis!

ptaoussanis commented 10 years ago

> I'm going to bring them both down to 40 seconds; that should kill the timeout issue.

Oh, to clarify: the keep-alives are in milliseconds so the defaults are both 38 seconds. I'm guessing you'd need to bring them down rather than up? Something like 25000 may be worth trying.
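
A minimal sketch of lowering both intervals on the client, assuming the option names quoted earlier in this thread (:ws-kalive-ms and :lp-kalive-ms). An earlier reply suggests the long-polling option may actually be named lp-timeout in some releases, so check the make-channel-socket! docstring for the Sente version in use. The namespace, the "/chsk" path, and the two-argument call shape are assumptions based on releases from around the time of this thread; later releases may differ.

```clojure
(ns my-app.client ; hypothetical namespace
  (:require [taoensso.sente :as sente]))

;; Keep both intervals below the 30-second Heroku timeout mentioned
;; at the top of the thread. Option names are as quoted in this thread
;; and may differ between Sente versions.
(def chsk-opts
  {:type         :auto
   :ws-kalive-ms 25000   ; WebSocket keep-alive ping interval, in ms
   :lp-kalive-ms 25000}) ; Ajax long-polling reconnect interval, in ms

;; "/chsk" is an assumed endpoint path, not taken from this thread.
(defonce channel-socket
  (sente/make-channel-socket! "/chsk" chsk-opts))
```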

> Do you know if http-kit has some limit on concurrent connections

It can take a ton of concurrent connections when configured for it; not sure about the defaults - will check tomorrow. It'll start throwing exceptions if it's over-burdened with the config you're running.

> That may be a Heroku thing too. I wonder if Heroku, or NGinx, simply cuts me off when enough concurrent users are on the site.

Hmm - no idea on Heroku, sorry. Nginx won't be a problem (again, with an appropriate config). Roughly how many concurrent users are you looking at?

Here's an older version of http-kit doing 600k concurrent connections on some decent hardware: http://http-kit.org/600k-concurrent-connection-http-kit.html

sritchie commented 10 years ago

Definitely time for me to get some more monitoring in place :) I'm going to dig into this tomorrow. We're moving off of heroku soon, so I shouldn't have to debug that too hard.

ptaoussanis commented 10 years ago

Hey Sam, sorry for the delay getting back to you - should have some time to look at this today.

Any update? Did tweaking the keep-alive interval(s) down solve the problem?

ptaoussanis commented 10 years ago

Okay, have confirmed that Heroku requires a sub-30s timeout:

"After a dyno connection has been established, HTTP requests have an initial 30 second window in which the web process must return response data (either the completed response or some amount of response data to indicate that the process is active). Processes that do not send response data within the initial 30-second window will see an H12 error in their logs."

(From https://devcenter.heroku.com/articles/http-routing#timeouts).

Have a v0.15.1 hotfix ready to go if you can confirm that adjusting your keep-alives solves the issue.

Note that I'm not sure why you only saw this problem when upgrading from v0.13.0 to v0.14.1. The keep-alive values weren't changed, and there's nothing else that changed that'd obviously affect this. Is it possible something changed with your Heroku config at the same time you upgraded Sente releases?

sritchie commented 10 years ago

Yeah, I can look into that. What's confusing here is that once a connection is established, Heroku only needs to see data every 55 seconds to keep a connection alive, so I thought the defaults would have handled it. Maybe something about a socket connection with NO data after that first handshake? I'll definitely try this fix today.

ptaoussanis commented 10 years ago

Ahh, I think you've identified the point of confusion.

Sente will not send any data over an Ajax connection until an actual payload is ready. Then it'll start a new connection. This means that the 30 second window (not the 55 second window) should apply (as I understand Heroku's docs).

I believe the 55 second limit will apply to things like chunked/streaming transfers, but isn't applicable to Sente's long polling.

Does that make sense?
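
As a rough illustration of the reasoning above (a sketch, not part of Sente): a long-poll request that sends nothing until a payload is ready, or until the keep-alive interval elapses, has to respond inside the initial 30-second window. The names below are hypothetical.

```clojure
;; Initial response window from the Heroku docs quoted in this thread, in ms.
(def heroku-initial-window-ms 30000)

(defn within-initial-window?
  "True if a long-poll that stays silent for kalive-ms still responds
  before the initial window closes."
  [kalive-ms]
  (< kalive-ms heroku-initial-window-ms))

(within-initial-window? 38000) ;=> false (the default discussed in this thread)
(within-initial-window? 25000) ;=> true
```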

sritchie commented 10 years ago

Yup, that makes total sense. I'm pushing up an update to our staging server now; I'll let you know how it looks!

sritchie commented 10 years ago

It looks like this worked! Thanks for all your help, @ptaoussanis. Killer library, killer service. You're an open source assassin.

ptaoussanis commented 10 years ago

> You're an open source assassin.

Hah hah, thank you - and going to remember that term ;-)

BTW added a brief comment on the choice to go for long-polling over chunked encoding: https://github.com/ptaoussanis/sente/blob/0532e028ebe3e5d9f829160fe720b4464a2d36a1/src/taoensso/sente.cljx#L51

Have a great day, cheers! :-)