www.osm.org and api.osm.org tune Apache HTTP timeouts

openstreetmap / operations

OSMF Operations Working Group issue tracking

https://operations.osmfoundation.org/


www.osm.org and api.osm.org tune Apache HTTP timeouts #950

Open Firefishy opened 1 year ago

Firefishy commented 1 year ago

Currently both www and api share the same HTTP timeout; the HTTP timeouts should be tuned appropriately for each.

www.openstreetmap.org should have a ~30-second HTTP timeout
api.openstreetmap.org should likely retain the 300-second HTTP timeout
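A minimal Python sketch of the per-host split proposed above, assuming a simple host-to-timeout lookup. The names PROPOSED_TIMEOUTS, SHARED_TIMEOUT, timeout_for and should_abort are hypothetical and do not come from the OSM or Apache configuration.

```python
# Hypothetical model of the proposed split; not the real OSM configuration.
PROPOSED_TIMEOUTS = {
    "www.openstreetmap.org": 30,   # interactive web pages: ~30 s
    "api.openstreetmap.org": 300,  # API calls keep the long budget
}
SHARED_TIMEOUT = 300  # the single value both hosts currently share


def timeout_for(host: str) -> int:
    """Seconds allowed for a request to `host`; unknown hosts keep the shared value."""
    return PROPOSED_TIMEOUTS.get(host.lower(), SHARED_TIMEOUT)


def should_abort(host: str, elapsed_seconds: float) -> bool:
    """True once a request to `host` has run longer than its budget."""
    return elapsed_seconds > timeout_for(host)


if __name__ == "__main__":
    print(should_abort("www.openstreetmap.org", 45))  # True
    print(should_abort("api.openstreetmap.org", 45))  # False
```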

tomhughes commented 1 year ago

Maybe do some research before suggesting timeouts that exactly match what we already have?

tomhughes commented 1 year ago

Here are the settings, if you want proof: https://github.com/openstreetmap/openstreetmap-website/blob/f138055849e5b09cd94797c3e1705c6357d6b93b/config/settings.yml#L48

Firefishy commented 1 year ago

I know the api timeout is 300s. The main issue is www should not be such a long timeout.

tomhughes commented 1 year ago

It isn't, it's 30 seconds, that's what web_timeout is.

It doesn't really help, though, because once one of them is hogging all the daemons the other one won't get a look in.
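To illustrate why separate timeouts don't help when both hosts share one pool of daemons, here is a toy FIFO queue simulation in Python. The worker count, request durations and the names Request and queue_waits are assumptions for illustration only, not measurements from the OSM servers.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    arrival: float   # seconds since start
    duration: float  # seconds of work once a daemon picks it up


def queue_waits(requests, workers):
    """Simulate a shared FIFO pool; return {name: seconds spent waiting for a daemon}."""
    free_at = [0.0] * workers  # when each daemon is next free (all zeros is a valid heap)
    waits = {}
    for req in sorted(requests, key=lambda r: r.arrival):
        earliest = heapq.heappop(free_at)   # the daemon that frees up first
        start = max(earliest, req.arrival)  # can't start before the request arrives
        waits[req.name] = start - req.arrival
        heapq.heappush(free_at, start + req.duration)
    return waits


if __name__ == "__main__":
    # Hypothetical scenario: 4 daemons, 4 slow API calls arrive at t=0,
    # then a quick web page request arrives at t=1.
    reqs = [Request(f"api{i}", 0.0, 300.0) for i in range(1, 5)]
    reqs.append(Request("web", 1.0, 0.5))
    print(queue_waits(reqs, workers=4)["web"])  # 299.0
```

The web request needs only half a second of work but waits 299 s for a free daemon, far beyond any ~30 s web timeout, because every daemon is occupied by an API call that is allowed to run for 300 s.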

None of it is in any way relevant to anything that happened today as far as I know, but I'll know more once I've actually finished getting to the bottom of what happened instead of just guessing and throwing random suggestions in the ring.

Firefishy commented 1 year ago

I have tweaked the ticket title to include Apache. Apache's global timeout is currently set to Timeout 300. The Timeout directive sets many of Apache's underlying timeouts when they are not set explicitly.

Firefishy commented 1 year ago

https://httpd.apache.org/docs/2.4/mod/core.html#timeout

tomhughes commented 1 year ago

We already have the request timeout configured separately in apache via the reqtimeout module.
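For reference, the Apache mod_reqtimeout documentation describes a RequestReadTimeout value such as header=10-20,MinRate=500 as: allow at least 10 seconds to receive the request headers, add 1 second for every 500 bytes received, but never go above 20 seconds. A small Python sketch of that rule follows. The function name reqtimeout_allowance is hypothetical, the example values come from the documentation's example rather than the OSM servers' actual settings, and treating the increase as continuous rather than stepped is an assumption.

```python
from typing import Optional


def reqtimeout_allowance(bytes_received: int, initial: float,
                         maximum: Optional[float] = None,
                         min_rate: Optional[float] = None) -> float:
    """Seconds a client is allowed for one request phase (headers or body).

    Hypothetical model of the documented RequestReadTimeout rule: start with
    `initial` seconds, add 1 s per `min_rate` bytes received, cap at `maximum`.
    """
    if min_rate is None:
        # In this sketch, without MinRate the allowance stays at the initial value.
        return float(initial)
    allowed = initial + bytes_received / min_rate
    return allowed if maximum is None else min(allowed, float(maximum))


if __name__ == "__main__":
    # header=10-20,MinRate=500 after 1000 bytes of headers have arrived
    print(reqtimeout_allowance(1000, 10, 20, 500))  # 12.0
```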

I'm still investigating what happened, but at least in part the timeouts were actually making things worse, because we wound up with lots of database queries still running after the web requests that initiated them had timed out.
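The effect described here, where a request times out but the work it started carries on, can be reproduced in miniature with Python's concurrent.futures. This is a sketch under stated assumptions: slow_query and handle_request are hypothetical stand-ins (slow_query just sleeps; no database is involved), and the timeout values are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def slow_query(seconds: float) -> str:
    """Hypothetical stand-in for a long-running database query."""
    time.sleep(seconds)
    return "done"


def handle_request(pool, query_seconds: float, request_timeout: float):
    """Start the 'query', wait up to request_timeout, return (response, future)."""
    future = pool.submit(slow_query, query_seconds)
    try:
        return future.result(timeout=request_timeout), future
    except FutureTimeout:
        # The request gives up here, but nothing stops the query itself.
        return "request timed out", future


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as pool:
        response, query = handle_request(pool, query_seconds=0.5, request_timeout=0.1)
        print(response)         # request timed out
        print(query.running())  # True: the query outlived its request
        print(query.cancel())   # False: a running call cannot be cancelled
    # Leaving the with-block waits for the orphaned query to finish.
    print(query.result())       # done
```

The worker thread stays busy for the full 0.5 s even though the caller stopped waiting after 0.1 s. In the scenario described above, the equivalent is a database backend still working for a web request that has already timed out.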