openstreetmap / operations

OSMF Operations Working Group issue tracking
https://operations.osmfoundation.org/

There's still a need to bump the memcache size #1107

Open jidanni opened 1 week ago

jidanni commented 1 week ago

Hello. In https://github.com/openstreetmap/openstreetmap-website/issues/2457 I was told to open an issue here. But as it is getting a little over my head, I will just leave this here.

tomhughes commented 6 days ago

There is no evidence at all in the graphs that this is in fact an issue. I definitely see the issue that you are referring to, but I am unable to explain it, as all the evidence says it shouldn't be down to memcache.

tbertels commented 1 day ago

Could these session disconnections be caused by server restarts, or does the server never restart?

mmd-osm commented 1 day ago

I don't see any server restart in the stats, at least for the last 6 months: https://prometheus.openstreetmap.org/d/l4zgNUdMz/memcached?orgId=1&refresh=1m&from=now-6M&to=now

Also, the OP didn't provide any details on how frequently they have to log in again. There might be external factors, such as cookies being removed by the browser or by some browser extension.

jidanni commented 1 day ago

I thought everybody else also has to log in again at least once every three or four days. Maybe it's because I use various browsers on various devices. But why, on the same device, do I need to log in again after three or four days? Anyway, you're welcome to check the logs to see why user jidanni has to log in again so often.

tbertels commented 14 hours ago

Which stat do you use to check whether the server restarted? Aren't these sudden drops in memory usage symptoms of a server restart? Note that the dates are in month/day format.

Screenshot_20240707_143049m

mmd-osm commented 10 hours ago

Ah, the link wasn't that helpful. There are about 11 memcached instances overall. However, for the 3 frontend servers, only 3 memcached instances (spike-06 ... spike-08) are relevant. Items in cache and memory usage are fairly stable for these three.

https://prometheus.openstreetmap.org/d/l4zgNUdMz/memcached?orgId=1&refresh=1m&from=now-6M&to=now&var-instance=spike-06&var-instance=spike-07&var-instance=spike-08

I think this should match the following config in chef: https://github.com/openstreetmap/chef/blob/45dc24b65b23a6c1dcc2f0ba2aa971563555c35e/roles/web.rb#L20
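For anyone who wants to check for restarts directly rather than from the dashboard, here is a minimal sketch of reading the counters memcached itself reports. It assumes plain-text protocol access on the default port 11211. The function names and the choice of fields are my own illustration, not anything taken from the OSM setup. The `stats` command returns `uptime` (seconds since the process started), `evictions` and `curr_items`, so a restart shows up as a small `uptime`.

```python
import socket


def parse_stats(raw: str) -> dict:
    """Parse the text reply to memcached's `stats` command into a dict."""
    stats = {}
    for line in raw.splitlines():
        parts = line.strip().split(" ", 2)
        # Data lines look like "STAT <name> <value>"; the final "END" is skipped.
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def fetch_stats(host: str, port: int = 11211, timeout: float = 2.0) -> dict:
    """Send `stats` to a memcached instance and return the parsed reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stats\r\n")
        buf = b""
        while not buf.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            buf += chunk
    return parse_stats(buf.decode("ascii", errors="replace"))


def summarize(stats: dict) -> dict:
    """Pick out the fields that matter for the restart/eviction question."""
    return {
        "uptime_days": int(stats["uptime"]) / 86400,
        "evictions": int(stats["evictions"]),
        "curr_items": int(stats["curr_items"]),
    }
```

Calling `summarize(fetch_stats("memcached.example"))` (hypothetical hostname) would then give uptime in days alongside the eviction and item counts for one instance.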

tomhughes commented 9 hours ago

A restart would indeed lose all sessions, but as @mmd-osm says, it's only those three machines that we're talking about here, and they last restarted in November last year:

image

At that time it took nearly two months for the caches to fill up, which suggests that it should take about that long for things to get expired, unless there has been a significant increase in cache usage since.
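To spell out that reasoning, here is a rough back-of-the-envelope sketch. It assumes a steady write rate and plain least-recently-used eviction, and it ignores reads refreshing an item's position, slab classes and explicit TTLs. The function name and the numbers are purely illustrative, not measurements from these servers.

```python
def estimated_retention_days(capacity_bytes: float,
                             fill_rate_bytes_per_day: float) -> float:
    """Estimate how long an untouched item survives before eviction.

    Under a steady write rate, a full cache has to be overwritten once
    before the oldest item is pushed out, so retention is roughly the
    capacity divided by the fill rate.
    """
    if fill_rate_bytes_per_day <= 0:
        raise ValueError("fill rate must be positive")
    return capacity_bytes / fill_rate_bytes_per_day


# Illustrative numbers only: a cache that took 60 days to fill from empty.
capacity = 6000.0
rate_then = capacity / 60

# At the same write rate, an untouched item lasts about as long as the fill took.
same_rate = estimated_retention_days(capacity, rate_then)

# If usage has since doubled, the estimate halves.
doubled_rate = estimated_retention_days(capacity, 2 * rate_then)
```

So retention of only three or four days would need the write rate to have grown by well over an order of magnitude since the last restart.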