matkoniecz closed this 8 months ago.
I think that chart is misleading.
I'm not sure what was going on, but there was no sign of increased CPU load or requests over the same time period, so I suspect there were some outliers throwing the average off or something. I've restarted php-fpm now, which seems to have cleared it.
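A quick illustration of how a couple of outliers can throw an average off while typical requests stay fast. The durations here are invented for illustration and are not taken from the wiki's metrics. It also assumes the panel plots a mean, which the thread suspects but does not confirm.

```python
from statistics import mean, median

def summarize(durations: list[float]) -> tuple[float, float]:
    """Return (mean, median) of request durations in seconds."""
    return mean(durations), median(durations)

# Invented sample: 98 fast requests plus two that each hung for 30 s
durations = [0.2] * 98 + [30.0, 30.0]
avg, mid = summarize(durations)
print(f"mean={avg:.3f}s median={mid:.3f}s")  # mean=0.796s median=0.200s
```

Two slow requests roughly quadruple the mean, while the median doesn't move. That is why an average-based panel can look alarming without any rise in CPU load or request volume.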
I don't think it was responsible for any slowness in answering requests, though. I think that relates more to the gaps in this chart:
which represent times when a spike in requests causes us to run out of php-fpm workers.
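If php-fpm's status page is enabled (`pm.status_path` in the pool config, with `?json` appended for JSON output), it reports worker exhaustion directly. `max children reached` counts how many times the pool hit its process limit since it started, and only works with `pm = dynamic` or `ondemand`. The sketch below checks a status snapshot for those signals. The sample numbers are made up, and `pool_exhausted` is a hypothetical helper, not part of any existing tooling.

```python
import json

# Made-up example of php-fpm status output fetched with ?json
SAMPLE_STATUS = """
{"pool": "wiki.openstreetmap.org", "process manager": "dynamic",
 "accepted conn": 123456, "listen queue": 12, "max listen queue": 80,
 "idle processes": 0, "active processes": 50, "total processes": 50,
 "max active processes": 50, "max children reached": 3, "slow requests": 0}
"""

def pool_exhausted(status: dict) -> bool:
    """Hypothetical check: True if requests are queueing with no idle
    workers right now, or if the pool has ever hit pm.max_children."""
    queueing_now = status["idle processes"] == 0 and status["listen queue"] > 0
    hit_limit_before = status["max children reached"] > 0
    return queueing_now or hit_limit_before

print(pool_exhausted(json.loads(SAMPLE_STATUS)))  # True
```

Because `max children reached` is cumulative since the pool started, restarting php-fpm resets it. Tracking its rate of change is more useful than its absolute value.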
I've blocked a few scraper idiots to see if that helps, though I'm not sure they were really the problem.
Maybe some badly written scraper kept hitting a timing-out or expensive page and inflated the response time on that graph?
Whatever was measured by https://prometheus.openstreetmap.org/d/HMPwSI2Gz/php-fpm?orgId=1&refresh=1m&viewPanel=10&var-instance=konqi&var-pool=wiki.openstreetmap.org&from=now-1d&to=now got better, so I will close this one.
Like I said, I'm not at all sure that graph is helpful here, though I'm also not sure what is going on with it.
I don't think I'd call this fixed yet anyway - there was another spike in runners this morning that came close to running out.
I suspect we may need to increase the maximum size of the pool. The spike this morning had at least two people fetch Map_Features at more or less the same time, and a single fetch of that page can lead to 600 or so loads of images and other resources in a short period of time.
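A rough way to reason about how large the pool needs to be is Little's law: the number of busy workers equals the arrival rate times the average time each request holds a worker. The sketch assumes every one of the ~600 resource loads goes through php-fpm, which the thread doesn't establish. The 10-second window and 0.5 s per request are invented for illustration, and `workers_needed` is a hypothetical helper.

```python
import math

def workers_needed(requests: int, window_s: float, mean_service_s: float) -> int:
    """Estimate concurrent workers via Little's law: L = lambda * W."""
    arrival_rate = requests / window_s          # lambda, requests per second
    return math.ceil(arrival_rate * mean_service_s)  # L, busy workers

# Hypothetical burst: two near-simultaneous Map_Features fetches,
# ~600 follow-up requests each, spread over 10 s, 0.5 s per request
print(workers_needed(requests=2 * 600, window_s=10.0, mean_service_s=0.5))  # 60
```

Little's law describes a steady-state average, so for a short burst this is only a ballpark. Real bursts are lumpier, which argues for some headroom above the estimate when choosing `pm.max_children`.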
The issues seem to be resolved now.
https://prometheus.openstreetmap.org/d/HMPwSI2Gz/php-fpm?orgId=1&refresh=1m&viewPanel=10&var-instance=konqi&var-pool=wiki.openstreetmap.org&from=now-60d&to=now
The wiki was reported as being really slow by a user.
Looking at this graph, it seems to be visible in the statistics as well.