CharlesNepote opened this issue 1 year ago (status: open).
@CharlesNepote the website makes those requests, right? In the app, we call these endpoints once during setup and then periodically, but I think we cache the result. @monsieurtanuki @g123k
@CharlesNepote using https://nginx.org/en/docs/http/ngx_http_memcached_module.html could be far more efficient (we already have the memcached server)
It seems to be available in the nginx-extras package.
I have done some more interesting computations, based on 41 million requests from the nginx logs.
The URLs concerned are https://world.openfoodfacts.org/api/v0/preferences and https://world.openfoodfacts.org/api/v0/attribute_groups.
This means that the cache would be very efficient.
@alexgarel I don't understand why using memcached would be far more efficient than reading the cache from the filesystem. It would also be more complicated. I feel that plain nginx directives, without any extra dependency, are more robust.
@CharlesNepote You are right that the files will be in the filesystem cache anyway. So you can implement it that way.
Based on an analysis of 50 million nginx log lines, we found that these two URLs account for 2.83% and 2.79% of all requests respectively (5.62% combined).
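The share computation described above can be reproduced with a single awk pass over an access log. This is a sketch and not the script actually used for the analysis. It assumes nginx's default `combined` log format, where the request path is the 7th whitespace-separated field. The function name `url_share` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the share of requests hitting the two
# preference endpoints in an nginx access log.
# ASSUMPTION: default "combined" log format, request path in field 7.
# Paths with a query string (e.g. ?lc=fr) are still counted.
url_share() {
  awk '
    { total++ }
    $7 ~ /^\/api\/v0\/preferences/      { prefs++ }
    $7 ~ /^\/api\/v0\/attribute_groups/ { groups++ }
    END {
      if (total == 0) exit 1
      printf "preferences: %.2f%%\n", 100 * prefs / total
      printf "attribute_groups: %.2f%%\n", 100 * groups / total
      printf "combined: %.2f%%\n", 100 * (prefs + groups) / total
    }' "$1"
}
```

Usage: `url_share /var/log/nginx/access.log`. The log path is a placeholder; adjust the field number if your `log_format` differs.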
These two files are used to set up preferences. They are generated by Perl code, without any database access. See:
It's easy and efficient to cache them in nginx for a few dozen seconds (Stéphane said 1 minute should be OK).
We currently (2023-09) serve around 3000/6000 requests per minute. Caching 5.6% of the requests would save around 170/340 requests per minute. It would also help absorb traffic peaks.
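The estimate above is simple arithmetic. This sketch makes it explicit using the 5.62% figure from the log analysis. It uses integer shell arithmetic, so results are truncated. `saved_per_minute` is a hypothetical helper name. Treat the result as an upper bound: with a 1-minute cache, each distinct cached entry still misses at least once per expiry period.

```shell
#!/bin/sh
# Hypothetical helper: requests per minute that could be served from cache,
# assuming 5.62% of traffic targets the two cacheable endpoints.
# The percentage is expressed in hundredths (562 = 5.62%) to stay in integers.
saved_per_minute() {
  echo $(( $1 * 562 / 10000 ))
}

for rpm in 3000 6000; do
  echo "$rpm req/min -> ~$(saved_per_minute "$rpm") req/min served from cache"
done
```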
The nginx configuration could look like this:
To debug and analyze cache hits, you can create a temporary dedicated log (source: https://serverfault.com/a/912897):
And:
It is then easy to get some statistics about the cache and verify that it is working efficiently (source: https://serverfault.com/a/912897):
HIT vs MISS vs BYPASS vs EXPIRED:
awk '{print $3}' cache.log | sort | uniq -c | sort -rn
MISS URLs:
awk '$3 ~ /MISS/ {print $7}' cache.log | sort | uniq -c | sort -rn
BYPASS URLs:
awk '$3 ~ /BYPASS/ {print $7}' cache.log | sort | uniq -c | sort -rn
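To get a single figure showing whether the cache pays off, the per-status counts can be turned into percentages plus an overall hit ratio. This is a sketch that assumes, like the commands above, that field 3 of `cache.log` holds the cache status (for example `$upstream_cache_status`). The function name `cache_summary` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: per-status percentages and overall hit ratio.
# ASSUMPTION: field 3 of each log line is the cache status (HIT, MISS, ...).
cache_summary() {
  awk '
    { count[$3]++; total++ }
    END {
      if (total == 0) exit 1
      for (s in count) printf "%s %.1f%%\n", s, 100 * count[s] / total
      printf "hit_ratio %.1f%%\n", 100 * count["HIT"] / total
    }' "$1" | sort
}
```

Usage: `cache_summary cache.log`. A hit ratio well above zero on the two preference endpoints would confirm that the cache is effective.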
Part of 5515