kiwix / operations

Kiwix Kubernetes Cluster
http://charts.k8s.kiwix.org/

Customize kiwix-serve cache settings to limit memory consumption #147

Open benoit74 opened 7 months ago

benoit74 commented 7 months ago

As of today, kiwix-serve cache settings are not customized on library.kiwix.org (nor on dev.library.kiwix.org).

As discussed in https://github.com/kiwix/libkiwix/issues/1025, kiwix-serve is using a significant amount of memory. With the current code, we could probably gain more control over this memory consumption by customizing the settings listed below.

| Environment variable | Purpose | Default value | Comment |
|---|---|---|---|
| KIWIX_ARCHIVE_CACHE_SIZE | Number of open readers (~ZIM) | 10% of getBookCount_not_protected (number of local and remote books) | ~= 421 today |
| KIWIX_SEARCHER_CACHE_SIZE | Number of open searchers (which might include readers not accounted for in KIWIX_ARCHIVE_CACHE_SIZE) | Same as KIWIX_ARCHIVE_CACHE_SIZE | ~= 421 today |
| ZIM_DIRENTCACHE | Number of dirents kept in cache per ZIM | 512 | Probably low impact on memory |
| ZIM_DIRENTLOOKUPCACHE | Same as ZIM_DIRENTCACHE | 1024 | Probably low impact on memory |
| ZIM_CLUSTERCACHE | Number of clusters kept in cache per ZIM | 16 | |
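
To make the defaults above concrete, here is a minimal Python sketch of how each variable might resolve when it is set or left unset. The function name `effective_cache_settings` is hypothetical, and the rounding of the 10% rule (rounded up, never below 1) and the searcher cache following the same 10% rule are assumptions, not confirmed libkiwix behavior.

```python
import os

# Per-ZIM defaults as listed in the table above (libzim settings).
ZIM_DEFAULTS = {
    "ZIM_DIRENTCACHE": 512,
    "ZIM_DIRENTLOOKUPCACHE": 1024,
    "ZIM_CLUSTERCACHE": 16,
}


def effective_cache_settings(book_count, env=None):
    """Return the value each cache variable would resolve to.

    Assumption: an unset KIWIX_* variable falls back to 10% of the book
    count, rounded up and never below 1. The exact rounding used by
    libkiwix is not confirmed here.
    """
    env = os.environ if env is None else env
    # Integer ceiling of book_count / 10, avoiding floating point.
    ten_percent = max(1, (book_count + 9) // 10)
    settings = {
        "KIWIX_ARCHIVE_CACHE_SIZE": int(env.get("KIWIX_ARCHIVE_CACHE_SIZE", ten_percent)),
        "KIWIX_SEARCHER_CACHE_SIZE": int(env.get("KIWIX_SEARCHER_CACHE_SIZE", ten_percent)),
    }
    for name, default in ZIM_DEFAULTS.items():
        settings[name] = int(env.get(name, default))
    return settings
```

Under these assumptions, a library of about 4210 books with nothing overridden resolves to 421 for both KIWIX_* caches, which matches the figure in the table.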

My gut feeling is that 421 for KIWIX_ARCHIVE_CACHE_SIZE and KIWIX_SEARCHER_CACHE_SIZE is far too high: I doubt we open that many ZIMs every day, but my experience is limited.

I suggest that we run a small experiment directly in production on library.kiwix.org (dev.library.kiwix.org is not really representative in terms of number of ZIMs and traffic, and has known issues):

@rgaudin @mgautierfr @kelson42 WDYT?

rgaudin commented 7 months ago

What's the RAM impact of each of those cached entries? You suspect 421 is too large, but how much data is cached for each? Is it a static figure? Is it dynamic (based on usage)?

I know @mgautierfr has already explained this, but I don't think it's documented and it probably should be (in the libkiwix wiki?)

benoit74 commented 7 months ago

I just created a good dashboard to observe all system metrics of a given set of pods (based on a regex of their name + a regex that must not match their names):

https://kiwixorg.grafana.net/d/eaa1add43ccec1e85a562078cdf77779/7589a2da-9d76-59e4-9c5a-58399ebf4adf?orgId=1&refresh=30s

mgautierfr commented 7 months ago

> Is it a static figure? Is it dynamic (based on usage)?

It is dynamic. When a user opens a page in a ZIM file, we will cache:

So the more pages are read, the more we cache. All ZIM_*CACHE settings relate to libzim and therefore apply per opened ZIM file.

On top of that, libkiwix itself caches ZIM readers, so we have to multiply all these numbers by the number of cached readers (up to KIWIX_ARCHIVE_CACHE_SIZE).
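
The multiplication described above can be sketched as a rough upper bound on the number of cached libzim entries. This is only an illustration: the function name `max_cached_entries` is hypothetical, the defaults are the ones quoted earlier in this issue, and it counts entries rather than bytes, since the size of each entry is not given in this thread. It also ignores readers held by searchers, which may not be accounted for in KIWIX_ARCHIVE_CACHE_SIZE.

```python
def max_cached_entries(archive_cache_size, dirent_cache=512,
                       dirent_lookup_cache=1024, cluster_cache=16):
    """Upper bound on cached libzim entries across all cached readers.

    Each per-ZIM cache limit is multiplied by the number of readers
    libkiwix may keep open. Caches fill dynamically, so real usage is
    at most these figures.
    """
    return {
        "dirents": archive_cache_size * dirent_cache,
        "dirent_lookup_entries": archive_cache_size * dirent_lookup_cache,
        "clusters": archive_cache_size * cluster_cache,
    }


bounds = max_cached_entries(421)
# bounds == {"dirents": 215552, "dirent_lookup_entries": 431104, "clusters": 6736}
```

With the current default of 421 cached readers, that is up to 6736 clusters held in memory, which suggests lowering KIWIX_ARCHIVE_CACHE_SIZE scales every per-ZIM cache down at once.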