Open benoit74 opened 7 months ago
What's the RAM impact of each of those cached entries? You suspect 421 is too large, but how much data is cached for each? Is it a static figure? Is it dynamic (based on usage)?
I know @mgautierfr has already explained this, but I don't think it's documented and it probably should be (in the libkiwix wiki?)
I just created a good dashboard to observe all system metrics of a given set of pods (based on a regex of their name + a regex that must not match their names):
Is it a static figure? Is it dynamic (based on usage)?
It is dynamic. When a user opens a page in a ZIM file, we cache:
- ZIM_DIRENTLOOKUPCACHE dirents at ZIM file opening.
- ZIM_DIRENTCACHE dirents used to find the requested resources (dirents of the resources + dirents used to do the binary search).
- ZIM_CLUSTERCACHE clusters (clusters of the resources).

So the more pages are read, the more we cache.
All ZIM_*CACHE settings are related to libzim and so apply per opened ZIM file.
On top of that, libkiwix itself caches ZIM readers, so we have to multiply all these numbers by the number of cached readers (up to KIWIX_ARCHIVE_CACHE_SIZE).
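To make the multiplication concrete, here is a minimal Python sketch of the upper bound on cached items. The function name and the per-archive sizes passed to it are assumptions chosen for illustration, not values confirmed in this thread; only the 421 figure comes from the discussion. It counts items, not bytes, since the size of a dirent or cluster varies per ZIM.

```python
def max_cached_items(archive_cache_size, dirent_lookup_cache, dirent_cache, cluster_cache):
    """Upper bound on cached items if every cached reader fills all its caches.

    Each ZIM_* value is a per-archive limit, so the total is that limit
    multiplied by the number of readers libkiwix keeps in its own cache.
    """
    per_archive = {
        "lookup_dirents": dirent_lookup_cache,
        "dirents": dirent_cache,
        "clusters": cluster_cache,
    }
    return {kind: limit * archive_cache_size for kind, limit in per_archive.items()}


# Hypothetical per-archive limits, used only to show the arithmetic.
totals = max_cached_items(
    archive_cache_size=421,
    dirent_lookup_cache=1024,
    dirent_cache=512,
    cluster_cache=16,
)
print(totals)
```

Because the caches fill dynamically, real usage sits somewhere below this bound and depends on how many distinct ZIM files and pages are actually read.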
As of today, kiwix-serve cache settings are not customized on library.kiwix.org (nor on dev.library.kiwix.org).
As discussed in https://github.com/kiwix/libkiwix/issues/1025, kiwix-serve is using a significant amount of memory. With the current code, we could probably get more control over this memory consumption by customizing the settings listed below:
- KIWIX_ARCHIVE_CACHE_SIZE: getBookCount_not_protected (number of local and remote books) ~= 421 today
- KIWIX_SEARCHER_CACHE_SIZE: (KIWIX_ARCHIVE_CACHE_SIZE); KIWIX_ARCHIVE_CACHE_SIZE ~= 421 today
- ZIM_DIRENTCACHE
- ZIM_DIRENTLOOKUPCACHE
- ZIM_DIRENTCACHE
- ZIM_CLUSTERCACHE
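As a rough illustration of customizing these settings, the sketch below builds a set of environment-variable overrides in Python and rejects unknown names or non-positive values before they would be handed to a container. The helper name `build_cache_env` and the example values 50 and 5 are hypothetical; only the variable names come from this issue.

```python
# The settings named in this issue; anything else is treated as a typo.
KNOWN_SETTINGS = (
    "KIWIX_ARCHIVE_CACHE_SIZE",
    "KIWIX_SEARCHER_CACHE_SIZE",
    "ZIM_DIRENTLOOKUPCACHE",
    "ZIM_DIRENTCACHE",
    "ZIM_CLUSTERCACHE",
)


def build_cache_env(overrides):
    """Return overrides as a str->str mapping suitable for a process environment."""
    env = {}
    for name, value in overrides.items():
        if name not in KNOWN_SETTINGS:
            raise ValueError(f"unknown cache setting: {name}")
        if not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
        env[name] = str(value)  # environment variables are always strings
    return env


# Hypothetical values, for illustration only.
expe_env = build_cache_env({
    "KIWIX_ARCHIVE_CACHE_SIZE": 50,
    "KIWIX_SEARCHER_CACHE_SIZE": 5,
})
print(expe_env)
```

Validating the names up front avoids silently running an experiment where a misspelled variable leaves the default in place.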
My gut feeling is that 421 for KIWIX_ARCHIVE_CACHE_SIZE and KIWIX_SEARCHER_CACHE_SIZE is way too much; I wouldn't assume we open this many ZIMs every day, but my experience is limited.
I suggest that we run a small experiment directly in production on library.kiwix.org (dev.library.kiwix.org is not really representative in terms of number of ZIMs and traffic, and has known issues):
- library-data, reduce this to 1
- library-data-expe, with 1 kiwix-serve container and custom environment variables
- library-data service to redirect to both k8s deployments
- library-data to 2 containers and library-data-expe to 0, et voilà
- library-data and library-data-expe in terms of memory, CPU and disk I/O; also note any noticeable change in performance on live browsing by an end user

@rgaudin @mgautierfr @kelson42 WDYT?