Open · akhenakh opened this issue 1 week ago
> our current setup is limiting memory usage in Kubernetes to 256M
That seems like a low limit for the Go runtime in general. Have you looked into alternate explanations for the high memory usage, for example if you run just Caddy without the plugin, or set a very low cache size limit?
We are running 4 Caddy instances dedicated to those 2 pmtiles archives; they are not serving anything else. All four of them are periodically OOMKilled, which leads us to think this can only be related to the cache.
We keep lowering the cache size; as you can see, we went down to 64 and still got OOMKilled. We could experiment with less, but it won't solve the issue.
Can you reproduce your problem with only Caddy serving raw pmtiles files, for example, or with nginx used as a reverse proxy in front of go-pmtiles?
My impression is that Caddy acting as a static file server by default consumes quite a bit of memory (see https://caddy.community/t/are-there-ways-to-tune-caddy-to-reduce-memory-usage/20533), and maybe you can optimize this by adjusting GC settings. But if your goal is to run a complete web server with SSL termination, etc. in less than 256 MB of memory, then Caddy, or another web server written in a garbage-collected language, may not be the right choice.
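For context on those GC settings: on Go 1.19+ they can be set through the GOGC and GOMEMLIMIT environment variables on the Caddy process, or programmatically as in this minimal sketch. The 200 MiB limit and the 50 percent GC target here are illustrative values, not recommendations:

```go
package main

import (
	"runtime/debug"
)

func main() {
	// Equivalent to setting the GOMEMLIMIT environment variable:
	// a soft limit on the Go heap plus runtime overhead (Go 1.19+).
	// Keeping this below the Kubernetes limit leaves headroom for
	// non-heap memory such as goroutine stacks.
	debug.SetMemoryLimit(200 << 20) // 200 MiB, illustrative only

	// Equivalent to GOGC: run the collector more aggressively than
	// the default of 100, trading CPU for a smaller steady-state heap.
	debug.SetGCPercent(50)

	// ... start the server as usual ...
}
```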
I am not sure a reworking of the pmtiles directory cache is going to help much. You could, for example, store compressed directories in the cache, but then you pay in decompression time on every request.
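As a rough illustration of that trade-off, here is a sketch of a map-backed cache that keeps each entry gzip-compressed and decompresses on every Get. The CompressedCache type and its methods are hypothetical, not the actual go-pmtiles cache:

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"sync"
)

// CompressedCache keeps each entry gzip-compressed in memory,
// trading CPU on every read for a smaller resident set.
type CompressedCache struct {
	mu      sync.Mutex
	entries map[string][]byte // key -> gzip-compressed directory bytes
}

func NewCompressedCache() *CompressedCache {
	return &CompressedCache{entries: make(map[string][]byte)}
}

// Put compresses the value before storing it.
func (c *CompressedCache) Put(key string, value []byte) error {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(value); err != nil {
		return err
	}
	if err := zw.Close(); err != nil {
		return err
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[key] = buf.Bytes()
	return nil
}

// Get pays the decompression cost on every request.
func (c *CompressedCache) Get(key string) ([]byte, bool, error) {
	c.mu.Lock()
	compressed, ok := c.entries[key]
	c.mu.Unlock()
	if !ok {
		return nil, false, nil
	}
	zr, err := gzip.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return nil, false, err
	}
	defer zr.Close()
	value, err := io.ReadAll(zr)
	if err != nil {
		return nil, false, err
	}
	return value, true, nil
}

func main() {
	c := NewCompressedCache()
	_ = c.Put("root", []byte("directory bytes go here"))
	if v, ok, err := c.Get("root"); err == nil && ok {
		fmt.Println(string(v))
	}
}
```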
Unfortunately we need those tiles to be served using /z/x/y URLs, so we can't apply the same workload in production.
I am suggesting that you run that as an experiment to confirm that the cache in the go-pmtiles code is the source of the high memory usage. My null hypothesis is that it is not, but I would be very interested to see confirmation that it is, so we have a focus to optimize on.
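One way to get that confirmation without changing the serving path is a heap profile, which attributes live allocations to the code that made them. A minimal sketch using the standard net/http/pprof handler on a side port (the localhost:6060 address is just an example):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers
)

func main() {
	// Expose the profiling endpoints on a separate, non-public port.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... run the server as usual ...
	select {}
}
```

A heap profile can then be fetched with `go tool pprof http://localhost:6060/debug/pprof/heap`; if the directory cache dominates, its allocation sites will show up at the top of the profile.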
our current setup is limiting memory usage in Kubernetes to 256M
our Caddyfile
We are serving only 2 different pmtiles archives from the same Caddy instance.
Yet we are seeing highly volatile memory consumption, which we think is related to the simplistic cache implementation relying on a basic map. Is there any plan to optimize the cache? If not, would you accept contributions that replace the cache with an existing implementation?
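For what such a replacement could look like, here is a minimal sketch of a size-bounded LRU built only on the standard library. The DirCache name and the byte-based bound are assumptions for illustration, not the existing go-pmtiles API:

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

// DirCache is a hypothetical size-bounded LRU cache: the total stored
// bytes never exceed maxBytes, and the least recently used entries are
// evicted first.
type DirCache struct {
	mu       sync.Mutex
	maxBytes int
	curBytes int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> element in order
}

type entry struct {
	key   string
	value []byte
}

func NewDirCache(maxBytes int) *DirCache {
	return &DirCache{
		maxBytes: maxBytes,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

// Get returns the cached value and marks it as most recently used.
func (c *DirCache) Get(key string) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// Put inserts or updates a value, then evicts until under budget.
func (c *DirCache) Put(key string, value []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[key]; ok {
		c.curBytes += len(value) - len(el.Value.(*entry).value)
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
	} else {
		el := c.order.PushFront(&entry{key: key, value: value})
		c.items[key] = el
		c.curBytes += len(value)
	}
	// Evict least recently used entries until under the byte budget.
	for c.curBytes > c.maxBytes && c.order.Len() > 0 {
		el := c.order.Back()
		ent := el.Value.(*entry)
		c.order.Remove(el)
		delete(c.items, ent.key)
		c.curBytes -= len(ent.value)
	}
}

func main() {
	c := NewDirCache(1 << 20) // 1 MiB budget, illustrative only
	c.Put("root", []byte("directory bytes"))
	if v, ok := c.Get("root"); ok {
		fmt.Println(len(v))
	}
}
```

A byte-based bound keeps worst-case cache memory proportional to the configured budget rather than to the number of distinct directories requested.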