cvmfs: memory request/limit for containers in the pod #116

sciencebox / charts

Helm Charts for ScienceBox services

GNU Affero General Public License v3.0

Closed krishnan-r closed 1 year ago

krishnan-r commented 1 year ago

The daemonset does not specify a memory request or limit for the containers in the pod. As a result, the system kills processes from this pod when the node runs out of memory (OOM).

We need a way to specify the memory limit for the entire pod while also adjusting the cache sizes of individual mount points, taking into account the memory available in the node and the pod.

cc @etejedor

ebocchi commented 1 year ago

Fixed with https://github.com/sciencebox/charts/commit/2a9939a56d2cf37f56bbc15a5cc1584202892f2f

etejedor commented 1 year ago

Thanks @ebocchi for working on this. Still to be discussed is how to set the limits for the in-memory cache of individual mounts: the changes in https://github.com/sciencebox/charts/commit/2a9939a56d2cf37f56bbc15a5cc1584202892f2f allow requests and limits to be configured for the cvmfs daemonset pods, but we still need a way to enforce, at the cvmfs level, that we never hit those limits.
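
For illustration, a minimal sketch of what such a configuration could look like in a values override. The `resources` block follows the standard Kubernetes container schema; the top-level key under which the chart exposes it and the sizes shown are assumptions, not taken from the commit.

```yaml
# Hypothetical values override for the cvmfs chart.
# The key path is an assumption; check the chart's values.yaml for the real one.
resources:
  requests:
    memory: 1Gi   # illustrative: memory the scheduler reserves for the container
  limits:
    memory: 4Gi   # illustrative: the container is OOM-killed if it exceeds this
```

Setting a request moves the pod out of the BestEffort QoS class, so it is no longer among the first candidates to be killed under node memory pressure. The limit by itself does not make the cvmfs client use less memory, which is the open point above.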

ebocchi commented 1 year ago

There's no in-memory cache at the moment. The cache is on disk, of the traditional posix type. To have an in-memory cache, the client would need to be reconfigured as per https://cvmfs.readthedocs.io/en/latest/cpt-configure.html#example -- I think this was tested on bare metal in the past and ditched later.
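
As a rough sketch of what that reconfiguration might involve, the fragment below selects a RAM cache manager instance as the primary cache. The parameter names follow the pattern in the CernVM-FS advanced cache documentation, but the instance name `memory`, the size, and the assumption that the chart passes these pairs through to the client configuration are all illustrative and untested here.

```yaml
# Hypothetical key:value pairs for the cvmfs client configuration.
# "memory" is an arbitrary instance name chosen for this sketch.
CVMFS_CACHE_PRIMARY: memory        # use the instance named "memory" as the cache
CVMFS_CACHE_memory_TYPE: ram       # in-memory cache manager instead of posix
CVMFS_CACHE_memory_SIZE: "1024"    # assumed to be in MB; illustrative value
```

With a pure RAM cache the payload competes with everything else for the pod's memory, so the size would have to be chosen well below the pod's memory limit.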

One can still experiment with the following parameters (see https://cvmfs.readthedocs.io/en/latest/apx-parameters.html):

CVMFS_OOM_SCORE_ADJ (if effective in containers)
CVMFS_MEMCACHE_SIZE (but this is additional in-memory cache for metadata, not the actual payload)

These can be added as key:value to https://github.com/sciencebox/charts/blob/master/cvmfs/values.yaml#L33-L38 (or more specific per-mount, if needed)
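
A minimal sketch of such an addition, assuming the chart accepts arbitrary client parameters as key:value pairs. The parent key `config` is hypothetical (the real key is whatever sits at the linked lines of `values.yaml`), and the values are illustrative only.

```yaml
# Hypothetical fragment; "config" stands in for the chart's actual parent key.
config:
  CVMFS_OOM_SCORE_ADJ: "-500"   # lower score = less likely to be picked by the OOM killer;
                                # may not take effect inside a container
  CVMFS_MEMCACHE_SIZE: "64"     # metadata memory cache, assumed to be in MB; not the payload cache
```

Neither parameter bounds the total memory of the client process, so they would complement the pod-level limits rather than replace them.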

etejedor commented 1 year ago

> There's no in-memory cache at the moment.

This puzzles me, since we observe that the memory consumption of the cvmfs pods can grow to several GB (and sometimes cause OOM errors on their node). Where is this memory consumption coming from, if not from caching data?