yandex-cloud / k8s-csi-s3

GeeseFS-based CSI for mounting S3 buckets as PersistentVolumes

Caching support #18

Open nuwang opened 2 years ago

nuwang commented 2 years ago

Hi,

I'm trying to enable caching in the CSI driver. I've passed extra mountOptions as follows:

mountOptions: "--memory-limit 4000 --dir-mode 0777 --file-mode 0666 --cache /tmp --debug --debug_fuse --stat-cache-ttl 9m0s --cache-to-disk-hits 1"

and they are being passed in correctly according to the logs:

I0622 15:05:55.915639 1 mounter.go:65] Mounting fuse with command: geesefs and args: [--endpoint https://s3.ap-southeast-2.amazonaws.com -o allow_other --log-file /dev/stderr --memory-limit 4000 --dir-mode 0777 --file-mode 0666 --cache /tmp --debug --debug_fuse --stat-cache-ttl 9m0s --cache-to-disk-hits 1 biorefdata:galaxy/v1/data.galaxyproject.org /var/lib/kubelet/pods/9d508976-732c-4a3f-8bf6-89bd097e831b/volumes/kubernetes.io~csi/pvc-6a8c3758-8784-4fcc-9311-4305b3cce8e4/mount]

However, the /tmp directory remains empty. Am I doing something wrong?
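For context, my understanding from the log above is that the driver simply splits the mountOptions string and appends it to the geesefs command line, between its own fixed flags and the bucket/target arguments. A simplified sketch of that flow (not the actual mounter.go code; the function and argument names here are made up):

```go
package main

import (
	"log"
	"os/exec"
	"strings"
)

// mountGeesefs illustrates how a user-supplied mountOptions string could be
// split on whitespace and passed through to geesefs unchanged, followed by
// the bucket (with prefix) and the target mount path.
func mountGeesefs(endpoint, mountOptions, bucketWithPrefix, targetPath string) error {
	args := []string{"--endpoint", endpoint, "-o", "allow_other", "--log-file", "/dev/stderr"}
	args = append(args, strings.Fields(mountOptions)...) // pass-through of user options
	args = append(args, bucketWithPrefix, targetPath)
	return exec.Command("geesefs", args...).Run()
}

func main() {
	// Example values taken from the log line above; the target path is a placeholder.
	err := mountGeesefs(
		"https://s3.ap-southeast-2.amazonaws.com",
		"--memory-limit 4000 --dir-mode 0777 --file-mode 0666 --cache /tmp --stat-cache-ttl 9m0s --cache-to-disk-hits 1",
		"biorefdata:galaxy/v1/data.galaxyproject.org",
		"/var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~csi/<pvc>/mount",
	)
	if err != nil {
		log.Fatal(err)
	}
}
```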

Also, with multiple pods mounting the same PVC, would the cache work correctly? I can see that there are multiple geesefs processes running, all pointing to the same cache path.

Finally, we want to use this with long-lived, entirely read-only data (reference genomes and associated read-only datasets). This is why I set --cache-to-disk-hits to 1, assuming that would cause a file to be cached on its very first read. Could you please recommend the best settings for very aggressive caching? I've noticed a lot of S3 calls being made for the same path even when that path has already been checked recently.

nuwang commented 2 years ago

Following up on this, it looks like this could be a design issue. On the surface, the use_cache option works with s3fs (caching didn't work with geesefs), but I'm not sure the multiple processes won't step on each other's toes. The CVMFS CSI driver, for example, appears to use a single mount per volume, which is then bind-mounted per PVC: https://gitlab.cern.ch/cloud-infrastructure/cvmfs-csi/-/blob/master/pkg/cvmfs/nodeserver.go#L114. That way there is only one cvmfs process per volume, and therefore the cache is shared. The same issue seems to exist in the ctrox CSI driver; a rough sketch of that pattern follows below. Ping @vitalif
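To make that concrete, here is a hypothetical sketch (not taken from either driver) of the staged-mount-plus-bind-mount pattern: one geesefs process per volume mounted into a staging directory, and a plain bind mount per pod target path, so a single process owns the cache directory:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// stageVolume mounts the bucket once per volume into a shared staging
// directory (roughly what NodeStageVolume would do in a CSI driver).
func stageVolume(stagingPath, bucket string, geesefsArgs []string) error {
	if err := os.MkdirAll(stagingPath, 0o755); err != nil {
		return err
	}
	args := append(geesefsArgs, bucket, stagingPath)
	return exec.Command("geesefs", args...).Run()
}

// publishVolume bind-mounts the shared staging mount into one pod's target
// path (roughly NodePublishVolume), so every pod sees the same geesefs mount
// and the same cache.
func publishVolume(stagingPath, targetPath string) error {
	if err := os.MkdirAll(targetPath, 0o755); err != nil {
		return err
	}
	return syscall.Mount(stagingPath, targetPath, "", syscall.MS_BIND, "")
}

func main() {
	// Illustrative paths only; a real driver gets these from CSI requests.
	staging := "/var/lib/kubelet/plugins/csi-s3/staging/pvc-example"
	target := "/var/lib/kubelet/pods/pod-a/volumes/kubernetes.io~csi/pvc-example/mount"
	if err := stageVolume(staging, "biorefdata:galaxy/v1/data.galaxyproject.org",
		[]string{"-o", "allow_other", "--cache", "/var/cache/geesefs"}); err != nil {
		fmt.Fprintln(os.Stderr, "stage failed:", err)
		return
	}
	if err := publishVolume(staging, target); err != nil {
		fmt.Fprintln(os.Stderr, "publish failed:", err)
	}
}
```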