kubernetes-sigs / blob-csi-driver

Azure Blob Storage CSI driver
Apache License 2.0

Cache Invalidation Issue with Blobfuse2 Mount Using Azure Blob CSI Driver #1657

Open Kaushik-Vaibhav opened 1 day ago

Kaushik-Vaibhav commented 1 day ago

What happened: We are experiencing an issue where changes made to files in the Azure blob container are not reflected in the containers that mount it via the Azure Blob CSI Driver with the Blobfuse2 protocol (the changes are visible in the Azure portal).

  1. We are using the Blob CSI driver to mount Azure Blob storage across multiple pods.
  2. One pod writes data into a CSV file, and another service (using a 3rd party SDK) updates the empty files.
  3. Changes made via the SDK are visible in the Azure portal but are not reflected in the mounted filesystem inside the containers. I've tried restarting the pods, adjusting mount options, and using the latest configurations to invalidate the cache, but nothing seems to solve the issue.

What you expected to happen: Changes made via the 3rd-party SDK should be visible across all the mounts in the containers in the microk8s cluster, as they are in the Azure portal.

How to reproduce it:

  1. Install the CSI driver:

     ```shell
     helm install blob-csi-driver blob-csi-driver/blob-csi-driver \
       --set node.enableBlobfuseProxy=true \
       --set controller.replicas=1 \
       --set linux.kubelet="/var/snap/microk8s/common/var/lib/kubelet" \
       --namespace kube-system \
       --version v1.25.0
     ```

  2. Create PV:

     ```yaml
     apiVersion: v1
     kind: PersistentVolume
     metadata:
       annotations:
         pv.kubernetes.io/provisioned-by: blob.csi.azure.com
       name: stage-storage-pv
     spec:
       capacity:
         storage: 10Gi
       accessModes:
         - ReadWriteMany
       persistentVolumeReclaimPolicy: Retain
       storageClassName: ""
       mountOptions:
         - -o allow_other
         - --file-cache-timeout-in-seconds=0
         - --use-attr-cache=false
         - --invalidate-on-sync=true
       csi:
         driver: blob.csi.azure.com
         volumeHandle: azure-csi-driver-volume-handle
         volumeAttributes:
           resourceGroup: RESOURCE_GROUP
           storageAccount: STORAGE_ACCOUNT
           containerName: CONTAINER_NAME
           AzureStorageAuthType: MSI
           protocol: fuse2
           AzureStorageIdentityClientID: "CLIENT_ID"
     ```

  3. Create PVC:

     ```yaml
     apiVersion: v1
     kind: PersistentVolumeClaim
     metadata:
       name: stage-storage-pvc
     spec:
       accessModes:
         - ReadWriteMany
       resources:
         requests:
           storage: 10Gi
       volumeName: stage-storage-pv
       storageClassName: ""
     ```
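For completeness, a minimal Pod consuming the PVC might look like the sketch below (the pod name, image, and mount path are placeholders, not taken from the original report):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: stage-storage-test     # placeholder name
spec:
  containers:
    - name: app
      image: busybox           # placeholder image
      command: ["sleep", "3600"]
      volumeMounts:
        - name: blob-volume
          mountPath: /mnt/blob # placeholder mount path
  volumes:
    - name: blob-volume
      persistentVolumeClaim:
        claimName: stage-storage-pvc
```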

Anything else we need to know?:

```shell
delphix@vk-hs-azure-2:~/masking-poc$ kubectl exec -it -n -- df -h
Filesystem                       Size   Used   Available  Use%  Mounted on
blobfuse2                        13.4P  0      13.4P      0%    /etc/hyperscale
rpool/ROOT/delphix.ENrEl05/root  66.5G  50.4G  16.0G      76%   /etc/hosts
```

Environment:

- CSI Driver version: v1.25.0
- Kubernetes version (use `kubectl version`): v1.27.4
- OS (e.g. from /etc/os-release): Ubuntu 20.04.6 LTS (Focal Fossa)
- Kernel (e.g. `uname -a`): Linux vk-hs-azure-2 5.15.0-1073-dx2024092516-650d167a5-azure #82~20.04.1 SMP Wed Sep 25 16:38:55 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
- Install tools: snap (for microk8s installation)
- Protocol used: FUSE

andyzhangx commented 1 day ago

have you used the `-o direct_io` mount option?

From https://github.com/Azure/azure-storage-fuse:

> **Why am I not able to see the updated contents of file(s), which were updated through means other than Blobfuse2 mount?**
>
> If your use-case involves updating/uploading file(s) through other means and you wish to see the updated contents on Blobfuse2 mount then you need to disable kernel page-cache. `-o direct_io` CLI parameter is the option you need to use while mounting. Along with this, set `file-cache-timeout=0` and all other libfuse caching parameters should also be set to 0. User shall be aware that disabling kernel cache can result into more calls to Azure Storage which will have cost and performance implications.
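Applied to the PV from the reproduction steps, the suggested options would look roughly like this (a sketch based on the FAQ quoted above; option names are blobfuse2's and should be verified against your driver version):

```yaml
mountOptions:
  - -o allow_other
  - -o direct_io                       # disable the kernel page-cache
  - --file-cache-timeout-in-seconds=0  # no local file-cache retention
  - --use-attr-cache=false             # no attribute caching
```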

Kaushik-Vaibhav commented 18 hours ago

After updating the mountOptions, the file changes are now visible to my Kubernetes containers, but I'm observing a delay between when the files are updated (via the SDK from a different service) and when the changes are reflected in the containers. Is this expected? Is there a specific delay (in seconds) before updated files are reflected on the mount in my containers?

PV while setting up the containers

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: stage-storage-pv
spec:
  capacity:
    storage: 10Gi  # Specify your desired storage size
  accessModes:
    - ReadWriteMany  # options: ReadWriteMany / ReadOnlyMany
  mountOptions:
    - -o allow_other
    - -o direct_io
    - --file-cache-timeout-in-seconds=0
  storageClassName: ""
  csi:
    driver: blob.csi.azure.com
    volumeHandle: VOLUME_HANDLE
    volumeAttributes:
      containerName: CONTAINER_NAME   # Azure Blob Storage container name (same as provisioned VM)
      storageAccount: STORAGE_ACCOUNT # Azure storage account name
      AzureStorageAuthType: MSI
      resourceGroup: RESOURCE_GRP
      protocol: fuse2
```

I added these fields as well in the mountOptions but had similar observations:

- --use-attr-cache=false
- --invalidate-on-sync=true
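One way to confirm which flags actually reached blobfuse2 is to inspect the running process on the node (the process name `blobfuse2` is an assumption; with `enableBlobfuseProxy=true` the mount may be owned by the proxy, so run this on the host):

```shell
# List the blobfuse2 process and the command-line flags it was started with;
# prints a fallback message if no such process is running on this machine.
ps -ef | grep '[b]lobfuse2' || echo "blobfuse2 process not found"
```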