ofek / csi-gcs

Kubernetes CSI driver for Google Cloud Storage
https://ofek.dev/csi-gcs/
Apache License 2.0

Debugging: mount seems successful but no files seen from bucket #155

Open · vsoch opened 1 year ago

vsoch commented 1 year ago

Hiya! I have been trying this a few days, and reached a point I thought I'd ask for help. I basically have an operator that is setting up this driver to mount to an existing Google Storage bucket, and everything seems to be working, but when I list the content of the directory (that should be bound) I don't see anything in the storage. I'll try to walk through what I can see carefully so you can help (and maybe this will help me to debug a bit too!).

Bucket

I have files for a Snakemake workflow in a subdirectory of the bucket - I'm assuming that mounting the root of the bucket would let me see the subdirectory too? E.g.,

(screenshot: bucket root listing, showing the snakemake-workflow subdirectory)

and in that directory:

(screenshot: contents of the snakemake-workflow subdirectory)

Although that's probably not important yet, because I can't even ls the root to see the subdirectory. I'm wondering if permissions have something to do with it - e.g., I see these options:

(screenshot: bucket permission options in the Cloud Console)

But I haven't done anything like make everything public, because I've given the service account associated with the secret the Storage Admin and Storage Object Admin roles. Okay - so that's the storage bucket!

Secret

I created the service account with the above permissions, and followed instructions to generate the secret, e.g., a derivative of

# Find the email in the list that I made
$ gcloud iam service-accounts list

# Create the credential file
$ gcloud iam service-accounts keys create <FILE_NAME>.json --iam-account <EMAIL>

# And create a secret from it! I assume this is giving your cluster permission to interact with a specific bucket.

$ kubectl create secret generic csi-gcs-secret --from-literal=bucket=flux-operator-storage --from-file=key=<PATH_TO_SERVICE_ACCOUNT_KEY>
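
For reference, I believe the secret that command creates looks roughly like this (a sketch, with the base64 values elided):

  apiVersion: v1
  kind: Secret
  metadata:
    name: csi-gcs-secret
    namespace: default
  type: Opaque
  data:
    bucket: <base64 of "flux-operator-storage">
    key: <base64 of the service account key JSON>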

One thing that I wasn't sure about in the instructions is when it says:

Make sure that your Google Cloud Storage service account has roles/cloudkms.cryptoKeyEncrypterDecrypter for the target encryption key.

I added this as one of the roles:

(screenshot: service account roles, including Cloud KMS CryptoKey Encrypter/Decrypter)

but I'm not sure what encryption key this is talking about (and maybe this is the bug?). I couldn't figure out what else I was supposed to do from the getting started guide.

PVC and PV

My PVC and PV look okay? Here are the configs - these are created in Go, and I'm showing the kubectl output as YAML, so some of the settings here are defaults.

  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    annotations:
      gcs.csi.ofek.dev/bucket: <bucket-name>
      gcs.csi.ofek.dev/location: <zone>
      gcs.csi.ofek.dev/project-id: <project-id>
      pv.kubernetes.io/bind-completed: "yes"
      pv.kubernetes.io/bound-by-controller: "yes"
    creationTimestamp: "2023-02-12T02:43:30Z"
    finalizers:
    - kubernetes.io/pvc-protection
    name: data-claim
    namespace: flux-operator
    ownerReferences:
    - apiVersion: flux-framework.org/v1alpha1
      blockOwnerDeletion: true
      controller: true
      kind: MiniCluster
      name: flux-sample
      uid: a46a9433-0849-44f2-b8bb-5eb8b081977f
    resourceVersion: "17252"
    uid: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  spec:
    accessModes:
    - ReadWriteMany
    resources:
      requests:
        storage: 1Ki
    storageClassName: csi-gcs
    volumeMode: Filesystem
    volumeName: data
  status:
    accessModes:
    - ReadWriteMany
    capacity:
      storage: 25Gi
    phase: Bound

What sticks out to me as maybe erroneous: although I made the capacity 25Gi, the spec's resources -> requests asks for only 1Ki:

    resources:
      requests:
        storage: 1Ki

I'm actually a bit confused about this resource request, because in my code I set it to the same value as the capacity above, which should be 25Gi:

Resources: corev1.ResourceRequirements{
    Requests: corev1.ResourceList{
        corev1.ResourceStorage: resource.MustParse(volume.Capacity),
    },
},

If that is somehow not being set - where do I set it? Is there an annotation I should be using? And regardless, could the bug be that the resource request is too small?
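
For reference, here is what I expected the claim spec to contain, given the Go code above (a sketch, assuming volume.Capacity is "25Gi"):

  spec:
    resources:
      requests:
        storage: 25Gi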

For my PV, it also looks OK:

  apiVersion: v1
  kind: PersistentVolume
  metadata:
    annotations:
      pv.kubernetes.io/bound-by-controller: "yes"
    creationTimestamp: "2023-02-12T03:05:23Z"
    finalizers:
    - kubernetes.io/pv-protection
    name: data
    ownerReferences:
    - apiVersion: flux-framework.org/v1alpha1
      blockOwnerDeletion: true
      controller: true
      kind: MiniCluster
      name: flux-sample
      uid: xxxxxxxxxxxxxxxxxxxxxxxxxx
    resourceVersion: "28391"
    uid: xxxxxxxxxxxxxxxxxxxxxxxxx
  spec:
    accessModes:
    - ReadWriteMany
    capacity:
      storage: 25Gi
    claimRef:
      apiVersion: v1
      kind: PersistentVolumeClaim
      name: data-claim
      namespace: flux-operator
      resourceVersion: "28348"
      uid: xxxxxxxxxxxxxxxxxxxxxxxxxxx
    csi:
      driver: gcs.csi.ofek.dev
      nodePublishSecretRef:
        name: csi-gcs-secret
        namespace: default
      volumeHandle: csi-gcs
    persistentVolumeReclaimPolicy: Delete
    storageClassName: csi-gcs
    volumeMode: Filesystem
  status:
    phase: Bound

Note that a MiniCluster is a CRD that creates an indexed job, a few config maps, etc. Should that ownerReferences parent be something else?

I can also shell into one of the worker containers (which doesn't exit and fail, because it waits on the main broker in the indexed job), and I see the volume at /workflow, but it's empty:

root@flux-sample-1:/workflow# ls -l /workflow/
total 0

And here they are listed:

$ kubectl get pv -n flux-operator
NAME   CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                      STORAGECLASS   REASON   AGE
data   25Gi       RWX            Delete           Bound    flux-operator/data-claim   csi-gcs                 6m37s

$ kubectl get pvc -n flux-operator
NAME         STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-claim   Bound    data     25Gi       RWX            csi-gcs        6m40s

And what a pod (for the indexed job) sees:

Name:             flux-sample-1-2g4fm
Namespace:        flux-operator
Priority:         0
Service Account:  default
Node:             gke-flux-cluster-default-pool-531fb18c-wnzn/10.128.15.194
Start Time:       Sat, 11 Feb 2023 20:05:27 -0700
Labels:           controller-uid=6a25e008-d5cc-45d3-a02b-d54b6aa949d9
                  job-name=flux-sample
                  namespace=flux-operator
Annotations:      batch.kubernetes.io/job-completion-index: 1
                  cni.projectcalico.org/containerID: 5521646c264fab79637f973987e0517db092d4606c28e28102ac1a61beeea7ec
                  cni.projectcalico.org/podIP: 10.116.0.12/32
                  cni.projectcalico.org/podIPs: 10.116.0.12/32
Status:           Running
IP:               10.116.0.12
IPs:
  IP:           10.116.0.12
Controlled By:  Job/flux-sample
Containers:
  flux-sample-0:
    Container ID:  containerd://cbbf313e353ad4b7f04f401ac6ba3a6c109cc1832dc04f75462d3dc977381862
    Image:         ghcr.io/rse-ops/atacseq:app-latest
    Image ID:      ghcr.io/rse-ops/atacseq@sha256:e26d6e9869b040d32aa97212650aaf35726a1563e4eb662829354471ed1ea048
    Port:          5000/TCP
    Host Port:     0/TCP
    Command:
      /bin/bash
      /flux_operator/wait-0.sh
      snakemake --cores 1 --flux
    State:          Running
      Started:      Sat, 11 Feb 2023 20:05:31 -0700
    Ready:          True
    Restart Count:  0
    Environment:
      JOB_COMPLETION_INDEX:   (v1:metadata.annotations['batch.kubernetes.io/job-completion-index'])
    Mounts:
      /etc/flux/config from flux-sample-flux-config (ro)
      /flux_operator/ from flux-sample-entrypoint (ro)
      /mnt/curve/ from flux-sample-curve-mount (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dg5ks (ro)
      /workflow from data (rw)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  flux-sample-flux-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      flux-sample-flux-config
    Optional:  false
  flux-sample-entrypoint:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      flux-sample-entrypoint
    Optional:  false
  flux-sample-curve-mount:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      flux-sample-curve-mount
    Optional:  false
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-claim
    ReadOnly:   false
  kube-api-access-dg5ks:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  7m34s (x2 over 7m36s)  default-scheduler  0/4 nodes are available: 4 pod has unbound immediate PersistentVolumeClaims.
  Normal   Scheduled         7m32s                  default-scheduler  Successfully assigned flux-operator/flux-sample-1-2g4fm to gke-flux-cluster-default-pool-531fb18c-wnzn
  Normal   Pulled            7m28s                  kubelet            Container image "ghcr.io/rse-ops/atacseq:app-latest" already present on machine
  Normal   Created           7m28s                  kubelet            Created container flux-sample-0
  Normal   Started           7m28s                  kubelet            Started container flux-sample-0

Note that the mount looks OK (/workflow should be read-write from data)!

    Mounts:
      /etc/flux/config from flux-sample-flux-config (ro)
      /flux_operator/ from flux-sample-entrypoint (ro)
      /mnt/curve/ from flux-sample-curve-mount (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dg5ks (ro)
      /workflow from data (rw)

And I know that (from the volume standpoint) there are no errors, because the indexed job runs, and the main issue is that it can't find the data files.

So - I think there might be some issue with either permissions, missing metadata somewhere (perhaps explaining that weird size?), or something to do with an encryption key that I'm missing an instruction for. Any help you might provide would be greatly appreciated! I've brought up my testing cluster a few times in the last couple of days, and I'm trying to find other examples online, but I've reached the point where I'm not sure what to try next (and I hope you have some ideas).

maennchen commented 1 year ago

That sounds like an issue with implicit directories: by default gcsfuse only shows a directory if a placeholder object exists for that prefix, unless you pass the implicit-dirs flag.
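
A sketch of one way to test that (assuming the driver forwards the gcs.csi.ofek.dev/fuse-mount-options annotation to gcsfuse verbatim) is to add the flag to the mount options on the claim:

gcs.csi.ofek.dev/fuse-mount-options: "implicit-dirs"

Alternatively, I believe creating a zero-byte placeholder object for each directory prefix in the bucket makes those directories visible to gcsfuse without the flag.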

vsoch commented 1 year ago

Oh, great! I'll read up and test this out again tomorrow - will post an update (and hopefully be able to close the issue). Thank you!

vsoch commented 1 year ago

Okay this is great - making progress! I changed the working directory to be exactly where the workflow is, and then when I do a listing I see the contents!

The working directory is /workflow/snakemake-workflow, contents include:
Dockerfile  README.md  Snakefile  environment.yaml

And then I got a permissions error (still progress!):

Traceback (most recent call last):
  File "/opt/micromamba/envs/snakemake/bin/snakemake", line 10, in <module>
    sys.exit(main())
  File "/opt/micromamba/envs/snakemake/lib/python3.10/site-packages/snakemake/__init__.py", line 2945, in main
    success = snakemake(
  File "/opt/micromamba/envs/snakemake/lib/python3.10/site-packages/snakemake/__init__.py", line 563, in snakemake
    logger.setup_logfile()
  File "/opt/micromamba/envs/snakemake/lib/python3.10/site-packages/snakemake/logging.py", line 307, in setup_logfile
    os.makedirs(os.path.join(".snakemake", "log"), exist_ok=True)
  File "/opt/micromamba/envs/snakemake/lib/python3.10/os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/opt/micromamba/envs/snakemake/lib/python3.10/os.py", line 225, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '.snakemake'

Snakemake is trying to write a directory .snakemake in the present working directory. I tried setting the file/directory mode so anyone could read/write, and that didn't work (same error), e.g.:

# Read/write access for all users
gcs.csi.ofek.dev/dir-mode: "0777"
gcs.csi.ofek.dev/file-mode: "0777"

It looks like the group has a strange id (the default that the storage uses):

🔒️ Working directory permissions:
total 3
-rw-rw-r-- 1 root 63147  233 Feb 10 22:57 Dockerfile
-rw-rw-r-- 1 root 63147  347 Feb 10 22:57 README.md
-rw-rw-r-- 1 root 63147 1144 Feb 10 22:57 Snakefile
-rw-rw-r-- 1 root 63147  203 Feb 10 22:57 environment.yaml

Although when I tried to change that to 0 or the user id, the mount didn't work, period, so I won't mess with that for now. So I double checked the user that needs to run the workflow:

uid=1000(flux) gid=1000(flux) groups=1000(flux)

And then tried:

...
        gcs.csi.ofek.dev/gid: "1000"
        gcs.csi.ofek.dev/uid: "1000"
        gcs.csi.ofek.dev/dir-mode: "0755"
        gcs.csi.ofek.dev/file-mode: "0664"

And then based on this issue I decided to try adding the implicit-dirs flag:

        gcs.csi.ofek.dev/gid: "1000"
        gcs.csi.ofek.dev/uid: "1000"
        gcs.csi.ofek.dev/dir-mode: "0755"
        gcs.csi.ofek.dev/file-mode: "0664"
        implicit-dirs: "true"

Neither of those worked - I don't think I'm allowed to change the gid/uid, because then the PVC stops working?

  Warning  ProvisioningFailed    47s   gcs.csi.ofek.dev_gke-flux-cluster-default-pool-3f21ee47-pt36_9451e781-d56c-4169-8591-879acc52e19f  failed to provision volume with StorageClass "csi-gcs": rpc error: code = Internal desc = Failed to set bucket capacity: googleapi: Error 403: Access denied., forbidden

Do you have a suggestion for what I should try? In a nutshell, the container starts as root, and we do that for setup of things. The working directory of the run is the mounted directory. When the workflow is run, it's done by a "flux" user (on behalf of root). So I assume what is happening is that flux doesn't have permission to write there, but I don't totally understand why, because if I set permissions to 0777 for files/directories I'd expect anyone could write there.

Also, heads up: the "mount options" link for fuse is a 404: https://ofek.dev/csi-gcs/dynamic_provisioning/#extra-flags.

Update: opened a PR with a quick fix https://github.com/ofek/csi-gcs/pull/156

And I really love being able to define these as annotations! At least for my operator, the user is in control of annotations (in the custom resource definition) and it's nice I don't have to edit / redeploy my operator every time to try something new.

Update: also tried derivations of:

gcs.csi.ofek.dev/fuse-mount-options: "rw,allow_other,file_mode=777,dir_mode=777"
gcs.csi.ofek.dev/fuse-mount-options: "rw,allow_other,file_mode=777,dir_mode=777,uid=1000,gid=1000"

No luck yet, going to bring the cluster down for today and looking forward to hearing your feedback!

vsoch commented 1 year ago

I tried running the workflow as root, and it looks like the permissions issue is gone, but it doesn't see any of the data in the subdirectories (nor does it see the subdirectories). I tried doing an "ls" so they would show up, and I also set implicit-dirs to true; neither made a difference.

broker.info[0]: quorum-full: quorum->run 0.430339s
Building DAG of jobs...
MissingInputException in rule bwa_map in file /workflow/snakemake-workflow/Snakefile, line 9:
Missing input files for rule bwa_map:
    output: mapped_reads/A.bam
    wildcards: sample=A
    affected files:
        data/samples/A.fastq
        data/genome.fa
maennchen commented 1 year ago

Did you try setting the fsGroup? https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-the-security-context-for-a-pod
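
For example, on the pod spec (a sketch, assuming the flux user's gid of 1000 from above):

  securityContext:
    fsGroup: 1000

Note that whether fsGroup is actually applied to a CSI volume depends on the driver's fsGroup support.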

vsoch commented 1 year ago

Interesting - I can try that for the latter case (running as root) but I'm afraid if I change it to the flux user, root will no longer be able to write files to the config map locations (root sets things up for the workflow).

vsoch commented 1 year ago

Still no go - I've tried both derivations of having things owned by the flux user and by root, and the closest I can get is to have root own / run everything:

🔒️ Working directory permissions:
total 3
-rw-rw-r-- 1 root 63147  233 Feb 10 22:57 Dockerfile
-rw-rw-r-- 1 root 63147  347 Feb 10 22:57 README.md
-rw-rw-r-- 1 root 63147 1144 Feb 10 22:57 Snakefile
-rw-rw-r-- 1 root 63147  203 Feb 10 22:57 environment.yaml

but I'm not actually able to see the subdirectory; it's like it doesn't exist. So the workflow fails:

broker.info[0]: quorum-full: quorum->run 9.18322s
Building DAG of jobs...
MissingInputException in rule bwa_map in file /workflow/snakemake-workflow/Snakefile, line 9:
Missing input files for rule bwa_map:
    output: mapped_reads/A.bam
    wildcards: sample=A
    affected files:
        data/samples/A.fastq
        data/genome.fa
broker.err[0]: rc2.0: flux mini submit -n 1 --quiet --watch snakemake --cores 1 --flux Exited (rc=1) 1.8s

Where can I ask for more help on this?