peter-fsdx closed this issue 11 months ago
This looks like a typo:

`mountPath: "/references, readOnly: true`

Although I feel like that would cause a syntax error. In any case, I can't find any issue with the existing logic. The code you referenced is for the `k8s.pod` config option, but you are using the `pod` process directive. The logic for `readOnly` is implemented in PodOptions, PodSpecBuilder and tested here
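For context, the two configuration paths being distinguished look roughly like this in `nextflow.config`. This is a hedged sketch: the option names follow the Nextflow Kubernetes docs, while the claim name, mount path, and process name are illustrative.

```groovy
// Sketch of the two places pod options can be declared (illustrative values).

// (a) k8s.pod config option -- applies to every pod Nextflow creates:
k8s {
    pod = [ [volumeClaim: 'references-pvc', mountPath: '/mnt/references', readOnly: true] ]
}

// (b) pod process directive -- applies per process (selector name is illustrative):
process {
    withName: 'GENOTYPE_BY_TILE' {
        pod = [volumeClaim: 'references-pvc', mountPath: '/mnt/references', readOnly: true]
    }
}
```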
Thanks @bentsherman. That is a typo, but one I introduced while removing an internal product name from the config. My original config does not have that typo. Just to be sure, I reintroduced this typo and Nextflow immediately returned a compilation error, and no pods were created on the cluster. Sorry for the messy code issue.
I was under the impression that the K8sConfig is where the pod option model you linked was instantiated, but perhaps I was reaching for a convenient explanation. Any advice on how I can debug further?
If you set `k8s.debug.yaml = true` in your config, it will write the pod yaml for each task to the task directory, something like `.command.yaml`. Let's try that and see if at least Nextflow is doing the right thing.
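In `nextflow.config` that setting looks roughly like this (a minimal sketch of the option mentioned above):

```groovy
// Enable pod YAML debugging: Nextflow then writes each task's pod spec
// to a .command.yaml file in the task work directory.
k8s {
    debug {
        yaml = true
    }
}
```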
Thanks for the suggestion. I am seeing a discrepancy between the `.command.yaml` (volumes are labelled as `readOnly: true`) and the pod description from the cluster (`ReadOnly: false`). Below are the outputs for these two things, again slightly sanitized.
`.command.yaml`:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nf-c981291fa88a8cdec8e993ee0dc5d653
  namespace: default
  labels: {nextflow.io/processName: GENOTYPE_BY_TILE, nextflow.io/runName: friendly_goldstine,
    nextflow.io/sessionId: uuid-60294eb0-35c6-454e-ae27-43ebc94b8aba, nextflow.io/app: nextflow,
    nextflow.io/taskName: GENOTYPE_BY_TILE_23}
spec:
  restartPolicy: Never
  containers:
  - name: nf-c981291fa88a8cdec8e993ee0dc5d653
    image: wave.seqera.io/wt/30afb4f246ff/tall-hippo-17895/fsdx-docker-dev/python:3.10.9-slim-bullseye
    args: [/usr/bin/fusion, bash, /fusion/gs/bucket_name/scratch/nf-scratch/c9/81291fa88a8cdec8e993ee0dc5d653/.command.run]
    securityContext: {privileged: true}
    env:
    - {name: FUSION_WORK, value: /fusion/gs/bucket_name/scratch/nf-scratch/c9/81291fa88a8cdec8e993ee0dc5d653}
    - {name: FUSION_TAGS, value: '[.command.*|.exitcode|.fusion.*](nextflow.io/metadata=true),[*](nextflow.io/temporary=true)'}
    resources:
      requests: {cpu: 1}
    volumeMounts:
    - {name: vol-95, mountPath: /mnt/references, readOnly: true}
    - {name: vol-96, mountPath: /gnomad, readOnly: true}
  nodeSelector: {node.kubernetes.io/instance-type: n2-standard-16}
  serviceAccountName: default
  volumes:
  - name: vol-95
    persistentVolumeClaim: {claimName: references-pvc}
  - name: vol-96
    persistentVolumeClaim: {claimName: gnomad-pvc}
```
`kubectl describe po nf-c981291fa88a8cdec8e993ee0dc5d653`:

```
Volumes:
  vol-95:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  references-pvc
    ReadOnly:   false
  vol-96:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  gnomad-pvc
    ReadOnly:   false
```
To make sure there isn't something funky with my cluster that I missed, I created pods with the spec below to confirm that the volume in question can be mounted as readOnly. It works as expected.

```yaml
persistentVolumeClaim:
  claimName: references-pvc
  readOnly: true
```
Okay, I see that the `readOnly` option is defined in both the volume and pod container specs. I'm surprised this hasn't come up before. I guess we can just set it in both places in Nextflow.
If you set `readOnly` in the volume spec but not the pod container spec, is it applied correctly?
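For reference, a pod manifest can carry `readOnly` in both places. A minimal sketch with illustrative names:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod        # illustrative name
spec:
  containers:
  - name: main
    image: busybox
    volumeMounts:
    - name: refs
      mountPath: /mnt/references
      readOnly: true       # (1) container spec: how the volume is mounted inside the container
  volumes:
  - name: refs
    persistentVolumeClaim:
      claimName: references-pvc
      readOnly: true       # (2) volume spec: how the claim is attached to the node
```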
It appears as though the volume and pod container specs govern different behavior. According to Stack Overflow, the two specs govern disk attachment and disk mounts, respectively, for the pod. I didn't realize this and had been fixating on the volume spec.
Looking into this I found a gap in my testing: when testing pods outside of Nextflow, I was not explicitly forcing them onto different nodes. When I create pods on different nodes outside of Nextflow, with readOnly set to true in both the pod and volume specs, I am able to recreate the error I am getting above. It's embarrassing, but I have no reason to think this is a Nextflow issue instead of an infrastructure issue.
Sorry to take up your time, and thank you for your help.
@bentsherman I can confirm I have set things up incorrectly on my end, and I need to reconfigure the infrastructure before I can test this.
It does look like, for GCE PD volume mounts specifically, both the pod and volume specifications must be read-only: https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/readonlymany-disks#use-in-pod
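Following that GKE page, a ReadOnlyMany setup is roughly the following. This is a sketch with illustrative names and sizes; see the linked doc for the authoritative steps.

```yaml
# PVC requesting ReadOnlyMany access against a pre-provisioned PV
# backed by the GCE persistent disk (names and size are illustrative).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: references-pvc
spec:
  accessModes:
  - ReadOnlyMany           # allows the disk to be attached read-only on many nodes
  storageClassName: ""     # bind to a specific pre-provisioned PV, skip dynamic provisioning
  volumeName: references-pv
  resources:
    requests:
      storage: 100Gi
```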
When I have a working test case I will reopen this ticket if needed. Thanks again
No problem. Even if it was an infra issue on your side, I think Nextflow might need to set `readOnly` in both places to be safe. The GCE PD is a good motivating example. I will draft a PR for this, then you can use it in your testing if you want.
Although, it is possible to mount the same volume multiple times in different ways. If a pod mounts a volume in multiple ways and only one of the mounts is read-only, I'm not sure it makes sense to set the volume spec itself to be read-only. Maybe it can only be handled at the infra level.
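The mixed-mount case described above can be sketched as follows (illustrative names): the same volume appears twice in `volumeMounts` with different `readOnly` settings, so the claim-level flag cannot simply be copied from any single mount.

```yaml
# Illustrative: one volume, two mounts with different readOnly settings.
# Forcing readOnly on the persistentVolumeClaim itself would make BOTH
# mounts read-only, breaking the writable one.
spec:
  containers:
  - name: main
    image: busybox
    volumeMounts:
    - name: data
      mountPath: /data-ro
      readOnly: true       # read-only view of the volume
    - name: data
      mountPath: /data-rw  # same volume, writable view
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-pvc
```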
So, I wanted to circle back on this. I fixed the problem with the persistent volumes on my end, and the current release of Nextflow is behaving as expected. This isn't what I would expect from the GCE documentation, but it looks like there are no issues for my use case.
Bug report
Expected behavior and actual behavior
I am submitting Nextflow jobs to be run on a GKE Kubernetes cluster. I have a persistent volume I want to mount to every pod created by Nextflow. This volume is a GCE persistent disk deployed using the Google `pd.csi.storage.gke.io` driver. Per the documentation, I have added a pod directive with a volumeClaim with `readOnly: true` set to my configuration. My expectation was that the volume would be mounted to each pod as a readOnly volume, but instead readOnly is set to false. This only causes a problem when the volume is mounted on multiple nodes (GCE can only mount a PD with write permissions to one VM at a time). The volume mounts to pods on the first node provisioned. However, pods on subsequent nodes cannot access the volume because the mount is in use by the first node. Pods on the first node run, but pods on the subsequent nodes are stuck in a ContainerCreating state.
Steps to reproduce the problem
My yaml file
Config setting the pod directive. Here is the relevant section from mine
Program output
Pods stuck in ContainerCreating state.
`kubectl describe po` returns these relevant sections, where the pod has readOnly set to false and a warning in the events that the disk cannot be attached because it is in use by another machine.
Environment
Additional context
I looked through the Nextflow source code and think I see the issue. The readOnly option is applied when the pod gets created here, but the option is not handled when the PodOptions are created here. The unit tests reflect this.