Closed fpichon closed 2 years ago
The pod directive is a little wonky because it is both a k8s config setting and a process directive. I'll need to check how the options are resolved when they are provided both ways.
In the meantime, can you try putting all of your pod options under process, like this:
executor {
    name = 'k8s'
    queueSize = 5
}

k8s {
    storageClaimName = 'nextflow-pvc-raw'
    storageMountPath = '/data'
}

process {
    pod = [
        [volumeClaim: 'nextflow-pvc-ref', mountPath: '/ref'],
        [volumeClaim: 'nextflow-pvc-analysis', mountPath: '/out'],
        [nodeSelector: 'nodepool=nextflow-2cpu-7go']
    ]

    withLabel: cpu4 {
        pod = [nodeSelector: 'nodepool=nextflow-4cpu-15go']
        cpus = 4
    }

    withLabel: cpu16 {
        pod = [nodeSelector: 'nodepool=nextflow-16cpu-60go']
        cpus = 16
    }
}
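One syntax detail worth noting in the config above: a single pod option can be written as one map, while several pod options are written as a list of maps. A minimal sketch of the two forms (the node pool and claim names are taken from this thread; treat them as placeholders):

```groovy
// Form 1: a single pod option, written as one map.
process {
    pod = [nodeSelector: 'nodepool=nextflow-2cpu-7go']
}

// Form 2: several pod options, written as a list of maps
// (this is the form used in the process block above).
process {
    pod = [
        [volumeClaim: 'nextflow-pvc-ref', mountPath: '/ref'],
        [nodeSelector: 'nodepool=nextflow-2cpu-7go']
    ]
}
```

The two process blocks are alternatives, not meant to coexist in one config file.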
Hi @bentsherman, thanks for your answer.
When applying your config, the nodeSelector is applied to the processes but, after 2 days, it never launched, regardless of the pod definition of each label or the k8s section. The pod definition in process thus seems to be ignored or overwritten, which could be logical since I do not have any process without a label in my test. And it seems that the pods defined in withLabel are never launched... I don't know why.
Hi @fpichon, I finally have some more time to investigate these k8s issues.
So the use of k8s.pod and process.pod looks good as far as I can tell. The process-level nodeSelector will simply overwrite the k8s-level nodeSelector.
I'm thinking there might be something wrong with how a pod option within a withLabel block is applied. What happens if you remove the withLabel for a moment?
k8s {
    storageClaimName = 'nextflow-pvc-raw'
    storageMountPath = '/data'
    pod = [
        [volumeClaim: 'nextflow-pvc-ref', mountPath: '/ref'],
        [volumeClaim: 'nextflow-pvc-analysis', mountPath: '/out'],
        [nodeSelector: 'nodepool=nextflow-2cpu-7go']
    ]
}

process {
    pod = [nodeSelector: 'nodepool=nextflow-4cpu-15go']
    cpus = 4
}
I was unable to reproduce this error with a minimal example :/
I ran nextflow -c kuberun.config kuberun bentsherman/hello with this config:
k8s {
    pod = [
        [nodeSelector: 'kubernetes.io/os=linux']
    ]
}

process {
    withLabel: hello {
        tag = { x }
        pod = [nodeSelector: 'kubernetes.io/os=foobar']
    }
}
In my example, the head pod runs but the worker pods get stuck because they have an invalid node selector.
Hi @bentsherman,
Thanks for taking the time to investigate the problem. We are creating a new Kubernetes cloud, so I will try again next week. I will get back to you with the results as soon as possible.
Hi @bentsherman, sorry for the long delay. Now that we have the new Kubernetes cluster, it seems to work fine. I do not really know why it was not working before, but it was more likely due to the Kubernetes cluster than to Nextflow. Thank you very much for your support on this issue! :)
Bug report
Expected behavior and actual behavior
I want to run Nextflow on a node with 2 CPUs and the different processes on different nodes depending on the number of CPUs needed. I therefore define a pod with a nodeSelector in the k8s section, and other pods with other selectors in different withLabel sections.
I would expect to obtain as many pods as processes, each running on a node selected from the desired node pool thanks to its nodeSelector. However, only the pod definition present in the k8s section is taken into account, not the ones in the withLabel sections, and the processes never start.
Steps to reproduce the problem
I thus created labels in the config file so that they can easily be set per process:
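The original config block does not appear here, so the following is only a hypothetical sketch of such a labels-based setup, using the cpu4 and cpu16 labels and the node pool names mentioned elsewhere in this thread:

```groovy
// Hypothetical sketch, not the reporter's actual config:
// each label selects a node pool via a pod nodeSelector
// and sets the matching CPU request.
process {
    withLabel: cpu4 {
        pod = [nodeSelector: 'nodepool=nextflow-4cpu-15go']   // assumed pool name
        cpus = 4
    }
    withLabel: cpu16 {
        pod = [nodeSelector: 'nodepool=nextflow-16cpu-60go']  // assumed pool name
        cpus = 16
    }
}
```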
For pod definition, I followed examples from here: https://www.nextflow.io/docs/latest/process.html?highlight=pod#pod and here: https://gitter.im/nextflow-io/nextflow?at=5eea0bec539e566fc93e9977
Program output
Using kubectl describe pod mypod, the last line of the output shows that the nodeSelector is the one from the k8s section, not from the withLabel section. The number of CPUs, however, is correctly set (4), but since the node set in k8s only has 2 CPUs, the process never starts. I marked this as a bug, but maybe I am doing something wrong (syntax?) or forgot something somewhere?
Environment
Additional context
Pipeline is run from a pod with the following command:
~/nextflow kuberun -profile cloud -hub mygit username/nextflow/test_cloud.nf -r main --fastq_dir /data/path/to/Fastq/ --out_dir /out/output_folder/
And an example of Nextflow process definition using label:
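The example itself is not preserved here; a minimal sketch of what a labelled process can look like, assuming the cpu4 label from the config above (the process name, tool, and file names are hypothetical):

```groovy
// Hypothetical sketch of a process using a label:
// the cpu4 label should pull in the pod nodeSelector
// and cpus = 4 from the matching withLabel block.
process ALIGN_READS {
    label 'cpu4'

    input:
    path fastq

    output:
    path 'aligned.bam'

    script:
    """
    align --threads ${task.cpus} ${fastq} > aligned.bam
    """
}
```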
I tried many different configurations (and syntaxes), but I never succeeded in getting the pod definition in the withLabel section to be taken into account... Thanks for your help.