Closed: glmanhtu closed this issue 4 years ago
> However, I think Nextflow can do it better by setting the working directory inside the command.run file instead of using workingDir parameter.

Try specifying process.scratch = true in the config file.
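For reference, a minimal nextflow.config along these lines (just the one setting, everything else left as default) would be:

    process.scratch = true

With scratch enabled, each task runs in a node-local temporary directory and only the declared outputs (plus the .command.out/.command.err logs) are copied back to the shared work dir.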
The configuration was already there; you can see it here: https://github.com/glmanhtu/nf-workflows/blob/master/nextflow.config
How are you launching the pipeline?
> How are you launching the pipeline?

Here is the command:
nextflow kuberun https://github.com/glmanhtu/nf-workflows -v pride-pv-claim:/mnt -profile kubernetes
I fear that the config is not properly propagated (similarly to #1050). What if you create a nextflow.config in the launching dir adding process.scratch = true, then try to execute it again?
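Something like this from the launching directory (a sketch of the suggested test, reusing the same command as above):

    echo 'process.scratch = true' > nextflow.config
    nextflow kuberun https://github.com/glmanhtu/nf-workflows -v pride-pv-claim:/mnt -profile kubernetes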
I tried and the result is the same. I added process.scratch = true into nextflow.config:
nextflowVersion = '1.2+'
process.scratch = true

profiles {
    docker {
        docker {
            enabled = true
        }
    }
    kubernetes {
        process.executor = 'k8s'
        process.scratch = true
        k8s {
            debug.yaml = true
            pod = [runAsUser: 2801]
        }
    }
}
Command to run: nextflow kuberun https://github.com/glmanhtu/nf-workflows -v pride-pv-claim:/mnt -profile kubernetes
Output:
azorin-ml:ms-crux-id-nf tvu$ nextflow kuberun https://github.com/glmanhtu/nf-workflows -v pride-pv-claim:/mnt -profile kubernetes
Launcher pod spec file: .nextflow.pod.yaml
Pod started: silly-plateau
N E X T F L O W ~ version 19.04.1
Pulling glmanhtu/nf-workflows ...
downloaded from https://github.com/glmanhtu/nf-workflows.git
Launching `glmanhtu/nf-workflows` [silly-plateau] - revision: 51f5cd4284 [master]
[warm up] executor > k8s
executor > k8s (1)
[30/317ad3] process > indexPeptides [ 0%] 0 of 1
executor > k8s (1)
[30/317ad3] process > indexPeptides [100%] 1 of 1, failed: 1 ✘
ERROR ~ Error executing process > 'indexPeptides'
Caused by:
Process `indexPeptides` terminated for an unknown reason -- Likely it has been terminated by the external system
Command executed:
crux tide-index small-yeast.fasta yeast-index
crux tide-search --compute-sp T --mzid-output T demo.ms2 yeast-index
Command exit status:
-
Command output:
(empty)
Work dir:
/mnt/tvu/work/30/317ad36174f9e2a08dbb6046d8c3e6
Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
-- Check '.nextflow.log' file for details
Pod created: nf-30317ad36174f9e2a08dbb6046d8c3e6
Pod error message:
OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:424: container init caused \"mkdir /mnt/tvu: permission denied\"": unknown
Can you copy here the .command.run script for that task?
.command.run
#!/bin/bash
# NEXTFLOW TASK: indexPeptides
set -e
set -u
NXF_DEBUG=${NXF_DEBUG:=0}; [[ $NXF_DEBUG > 1 ]] && set -x

NXF_ENTRY=${1:-nxf_main}

nxf_date() {
    local ts=$(date +%s%3N); [[ $ts == *3N ]] && date +%s000 || echo $ts
}

nxf_env() {
    echo '============= task environment ============='
    env | sort | sed "s/\(.*\)AWS\(.*\)=\(.\{6\}\).*/\1AWS\2=\3xxxxxxxxxxxxx/"
    echo '============= task output =================='
}

nxf_kill() {
    declare -a children
    while read P PP;do
        children[$PP]+=" $P"
    done < <(ps -e -o pid= -o ppid=)

    kill_all() {
        [[ $1 != $$ ]] && kill $1 2>/dev/null || true
        for i in ${children[$1]:=}; do kill_all $i; done
    }

    kill_all $1
}

nxf_mktemp() {
    local base=${1:-/tmp}
    if [[ $(uname) = Darwin ]]; then mktemp -d $base/nxf.XXXXXXXXXX
    else TMPDIR="$base" mktemp -d -t nxf.XXXXXXXXXX
    fi
}

on_exit() {
    exit_status=${nxf_main_ret:=$?}
    printf $exit_status > /mnt/tvu/work/30/317ad36174f9e2a08dbb6046d8c3e6/.exitcode
    set +u
    [[ "$tee1" ]] && kill $tee1 2>/dev/null
    [[ "$tee2" ]] && kill $tee2 2>/dev/null
    [[ "$ctmp" ]] && rm -rf $ctmp || true
    rm -rf $NXF_SCRATCH || true
    exit $exit_status
}

on_term() {
    set +e
    [[ "$pid" ]] && nxf_kill $pid
}

nxf_launch() {
    /bin/bash -ue /mnt/tvu/work/30/317ad36174f9e2a08dbb6046d8c3e6/.command.sh
}

nxf_stage() {
    true
    # stage input files
    rm -f small-yeast.fasta
    rm -f demo.ms2
    ln -s /mnt/projects/glmanhtu/nf-workflows/data/small-yeast.fasta small-yeast.fasta
    ln -s /mnt/projects/glmanhtu/nf-workflows/data/demo.ms2 demo.ms2
}

nxf_unstage() {
    true
    cp .command.out /mnt/tvu/work/30/317ad36174f9e2a08dbb6046d8c3e6/.command.out || true
    cp .command.err /mnt/tvu/work/30/317ad36174f9e2a08dbb6046d8c3e6/.command.err || true
    [[ ${nxf_main_ret:=0} != 0 ]] && return
    mkdir -p /mnt/tvu/work/30/317ad36174f9e2a08dbb6046d8c3e6
    mkdir -p /mnt/tvu/work/30/317ad36174f9e2a08dbb6046d8c3e6/crux-output && cp -fRL crux-output/tide-search.target.txt /mnt/tvu/work/30/317ad36174f9e2a08dbb6046d8c3e6/crux
    mkdir -p /mnt/tvu/work/30/317ad36174f9e2a08dbb6046d8c3e6/crux-output && cp -fRL crux-output/tide-search.decoy.txt /mnt/tvu/work/30/317ad36174f9e2a08dbb6046d8c3e6/crux-
}

nxf_main() {
    trap on_exit EXIT
    trap on_term TERM INT USR1 USR2

    NXF_SCRATCH="$(set +u; nxf_mktemp $TMPDIR)"
    [[ $NXF_DEBUG > 0 ]] && nxf_env
    touch /mnt/tvu/work/30/317ad36174f9e2a08dbb6046d8c3e6/.command.begin
    set +u
    set -u
    [[ $NXF_SCRATCH ]] && echo "nxf-scratch-dir $HOSTNAME:$NXF_SCRATCH" && cd $NXF_SCRATCH
    nxf_stage

    set +e
    local ctmp=$(set +u; nxf_mktemp /dev/shm 2>/dev/null || nxf_mktemp $TMPDIR)
    local cout=$ctmp/.command.out; mkfifo $cout
    local cerr=$ctmp/.command.err; mkfifo $cerr
    tee .command.out < $cout &
    tee1=$!
    tee .command.err < $cerr >&2 &
    tee2=$!
    ( nxf_launch ) >$cout 2>$cerr &
    pid=$!
    wait $pid || nxf_main_ret=$?
    wait $tee1 $tee2

    nxf_unstage
}

$NXF_ENTRY
.command.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nf-30317ad36174f9e2a08dbb6046d8c3e6
  namespace: default
  labels: {app: nextflow, runName: silly-plateau, taskName: indexPeptides, processName: indexPeptides,
    sessionId: uuid-600b27a0-298d-4b01-88e8-ea3f68ecf077}
spec:
  restartPolicy: Never
  containers:
  - name: nf-30317ad36174f9e2a08dbb6046d8c3e6
    image: omicsdi/crux:latest
    command: [/bin/bash, -ue, .command.run]
    workingDir: /mnt/tvu/work/30/317ad36174f9e2a08dbb6046d8c3e6
    volumeMounts:
    - {name: vol-1, mountPath: /mnt}
  securityContext: {runAsUser: 2801}
  volumes:
  - name: vol-1
    persistentVolumeClaim: {claimName: pride-pv-claim}
I guess the problem is when it copies the result from the scratch dir to the shared dir:

mkdir -p /mnt/tvu/work/30/317ad36174f9e2a08dbb6046d8c3e6
mkdir -p /mnt/tvu/work/30/317ad36174f9e2a08dbb6046d8c3e6/crux-output && cp -fRL crux-output/tide-search.target.txt /mnt/tvu/work/30/317ad36174f9e2a08dbb6046d8c3e6/crux
mkdir -p /mnt/tvu/work/30/317ad36174f9e2a08dbb6046d8c3e6/crux-output && cp -fRL crux-output/tide-search.decoy.txt /mnt/tvu/work/30/317ad36174f9e2a08dbb6046d8c3e6/crux-

Not sure how much I can help here. You need to have that dir writable.
Actually, the dir is writable for user 2801, as I specified. However, I suspect the securityContext is not applied before the workingDir directive, which causes the permission denied error. So, if there were a way to remove the workingDir directive, it should work fine.
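If the root cause is really that the kubelet creates the working directory as root before runAsUser applies, one Kubernetes-level mitigation might be adding an fsGroup to the pod security context so the mounted volume is group-writable for the task user. fsGroup is a standard PodSecurityContext field, but whether this Nextflow version passes it through via the pod options is an assumption to verify:

    securityContext: {runAsUser: 2801, fsGroup: 2801}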
You can try to hack that yaml and the .command.run script to make it work.
I have tried deleting the workingDir directive and updating the command from command: [/bin/bash, -ue, .command.run] to command: [/bin/bash, -ue, /mnt/tvu/work/30/317ad36174f9e2a08dbb6046d8c3e6/.command.run], then starting it manually with kubectl apply -f .command.yaml, and it ran successfully.
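For reference, the edited container section of .command.yaml looked roughly like this (workingDir removed, absolute path to the script):

    containers:
    - name: nf-30317ad36174f9e2a08dbb6046d8c3e6
      image: omicsdi/crux:latest
      command: [/bin/bash, -ue, /mnt/tvu/work/30/317ad36174f9e2a08dbb6046d8c3e6/.command.run]

This works without workingDir because .command.run cd's into its own scratch dir ($NXF_SCRATCH) anyway, as seen in the script above.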
azorin-ml:testing tvu$ kubectl log nf-30317ad36174f9e2a08dbb6046d8c3e6
log is DEPRECATED and will be removed in a future version. Use logs instead.
nxf-scratch-dir nf-30317ad36174f9e2a08dbb6046d8c3e6:/tmp/nxf.dXKyOaBpKT
INFO: Beginning tide-index.
INFO: Writing results to output directory 'crux-output'.
INFO: CPU: nf-30317ad36174f9e2a08dbb6046d8c3e6
INFO: Fri Aug 9 13:18:38 UTC 2019
INFO: Running tide-index...
INFO: Writing results to output directory 'yeast-index'.
INFO: Reading small-yeast.fasta and computing unmodified peptides...
INFO: Writing decoy fasta...
INFO: Reading proteins
INFO: Precomputing theoretical spectra...
INFO: Elapsed time: 0.0265 s
INFO: Finished crux tide-index.
INFO: Return Code:0
INFO: Beginning tide-search.
WARNING: The output directory 'crux-output' already exists.
Existing files will not be overwritten.
INFO: CPU: nf-30317ad36174f9e2a08dbb6046d8c3e6
INFO: Fri Aug 9 13:18:38 UTC 2019
INFO: Running tide-search...
INFO: Reading index yeast-index
INFO: Reading spectra file demo.ms2
INFO: Converting demo.ms2 to spectrumrecords format
INFO: Sorting spectra
INFO: Running search
INFO: Elapsed time: 0.375 s
INFO: Finished crux tide-search.
INFO: Return Code:0
But the thing is, these files are generated by Nextflow, so we can't do all of this by ourselves. If you could add some option to Nextflow that does this automatically, that would be great.
That could be a possible patch, but I can't incorporate it directly into the main code base without more rigorous testing and an assessment of the pros and cons. I would suggest you create your own build and test it, then send a PR.
We found another workaround for this problem by setting up the permissions on the NFS storage itself.
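For example, on the NFS server side, something along these lines would do it (a sketch; the export path /export/pride-pv is a placeholder, only the uid 2801 comes from this thread):

    chown -R 2801 /export/pride-pv
    chmod -R u+rwX /export/pride-pv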
Leaving it open because it may be useful to handle this on the NF side.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
> We found another workaround for this problem by setting up the permissions on the NFS storage itself.

I'm encountering this issue without Kubernetes, but with an NFS mount. I'm curious which NFS permissions glmanhtu changed here.
Bug report
Hello guys. I was trying to run Nextflow with Kubernetes and encountered a permission problem with a non-root user. Nextflow was trying to initialise the Pod but got the exception "container init caused \"mkdir /mnt/tvu: permission denied\"". The Kubernetes manifest file generated by Nextflow is the .command.yaml shown above.
I have assigned rw permission for user 2801 to the NFS persistent volume. When I removed the workingDir, the pod started successfully and I was able to cd into the /mnt/tvu/work/68/b883e7affa9dc5d6d5e721c75b21c4 folder. So, I suspect that runAsUser and workingDir don't get along.
In my opinion, this is an issue with Kubernetes itself, because if the given user has rw permission on the workingDir, the pod should be able to start successfully. However, I think Nextflow can handle it better by setting the working directory inside the .command.run file instead of using the workingDir parameter.
Expected behavior and actual behavior
The pod is expected to start successfully; instead, it fails with the permission error shown above.
Steps to reproduce the problem
Run the command shown above. From here, Nextflow should show an exception with an unknown reason. Use the kubectl get pods command to check which pod was created (its name should be something like nf-5f444e721a0ca7373744d096756fd62a) and then use kubectl describe pod to see the error message.
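For example (the pod name below is the sample name from this report; yours will differ):

    kubectl get pods
    kubectl describe pod nf-5f444e721a0ca7373744d096756fd62a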
Program output
Environment