Open mhaseeb123 opened 3 years ago
Not sure about your issue, but I don't think MaxQuant supports cross-node computing. So maybe check whether it runs through without errors when using only one node.
@pillepalle123
I checked, and the workflow seems to run without errors on one node. But I am still unable to run it on more than one node.
With one node, I simply run this:
snakemake --snakefile UltraQuant.sm --configfile config.yaml --cluster "srun --nodes=1 --ntasks=1 --ntasks-per-node=1 --cpus-per-task=24 -t 2:00:00 -o '/home/mhaseeb/ultraquant/UltraQuant/uquant.%j.out' -J 'uqnt_2'" maxQuant -j 24 -k --latency-wait 60 --use-singularity --singularity-args "--bind /oasis/scratch/comet/mhaseeb/temp_project/RAWPXD015890:/oasis/scratch/comet/mhaseeb/temp_project/RAWPXD015890,/home/mhaseeb:/home/mhaseeb,/oasis/scratch/comet/mhaseeb/temp_project:/oasis/scratch/comet/mhaseeb/temp_project" --ri
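One possible cause of the duplicated log output below is launching the Snakemake driver itself under srun with more than one task, so that every task starts its own copy of the workflow against the same working directory. A hedged sketch of a guard (assuming the duplication comes from srun tasks; `SLURM_PROCID` is the 0-based task rank that srun sets) that lets only task 0 run the driver:

```shell
# Guard so that only SLURM task 0 launches the Snakemake driver.
# SLURM_PROCID is set by srun for each task; default to 0 so the
# script also works when run interactively outside SLURM.
should_run_driver() {
    local rank="${SLURM_PROCID:-0}"
    [ "$rank" -eq 0 ]
}

if should_run_driver; then
    echo "task ${SLURM_PROCID:-0}: launching snakemake driver"
    # snakemake --snakefile UltraQuant.sm --configfile config.yaml ...
    # (the actual command from above goes here)
else
    echo "task ${SLURM_PROCID}: exiting to avoid a duplicate launch"
fi
```

This does not make MaxQuant run across nodes; it only prevents several identical drivers from racing on the same files.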
Here is another log. It seems the workflow is not being set up properly: the same process (with the same inputs) is executing on multiple nodes, causing race conditions. One process deletes or moves a file before the other is done with it, causing unhandled file exceptions. See the full log below:
Building DAG of jobs...
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 24
Using shell: /bin/bash
Rules claiming more threads will be scaled down.
Provided cores: 24
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 maxQuant
1
Job counts:
count jobs
1 maxQuant
1
Select jobs to execute...
Select jobs to execute...
[Tue Feb 23 11:49:52 2021]
rule maxQuant:
input: out/mqpar.xml
output: out/combined/txt/summary.txt
log: out/logs/maxQuant.txt
jobid: 0
benchmark: out/benchmarks/maxQuant.txt
[Tue Feb 23 11:49:52 2021]
rule maxQuant:
input: out/mqpar.xml
output: out/combined/txt/summary.txt
log: out/logs/maxQuant.txt
jobid: 0
benchmark: out/benchmarks/maxQuant.txt
Activating singularity image /oasis/scratch/comet/mhaseeb/temp_project/RAWPXD015890/work_dir/.snakemake/singularity/79274f8c7291fda81f2362ed0688e4fc.simg
Activating singularity image /oasis/scratch/comet/mhaseeb/temp_project/RAWPXD015890/work_dir/.snakemake/singularity/79274f8c7291fda81f2362ed0688e4fc.simg
Cannot delete folder /oasis/scratch/comet/mhaseeb/temp_project/RAWPXD015890/work_dir/out/combined/proc. Please make sure no other processes are accessing it.
Configuring
[Tue Feb 23 11:49:54 2021]
Error in rule maxQuant:
jobid: 0
output: out/combined/txt/summary.txt
log: out/logs/maxQuant.txt (check log file(s) for error message)
shell:
mono /home/mhaseeb/ultraquant/UltraQuant/MaxQuant/bin/MaxQuantCmd.exe out/mqpar.xml
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Testing files
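The log above contains every startup line twice ("Building DAG of jobs...", "Using shell: /bin/bash", the duplicated `rule maxQuant` blocks), which is consistent with two workflow instances writing to the same log. A quick diagnostic sketch (the log file name here is hypothetical) that counts how many Snakemake instances shared a log:

```shell
# Each Snakemake instance prints "Building DAG of jobs..." exactly once
# at startup, so counting that banner in a shared log file reveals how
# many drivers ran against the same working directory.
count_instances() {
    grep -c '^Building DAG of jobs' "$1"
}
```

Usage: `count_instances uquant.12345.out` returning a number greater than 1 would indicate duplicate launches rather than a MaxQuant-internal problem.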
If anyone knows how to fix this, any help would be highly appreciated :)
Yeah, but that's the thing: MaxQuant just doesn't support running in parallel across several nodes. That's why I believe you can only use one node.
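If the goal is different SLURM resources per rule (rather than running MaxQuant itself across nodes, which it doesn't support), Snakemake's cluster-config mechanism can pass per-rule srun options. A minimal sketch, assuming the rule name maxQuant from this workflow; all values are illustrative:

```yaml
# cluster.yaml -- per-rule SLURM settings, referenced from the
# --cluster string as {cluster.<key>}.
__default__:
  nodes: 1
  cpus: 4
  time: "0:30:00"
maxQuant:      # MaxQuant must stay on a single node
  nodes: 1
  cpus: 24
  time: "2:00:00"
```

It would be wired in roughly like this (keeping the rest of the original command):

    snakemake --snakefile UltraQuant.sm --configfile config.yaml \
      --cluster-config cluster.yaml \
      --cluster "srun --nodes={cluster.nodes} --cpus-per-task={cluster.cpus} -t {cluster.time}"

Note that `--cluster-config` has since been deprecated in newer Snakemake releases in favor of profiles, but it was the standard mechanism at the time of this thread.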
Hi,
I am trying to run the UltraQuant workflow on multiple nodes (in parallel, like MPI) of a SLURM-based cluster, but unfortunately it keeps giving me fatal I/O errors (related to missing files) whenever I run it on more than one node.
Here is the command that I am running:
I get this error from SLURM on STDOUT:
While the log file is full of errors looking like the following one:
I tried to explore the temp_dir and the work_dir, and it seems the mqpar_conversion rule is only creating one data partition (n0 and p0). As I understand it, it should create partitions for at least 2 nodes (with 24 cores each?), assuming the workflow is designed along a MapReduce-like model. Can anyone help me get around this issue? Thank you.
P.S. I am not experienced with either Snakemake or Singularity, so I am not sure whether I am doing something really dumb here.