@leoisl Are you able to reproduce this somehow?
I am not sure; all I can think of is a misconfiguration when setting up the slurm profile. I can test on the EBI slurm cluster, will do it today and report back.
This is the relevant comment:
- Needed to `make download` again (and wait many hours for things to download); it's not clear how to use more than one thread, given that the config.yaml file's `max_download_threads` doesn't affect the number of threads `make download` utilizes.
Yeah, I don't understand it well, but will try to debug. First thing: can we change how we submit to the `slurm` cluster? Currently we have this: https://github.com/karel-brinda/mof-search/blob/87237f4e2ababc96840066db03a39ec28839452f/Makefile#L98-L104
It has a fixed number of cores (10), which has to be manually synchronised with https://github.com/karel-brinda/mof-search/blob/87237f4e2ababc96840066db03a39ec28839452f/config.yaml#L51
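One way to at least keep the two in sync automatically (a sketch only, untested; the `grep` assumes `config.yaml` holds a numeric `threads:` value, so it would break on `threads: all`):

```make
# Hypothetical sketch: read the core count from config.yaml once, so the
# sbatch allocation and Snakemake's -c can never drift apart.
CLUSTER_CORES := $(shell grep -m1 '^threads:' config.yaml | awk '{print $$2}')

cluster_slurm_sketch:
	sbatch -c $(CLUSTER_CORES) --wrap "snakemake -c $(CLUSTER_CORES) $(SMK_PARAMS)"
```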
Also, submission will fail if a `priority` partition does not exist, and the pipeline will fail if it takes longer than 8 hours.
Would it be ok for the `slurm` run to be like the `lsf` run, i.e. we tell `snakemake` that the executor is `slurm` and let it manage the jobs?
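For concreteness, a minimal sketch of what that target could look like (hypothetical target name; the flag depends on the Snakemake version: recent Snakemake 7.x accepts `--slurm`, while Snakemake 8 uses `--executor slurm` from the SLURM executor plugin):

```make
# Hypothetical sketch: let Snakemake submit and manage SLURM jobs itself,
# mirroring the lsf run, instead of wrapping the whole workflow in one sbatch.
cluster_slurm_native_sketch:
	snakemake --slurm -j 100 $(SMK_PARAMS)
```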
Slurm is currently not the issue, and we can fix the Slurm submission system later. It's just one example where I can imagine this phenomenon might theoretically be encountered (and not with this submission script – someone would have to submit it with only 1 core).
If the user ran `make cluster_slurm` and did not change the `config.yaml`, what we would have is actually the opposite effect: `mof-search` would use more cores than the 10 cores given to the `slurm` job, because `config.yaml` defaults to `threads: all`, so `snakemake` would not limit itself to 10 cores only but would use all cores on the worker node, which is a bug with `make cluster_slurm`. I can't actually see how we could be limited to a single download thread unless, in `config.yaml`, `threads` is `1` or `max_download_threads` is `1`.
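If we keep the current wrapper for now, one stopgap for that opposite bug could be to cap the in-job Snakemake at whatever SLURM actually granted, rather than trusting `threads: all` (just a sketch; `SLURM_CPUS_ON_NODE` is the variable SLURM exports inside a job, and the target name is made up):

```make
# Hypothetical sketch: override config.yaml's "threads: all" inside the job
# with the core count SLURM actually allocated (SLURM_CPUS_ON_NODE).
# "$$" keeps the variable for the job's shell instead of expanding it in Make.
cluster_slurm_capped_sketch:
	sbatch -c 10 --wrap 'snakemake -c $$SLURM_CPUS_ON_NODE $(SMK_PARAMS)'
```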
I think we don't have enough information to properly debug this – we would need the exact command that was run and the full `config.yaml`...
What happens if you run `snakemake -j8` on a computer with only 1 CPU? Will Snakemake use 8 threads or just 1?
8
This is my test:
Snakefile:

```
rule all:
    input:
        [f"{i}.txt" for i in range(1000)]


rule touch:
    output: "{i}.txt"
    shell: "sleep 100000; touch {output}"
```
Command (this is run on my local laptop, which has 8 cores, but I tell snakemake to use 1k cores):

```
snakemake -j 1000
```
1000 `touch` jobs are running simultaneously:

```
$ ps aux | grep sleep | grep touch | wc -l
1000
```
OK, thanks for the test! I will reply that we're unable to identify this issue. Closing it for now; will reopen it in the future if we manage to reproduce it.
@leoisl I've just actually observed exactly the same issue on GenOuest.
Running `make download_asms` invokes the following Snakemake command:

```
snakemake download_asms_batches --cores all --rerun-incomplete --printshellcmds --keep-going --use-conda --resources max_download_threads=80 max_io_heavy_threads=8 max_ram_mb=12288 -j 99999
```
If the number of allocated SLURM cores is 2, then even with 80 pre-specified download threads, `--cores all` will cause downscaling to only 2 download threads.
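For the record, a minimal way to reproduce the clamping (an illustration only, not an existing Makefile target):

```make
# Hypothetical repro: inside a 2-core allocation, "--cores all" resolves to 2,
# and Snakemake caps every rule's threads at 2, regardless of the separate
# "--resources max_download_threads=80" cap.
downscaling_demo:
	srun -c 2 snakemake download_asms_batches --cores all --resources max_download_threads=80
```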
@leoisl Could you please look into how to fix this? I think it shouldn't be hard.
- For `make download`, `THREADS` should be changed to the value of `MAX_DOWNLOAD_THREADS`.
- Should we also pass `MAX_DOWNLOAD_THREADS` to all the other rules, i.e., taking it out of `SMK_PARAMS`?
- Keeping `MAX_DOWNLOAD_THREADS` inside `SMK_PARAMS` might require playing a bit with different types of Make assignments (`=` vs. `:=`); see the sketch below.
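A minimal sketch of the first option (untested; the `grep` over `config.yaml`, the target name, and the assumption that `SMK_PARAMS` no longer carries the download resources are all mine):

```make
# Hypothetical sketch: take max_download_threads out of SMK_PARAMS and use it
# directly as Snakemake's core count for the download target, so download
# parallelism no longer depends on how many cores the scheduler sees.
# ":=" (simply expanded) runs the shell command once, at assignment time,
# instead of on every expansion as "=" would.
MAX_DOWNLOAD_THREADS := $(shell grep -m1 '^max_download_threads:' config.yaml | awk '{print $$2}')

download_sketch:
	snakemake download_asms_batches -j $(MAX_DOWNLOAD_THREADS) \
		--resources max_download_threads=$(MAX_DOWNLOAD_THREADS) $(SMK_PARAMS)
```

Given the `-j 1000` test above, `-j` is what actually drives the parallelism, so this should hold even inside a 1-core allocation.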
According to one of the reviewers, the number of download threads they could use was always 1, which made the download extremely slow for them.
The only explanation I can think of is that they used a virtual machine with only 1 core, or a slurm job with 1 CPU. In such a case, I guess Snakemake doesn't go above the number of available cores.
Is there any way to fix this? The number of download threads used should be independent of the number of assigned cores.
What do you think, @leoisl?