skinniderlab / CLM

MIT License

managing multiple snakemake runs in parallel #236

Closed skinnider closed 1 month ago

skinnider commented 1 month ago

I successfully ran an initial job to completion on argo. Now, I'm looking to train and evaluate a fairly large number of language models, including training on different databases and different ways of preprocessing those databases. What's the best way to manage and launch multiple runs of the snakemake pipeline in parallel?

For instance, say I wanted to run the following three jobs:

snakemake --configfile /Genomics/skinniderlab/food-clm/clm/database=FooDB-representation=rdkit-metric=tc-optimizer=max_coverage-rarefaction=0.1-enum_factor=10/config.yaml --slurm --jobs 10 --default-resources slurm_partition=main,hoppertest,skinniderlab --latency-wait=30
snakemake --configfile /Genomics/skinniderlab/food-clm/clm/database=FooDB-representation=rdkit-metric=tc-optimizer=max_coverage-rarefaction=0.3-enum_factor=10/config.yaml --slurm --jobs 10 --default-resources slurm_partition=main,hoppertest,skinniderlab --latency-wait=30
snakemake --configfile /Genomics/skinniderlab/food-clm/clm/database=FooDB-representation=rdkit-metric=tc-optimizer=feature_based-rarefaction=0.5-enum_factor=10/config.yaml --slurm --jobs 10 --default-resources slurm_partition=main,hoppertest,skinniderlab --latency-wait=30

Do you recommend just stringing the individual snakemake commands together with &? Or is there a better way to do this?

vineetbansal commented 1 month ago

Yes - this seems to work fine. I set up a simple workflow to experiment with this, and as long as none of the output files of the two workflows overlap, running them in parallel is safe.
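Concretely, here's a minimal bash sketch of that pattern (the config paths are placeholders for the real `config.yaml` paths, and the `SNAKEMAKE` variable is just an assumed hook so the command can be swapped out for a dry test):

```shell
#!/bin/bash
# Sketch: launch one snakemake run per config file in the background,
# then block until all of them have exited.
# SNAKEMAKE defaults to the real binary; override it (e.g. SNAKEMAKE=echo)
# to dry-test the loop itself.
SNAKEMAKE=${SNAKEMAKE:-snakemake}

# Placeholder config paths -- substitute the real config.yaml paths.
CONFIGS=(
  "run-rarefaction-0.1/config.yaml"
  "run-rarefaction-0.3/config.yaml"
)

for cfg in "${CONFIGS[@]}"; do
  "$SNAKEMAKE" --configfile "$cfg" --slurm --jobs 10 \
    --default-resources slurm_partition=main,hoppertest,skinniderlab \
    --latency-wait=30 &
done

wait  # returns once every backgrounded run has finished
```

`wait` with no arguments blocks on all background children, so the script only exits once every run is done; failures in individual runs still have to be checked in each run's own log.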

If they do overlap, though, there's a problem: snakemake maps each output filepath to an integer and creates lock files like

.snakemake/locks/0.input.lock
.snakemake/locks/0.output.lock

and fails on the second invocation, since it refuses to create a duplicate lock file for the same integer, with a (misleading, IMO) message like:

LockException:
Error: Directory cannot be locked. Please make sure that no other Snakemake process is trying to create the same files in the following directory:
...

So as long as the output files are all unique, it looks like we don't need to run snakemake from different working folders (as I initially suggested), and what you're doing is correct. Beyond my experiments, this is also corroborated by the discussion here.
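For reference, the separate-working-folder approach would look something like this sketch (hypothetical run directories; `--directory` is snakemake's flag for changing the working directory, so each run gets its own `.snakemake/locks`):

```shell
#!/bin/bash
# Sketch: give each run its own working directory so each gets a
# private .snakemake/ state dir (and therefore private lock files).
SNAKEMAKE=${SNAKEMAKE:-snakemake}  # override hook for dry-testing

for run in runs/rarefaction-0.1 runs/rarefaction-0.3; do
  mkdir -p "$run"
  "$SNAKEMAKE" --configfile "$run/config.yaml" --directory "$run" \
    --slurm --jobs 10 \
    --default-resources slurm_partition=main,hoppertest,skinniderlab \
    --latency-wait=30 &
done
wait
```

As established above, this isn't necessary when the output files are all unique; it's only a fallback for workflows whose outputs would otherwise collide on the same lock files.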

skinnider commented 1 month ago

Great! That lines up with what I'm seeing so far. I'm currently running two jobs in parallel (with &), and once these finish I plan to scale up to 5-10. I'll open a new issue if anything comes up.
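One hypothetical way to cap concurrency when scaling to 5-10 runs is bash's `wait -n` (bash ≥ 4.3), which returns as soon as any one background job exits; the paths and `MAX_PARALLEL` value below are placeholders:

```shell
#!/bin/bash
# Sketch: run many snakemake invocations, but at most MAX_PARALLEL at once.
SNAKEMAKE=${SNAKEMAKE:-snakemake}  # override hook for dry-testing
MAX_PARALLEL=4

CONFIGS=(run-{1..8}/config.yaml)  # placeholder config paths

for cfg in "${CONFIGS[@]}"; do
  # If MAX_PARALLEL jobs are already running, wait for one to finish
  # before launching the next.
  while [ "$(jobs -rp | wc -l)" -ge "$MAX_PARALLEL" ]; do
    wait -n
  done
  "$SNAKEMAKE" --configfile "$cfg" --slurm --jobs 10 \
    --default-resources slurm_partition=main,hoppertest,skinniderlab \
    --latency-wait=30 &
done
wait  # drain the remaining jobs
```

This only limits how many snakemake drivers run at once on the head node; the `--jobs 10` flag still separately caps how many cluster jobs each driver submits.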