hpc-carpentry / old-hpc-workflows

Scaling studies on high-performance clusters using Snakemake workflows
https://www.hpc-carpentry.org/old-hpc-workflows/

Snakemake best practices #16

Open reid-a opened 2 years ago

reid-a commented 2 years ago

Snakemake supports "profiles", alongside YAML configuration files, for controlling how it interacts with clusters; these mechanisms inform the best practices for running on an HPC system. This lesson should examine those best practices with a view to doing things the right way, aligned with the Snakemake community.
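
A minimal sketch of what that looks like in practice (v7-era Snakemake; the directory name, scheduler flags, and values here are illustrative, not something the lesson prescribes): a profile is a directory holding a `config.yaml` whose keys mirror Snakemake's long command-line options.

```bash
# Illustrative profile setup; the path and values are assumptions.
mkdir -p ~/.config/snakemake/cluster
cat > ~/.config/snakemake/cluster/config.yaml <<'EOF'
jobs: 100                     # cap on concurrently submitted cluster jobs
cluster: "sbatch --cpus-per-task={threads} --mem={resources.mem_mb}"
default-resources:
  - mem_mb=1000               # fallback when a rule declares no memory
EOF
snakemake --profile cluster   # applies every option above automatically
```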

reid-a commented 2 years ago

Related: Snakemake also ships a linter, and its documentation lists more general best practices; we should review both and follow them to the degree they make sense.
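
For reference, the linter is built in and needs no configuration; running it from the workflow directory checks the Snakefile against the documented best practices:

```bash
snakemake --lint
```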

ocaisa commented 2 years ago

Regarding the use of profiles, this is a bit of a rabbit hole: getting Snakemake to query the status of jobs requires a complex setup. This is solved by using cookiecutter profiles (as suggested by @vinisalazar), but that's a complex topic to teach so early in the lesson. It also hurts our portability a bit, since we will obviously need different profiles for different schedulers.

A good example of why you want this is at https://github.com/snakemake/snakemake/issues/1164
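
To make the "complex setup" concrete: with generic `--cluster` submission, Snakemake (v7-era) only learns about failed or timed-out jobs if you hand it a status script via `--cluster-status`. A hedged sketch for Slurm (the `sacct` parsing here is illustrative, not taken from any particular profile):

```bash
#!/usr/bin/env bash
# status.sh -- illustrative --cluster-status script for Slurm.
# Snakemake calls it with one argument (the cluster job ID) and expects
# exactly one of: running / success / failed on stdout.
jobid="$1"
state=$(sacct -j "$jobid" --format=State --noheader | head -n 1 | awk '{print $1}')
case "$state" in
  COMPLETED) echo success ;;
  PENDING|RUNNING|COMPLETING|CONFIGURING|SUSPENDED) echo running ;;
  *) echo failed ;;
esac
```

It would then be wired in with something like `snakemake --cluster "sbatch ..." --cluster-status ./status.sh`, which is the sort of boilerplate the cookiecutter profiles generate for you.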

ocaisa commented 2 years ago

See https://github.com/Snakemake-Profiles/slurm for the Slurm profile

ocaisa commented 2 years ago

This looks more promising for our use case: https://github.com/jdblischak/smk-simple-slurm
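
The appeal is that it is just a plain profile: a single `config.yaml` that forwards rule resources straight to `sbatch`, with no cookiecutter templating and no status-script sidecar. A sketch in that spirit (flags and values are illustrative, adapted rather than copied from the repo):

```bash
# Create the profile next to the workflow; logs/ must exist for sbatch.
mkdir -p simple logs
cat > simple/config.yaml <<'EOF'
cluster: >-
  sbatch
  --cpus-per-task={threads}
  --mem={resources.mem_mb}
  --output=logs/{rule}-%j.out
default-resources:
  - mem_mb=1000
jobs: 50
EOF
snakemake --profile simple/
```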

bkmgit commented 2 years ago

Profiles are helpful for avoiding typing `-c 1` at every invocation. Should the lesson emphasize Slurm throughout, or aim for flexible use both locally and on remote machines? One could also use Terraform to provision cloud resources.
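
As a concrete illustration of the first point (the profile name here is hypothetical): a one-line local profile makes a bare `snakemake` behave like `snakemake -c 1`.

```bash
mkdir -p ~/.config/snakemake/local
echo "cores: 1" > ~/.config/snakemake/local/config.yaml
snakemake --profile local     # now equivalent to: snakemake -c 1
```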