jdblischak / smk-simple-slurm

A simple Snakemake profile for Slurm without --cluster-config
Creative Commons Zero v1.0 Universal

create log folder based on snakefile name? #12

Closed hudja closed 1 year ago

hudja commented 1 year ago

Hi,

Thank you for the great documentation, it is really helpful! I have one question.

I have many different snakemake workflows, but I am using the same profile for all of them, so they all log into the same folder. Is it possible to add a sub-folder to the log folder based on the snakefile name, so that it is easier to tell which rule folder belongs to which snakefile? Something like:

--output=logs/{SNAKEFILE_NAME}/{rule}/{rule}-{wildcards}-%j.out

so that my logs folder will be organized like:

logs/A.snakefile/rule1/
logs/B.snakefile/rule1/

Thanks!

jdblischak commented 1 year ago

> I have many different snakemake workflows, but I am using the same profile for all of them, so they all are logging into the same folder.

Could you please help me better understand how you are organizing your projects? The logs are written to the current working directory. Are you executing all of these Snakefiles from the same directory? How do A.snakefile and B.snakefile relate to each other? Are they subsequent steps or are they processing completely different data?

hudja commented 1 year ago

Hi,

I have a master project folder with multiple sub-folders: QCed data, phased data, imputed data, etc. This is biobank-level data, and each step takes up to a week to complete, so I prefer to run the different snakefiles manually rather than put everything in a single pipeline. I start by QCing my data with, say, qc.snakefile; after it completes I continue with phase.snakefile, then impute.snakefile, and so on. Each step generates the input for the next one. My snakefiles do not invoke each other upon completion; instead, I check that everything is fine and start the next step manually.

The data is also processed separately for multiple genome builds and individual chromosomes, which adds more log files. Each snakefile has 10-15 steps, so my logging folder currently ends up with 50-100 rule sub-folders. It would be really helpful to log the rules from each snakefile into its own sub-folder of log/, but since the majority of my rules use the same resources, I use a single default profile for all of them, and therefore all logs go into the same log/ folder. My config.yaml is located in that same log folder (log/config.yaml).

The perfect solution would be something like:

log/qc/rule1, ..., log/qc/rule10
log/phase/rule1, ..., log/phase/rule10
log/impute/rule1, ..., log/impute/rule10
log/config.yaml

so that I can run it with snakemake -s snakefile --profile log/config.yaml.

Of course, I can use different profile folders and config.yaml files for each snakefile, but the whole pipeline is made of up to 10 different snakefiles. So, using 10 different profile folders for each of them is less convenient.

Alternatively, I could rename all the rules in all my snakefiles and add the respective prefixes, e.g. rule qc_rule1, rule phase_rule1, etc. But I thought there might be a simpler workaround that does not require changing my code too much: an additional variable that can be used, whether defined in the profile, in the snakefile itself, or passed as a --config log_folder=X option. I hope I did not confuse you too much!

Thank you!

jdblischak commented 1 year ago

Got it. Check out the example I created, shared-logs

$ snakemake --profile shared/ --snakefile qc.snakefile
$ snakemake --profile shared/ --snakefile impute.snakefile
$ snakemake --profile shared/ --snakefile phase.snakefile

$ ls logs/
impute.snakefile  phase.snakefile  qc.snakefile

If you really wanted to remove the file extension .snakefile, you could pipe it to sed, tr, or some other command-line tool.
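As a rough illustration of stripping the extension, here is a minimal shell sketch. The helper name log_dir_for and the example paths are hypothetical and are not taken from the shared-logs profile; it only assumes that the full path to the Snakefile is available as a string.

```shell
#!/bin/sh
# Hypothetical helper (not part of the shared-logs example): build a
# per-workflow log directory from a Snakefile path and a rule name.
log_dir_for() {
    snakefile_path=$1
    rule_name=$2
    # basename drops the leading directories; its optional second
    # argument also strips the given suffix from the file name.
    stem=$(basename "$snakefile_path" .snakefile)
    printf 'logs/%s/%s\n' "$stem" "$rule_name"
}

# Made-up paths, for illustration only.
log_dir_for /project/workflows/qc.snakefile rule1   # logs/qc/rule1
log_dir_for phase.snakefile rule1                   # logs/phase/rule1

# The same stripping done by piping to sed, as mentioned above.
basename /project/workflows/impute.snakefile | sed 's/\.snakefile$//'   # impute
```

Passing the suffix to basename avoids an extra process, while the sed variant is easier to adapt if the Snakefiles use more than one extension.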

It may be possible to use --config or --envvars, but I didn't investigate these options since it would require you to remember to include this when you ran snakemake, which is easy to forget. By using workflow.main_snakefile, it will always include the name of the Snakefile in the path for the log files.

jdblischak commented 1 year ago

@hudja Did you get a chance to try out my shared-logs example? Does it work well for your use case? Please let me know if you have any problems implementing it.

hudja commented 1 year ago

Yes, sorry for not answering. I thought I could not comment on a closed issue. It is working perfectly, as expected! Thank you very much!