jdblischak / smk-simple-slurm

A simple Snakemake profile for Slurm without --cluster-config
Creative Commons Zero v1.0 Universal

Conditional log output? #20

Closed. JoshLoecker closed this issue 10 months ago.

JoshLoecker commented 11 months ago

Hi, I have a rule in Snakemake that doesn't have any wildcards, while my other rules have a "tissue_name" wildcard (wildcards.tissue_name). When I use the config.yaml below to submit jobs to Slurm, I get the following error:

RuleException in rule get_screen_genomes in file /lustre/work/helikarlab/joshl/snakemake/Snakefile, line 346:
AttributeError: 'Wildcards' object has no attribute 'tissue_name', when formatting the following:
mkdir -p logs/{rule}/{wildcards.tissue_name} && sbatch --job-name=smk-{rule}-{wildcards} --account=helikarlab --cpus-per-task={threads} --output=logs/{rule}/{wildcards.tissue_name}/{rule}-{wildcards}.out --mem={resources.mem_mb} --time={resources.runtime} --parsable
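
The error comes from attribute lookup during string formatting. A placeholder such as {wildcards.tissue_name} asks for the tissue_name attribute of whatever object is passed in as wildcards, and a job from a rule without that wildcard has no such attribute. Below is a minimal sketch in plain Python. It uses types.SimpleNamespace as a stand-in for Snakemake's Wildcards object (an assumption: the real class differs, but the lookup fails in the same way), and the rule name "align" and the value "liver" are hypothetical.

```python
from types import SimpleNamespace

# Cut-down stand-in for the cluster template in config.yaml
TEMPLATE = "mkdir -p logs/{rule}/{wildcards.tissue_name}"


def render(rule, wildcards):
    # str.format resolves "wildcards.tissue_name" with getattr(),
    # so a missing attribute raises AttributeError
    return TEMPLATE.format(rule=rule, wildcards=wildcards)


# A job from a (hypothetical) rule that has the tissue_name wildcard renders fine
print(render("align", SimpleNamespace(tissue_name="liver")))

# A job from a rule with no wildcards fails, mirroring the RuleException above
try:
    render("get_screen_genomes", SimpleNamespace())
except AttributeError as err:
    print("AttributeError:", err)
```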

Is there any way to set a "conditional" --output in the sbatch section? I see the examples for conditional and dynamic resources, but they don't cover the sbatch section.

Thanks for any help!

This is my config.yaml:

cluster:
  mkdir -p logs/{rule}/{wildcards.tissue_name} &&
  sbatch
    --job-name=smk-{rule}-{wildcards}
    --account=helikarlab
    --cpus-per-task={threads}
    --output=logs/{rule}/{wildcards.tissue_name}/{rule}-{wildcards}.out
    --mem={resources.mem_mb}
    --time={resources.runtime}
    --parsable

cluster-cancel: scancel
cluster-cancel-nargs: 50
restart-times: 0
max-jobs-per-second: 10
max-status-checks-per-second: 1
local-cores: 1
latency-wait: 60
jobs: 100
printshellcmds: True
scheduler: greedy
use-conda: True
conda-frontend: mamba

jdblischak commented 11 months ago

> Is there any way to set a "conditional" --output in the sbatch section? I see the examples for conditional and dynamic resources, but they don't cover the sbatch section.

@JoshLoecker Unfortunately, I don't think it's possible to do this directly in the YAML file. That would require some sort of templating engine (e.g. Jinja2), and as far as I know Snakemake doesn't support one. It simply fills in the values.
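
For comparison, the branch the question asks for is easy to write in Python. The problem is that the YAML cluster value has nowhere to put it, because Snakemake only substitutes the {...} placeholders. A minimal sketch, where log_dir() is a hypothetical helper (not part of Snakemake or this profile) and SimpleNamespace stands in for the wildcards object:

```python
from types import SimpleNamespace


def log_dir(rule, wildcards):
    """Hypothetical helper: the conditional a plain {placeholder} template cannot express."""
    tissue = getattr(wildcards, "tissue_name", None)
    if tissue:
        return f"logs/{rule}/{tissue}"
    # Rules without the wildcard fall back to a per-rule directory
    return f"logs/{rule}"


print(log_dir("align", SimpleNamespace(tissue_name="liver")))
print(log_dir("get_screen_genomes", SimpleNamespace()))
```

The suggestions that follow aim for the same effect without a template engine: they compute a per-job value in the Snakefile and let the YAML substitute that value.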

You could probably control this in the rules themselves. For those rules that have the wildcard "tissue_name", you could add the following resource:

    resources:
        tissue_dir=lambda wildcards: wildcards.tissue_name

Then in the YAML file, you could set the default resource to be blank:

default-resources:
  - tissue_dir=""
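
Here is a rough Python model of how these two pieces are meant to combine. It assumes that rule-level resources take precedence over default-resources, which matches how Snakemake describes --default-resources (defaults for rules that don't define their own value). It also assumes that callable resources are evaluated with the job's wildcards. The rule names and the "liver" value are hypothetical, and the template is cut down to the --output path.

```python
from types import SimpleNamespace

# From default-resources in config.yaml
DEFAULT_RESOURCES = {"tissue_dir": ""}

# Per-rule resources; callables receive the job's wildcards, like the lambda above
RULE_RESOURCES = {
    "align": {"tissue_dir": lambda wildcards: wildcards.tissue_name},
    "get_screen_genomes": {},  # no tissue_name wildcard, so nothing defined
}

TEMPLATE = "--output=logs/{rule}/{resources.tissue_dir}/{rule}.out"


def job_resources(rule, wildcards):
    merged = dict(DEFAULT_RESOURCES)
    merged.update(RULE_RESOURCES[rule])  # rule-level values override defaults
    # Evaluate callables with this job's wildcards; plain values pass through
    resolved = {k: v(wildcards) if callable(v) else v for k, v in merged.items()}
    return SimpleNamespace(**resolved)


def render(rule, wildcards):
    return TEMPLATE.format(rule=rule, resources=job_resources(rule, wildcards))


print(render("align", SimpleNamespace(tissue_name="liver")))
print(render("get_screen_genomes", SimpleNamespace()))
```

With an empty value the path contains "//". POSIX treats this the same as a single "/", so the log still ends up in logs/get_screen_genomes/.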

I don't have time to test this, so it might still require some tinkering. Another idea would be to try using the params field to control this for each rule:

    params:
        tissue_dir="{tissue_name}"

But I'm not sure whether params are available to the cluster command in the YAML file. Also, this strategy would require setting params on every rule.
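
The params idea relies on string params being filled in with each job's wildcards. The Snakemake docs show params such as prefix="somedir/{sample}". Below is a rough model of that expansion. Using str.format_map is an assumption about the mechanism, and the rule names are hypothetical. It also shows why every rule would need the param: a rule that never defines tissue_dir leaves nothing for a {params.tissue_dir} placeholder to look up, so even the rule without the wildcard has to set it to "".

```python
from types import SimpleNamespace

# Hypothetical per-rule params templates, as they might appear in the Snakefile
RULE_PARAMS = {
    "align": {"tissue_dir": "{tissue_name}"},
    "get_screen_genomes": {"tissue_dir": ""},  # must be set explicitly here too
}


def expand_params(rule, wildcards):
    # Fill each string param with this job's wildcard values
    return SimpleNamespace(
        **{k: v.format_map(wildcards) for k, v in RULE_PARAMS[rule].items()}
    )


print(expand_params("align", {"tissue_name": "liver"}).tissue_dir)
print(repr(expand_params("get_screen_genomes", {}).tissue_dir))
```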

Please try out these ideas and let me know if any of them work for you.

JoshLoecker commented 10 months ago

Sorry for the late response; this is the earliest I've been able to work on this project. Your suggestion works great!

I only have one rule that does not have a "tissue_name" wildcard, so I've set the default resources as follows:

default-resources:
  - tissue_name={wildcards.tissue_name}

Then, in the rule that doesn't have a tissue name, I've set the resources as follows:

resources:
    tissue_name=""

Thank you for your help!

For anyone referencing this in the future, this is my final configuration file:

cluster:
  mkdir -p logs/{rule}/{resources.tissue_name} &&
  sbatch
    --job-name=smk-{rule}-{wildcards}
    --account=helikarlab
    --cpus-per-task={threads}
    --output=logs/{rule}/{resources.tissue_name}/{rule}-{wildcards}.out
    --mem={resources.mem_mb}
    --time={resources.runtime}
    --parsable

default-resources:
  - tissue_name={wildcards.tissue_name}

cluster-cancel: scancel
cluster-cancel-nargs: 50
restart-times: 0
max-jobs-per-second: 10
max-status-checks-per-second: 1
local-cores: 1
latency-wait: 60
jobs: 100

printshellcmds: True
scheduler: greedy

use-conda: True
conda-frontend: mamba
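
To see what the final configuration produces for each job, here is a rough Python model. It treats the default tissue_name={wildcards.tissue_name} as a per-job template filled from the job's wildcards, and it treats the empty tissue_name="" in the one rule as an override. This mirrors the behavior reported in this thread, not Snakemake's actual code. The {wildcards} placeholder and the sbatch flags that don't involve tissue_name are left out, and the rule names are hypothetical.

```python
from types import SimpleNamespace

DEFAULT_TISSUE = "{wildcards.tissue_name}"  # default-resources entry
OVERRIDES = {"get_screen_genomes": ""}      # resources: tissue_name="" in that rule

TEMPLATE = (
    "mkdir -p logs/{rule}/{resources.tissue_name} && "
    "sbatch --output=logs/{rule}/{resources.tissue_name}/{rule}.out"
)


def render(rule, wildcards):
    if rule in OVERRIDES:
        # The rule-level value wins, so the missing wildcard is never looked up
        tissue = OVERRIDES[rule]
    else:
        tissue = DEFAULT_TISSUE.format(wildcards=wildcards)
    resources = SimpleNamespace(tissue_name=tissue)
    return TEMPLATE.format(rule=rule, resources=resources)


print(render("align", SimpleNamespace(tissue_name="liver")))
print(render("get_screen_genomes", SimpleNamespace()))
```

The key change from the original config is that the cluster template now references {resources.tissue_name}, which every job has, instead of {wildcards.tissue_name}, which only some jobs have.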

jdblischak commented 10 months ago

> Sorry for the late response; this is the earliest I've been able to work on this project. Your suggestion works great!

@JoshLoecker I'm glad you found a solution! And thank you very much for sharing it here for others to learn from.

> I only have one rule that does not have a "tissue_name" wildcard, so I've set the default resources as follows:
>
> default-resources:
>   - tissue_name={wildcards.tissue_name}

Ah, very cool approach. I don't believe I've ever tried to reference a specific wildcard value directly in the YAML, but in retrospect I guess it makes sense that this works.