snakemake / snakemake-wrappers

This is the development home of the Snakemake wrapper repository, see
https://snakemake-wrappers.readthedocs.io
215 stars 184 forks source link

bwa index and mem potential path bugs #60

Open vsoch opened 4 years ago

vsoch commented 4 years ago

Snakemake version 5.10.0

Describe the bug For each of bwa index and bwa mem the user might put in a prefix that will work for a local job, but fail given a remote. Examples are provided below:

Bwa Mem

For bwa mem, note that the params has an "index" below that is used to specify the path. The first below I don't think will work:

rule map_reads:
    input:
        reads=get_merged,
        idx=rules.bwa_index.output
    output:
        temp("mapped/{sample}.sorted.bam")
    log:
        "logs/bwa_mem/{sample}.log"
    params:
        index="refs/genome",
        extra=get_read_group,
        sort="samtools",
        sort_order="coordinate"
    threads: 8
    wrapper:
        "0.39.0/bio/bwa/mem"

But this would:

rule map_reads:
    input:
        reads=get_merged,
        idx=rules.bwa_index.output
    output:
        temp("mapped/{sample}.sorted.bam")
    log:
        "logs/bwa_mem/{sample}.log"
    params:
        index="snakemake-testing/kim-wxs-varlociraptor/refs/genome",
        extra=get_read_group,
        sort="samtools",
        sort_order="coordinate"
    threads: 8
    wrapper:
        "0.39.0/bio/bwa/mem"

Unlike bwa index (discussed next) I'm not sure we can just remove this one.

Bwa Index

For this prefix, given running on a remote with this recipe:

rule bwa_index:
    input:
        "refs/genome.fasta"
    output:
        multiext("refs/genome", ".amb", ".ann", ".bwt", ".pac", ".sa")
    params:
        prefix="refs/genome"
    log:
        "logs/bwa_index.log"
    resources:
        mem_mb=6000,disk_mb=128000
    benchmark:
        "benchmarks/bwa_index.tsv"
    wrapper:
        "0.45.1/bio/bwa/index"

The error log will report that bwa/genome.pac cannot be found. It's not the inputs or outputs, but rather that prefix is used to determine the path to the file! The correct usage would be:

rule bwa_index:
    input:
        "refs/genome.fasta"
    output:
        multiext("refs/genome", ".amb", ".ann", ".bwt", ".pac", ".sa")
    params:
        prefix="snakemake-testing/kim-wxs-varlociraptor/refs/genome"
    log:
        "logs/bwa_index.log"
    resources:
        mem_mb=6000,disk_mb=128000
    benchmark:
        "benchmarks/bwa_index.tsv"
    wrapper:
        "0.45.1/bio/bwa/index"

But rather we aren't required to specify it, so even better would be to remove it entirely:

rule bwa_index:
    input:
        "refs/genome.fasta"
    output:
        multiext("refs/genome", ".amb", ".ann", ".bwt", ".pac", ".sa")
    log:
        "logs/bwa_index.log"
    resources:
        mem_mb=6000,disk_mb=128000
    benchmark:
        "benchmarks/bwa_index.tsv"
    wrapper:
        "0.45.1/bio/bwa/index"

Of course the user writing the pipeline might not know this, in which case maybe there should be a fix to allow for a prefix specified that would, given a default remote prefix, add it as well?

For both of the above, if I can get started on work to fix the wrappers and then do a PR here, I'd be happy to do that! I'm not sure if there are other cases like this too in the wrappers. Let me know your thoughts.

johanneskoester commented 4 years ago

We will fix this with an additional test in snakemake itself. The freedom to specify a prefix might be important for some scenarios, so I'd rather not like to remove it.

johanneskoester commented 4 years ago

However, keeping this open until we have a proper test for this in the snakemake github action.