harvardinformatics / snpArcher

Snakemake workflow for highly parallel variant calling designed for ease-of-use in non-model organisms.
MIT License
63 stars 30 forks

Remove fastq and bam files #159

Closed: max-hence closed this issue 3 months ago

max-hence commented 3 months ago

Dear snpArcher developers,

I have a question about your useful software. I'm working on a small server where I can't store all the FASTQ files at the same time. Since snpArcher seems to run mapping and SNP calling on each sample independently, would it be possible for me to add some lines that delete the FASTQ and BAM files after SNP calling, before all samples are merged together? Or would that break everything?

Thank you for your answer.

Max Brault

cademirch commented 3 months ago

Hi Max, yes, this is possible using Snakemake's temp() function to mark output files as temporary. Snakemake then deletes the marked file(s) once no remaining job in the workflow needs them as input.
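The deletion rule described here can be sketched with a toy Python model. This is an illustration only, not Snakemake's code or scheduler: each temp-marked output is removed as soon as every job that reads it has run. The job names and paths are hypothetical two-sample stand-ins for per-sample dedup and calling steps followed by a merge. The model also reflects that temp() applies only to rule outputs, so raw input files that no rule produces are never deleted.

```python
"""Toy model of temp() cleanup (an illustration, not Snakemake's scheduler)."""
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    inputs: list[str]
    outputs: list[str]


def run(jobs: list[Job], temp_files: set[str]) -> list[str]:
    """Run jobs in the given order (assumed topologically sorted) and
    return a log of executed jobs and deleted temporary files."""
    produced = {path for job in jobs for path in job.outputs}
    # How many jobs still have to read each file.
    remaining: dict[str, int] = {}
    for job in jobs:
        for path in job.inputs:
            remaining[path] = remaining.get(path, 0) + 1

    on_disk: set[str] = set()
    log: list[str] = []
    for job in jobs:
        # Files made by other jobs must already exist; raw inputs are assumed present.
        for path in job.inputs:
            if path in produced and path not in on_disk:
                raise RuntimeError(f"{job.name}: missing input {path}")
        on_disk.update(job.outputs)
        log.append(f"run {job.name}")
        for path in job.inputs:
            remaining[path] -= 1
            # Only temp-marked rule outputs are removed; raw inputs are never touched.
            if remaining[path] == 0 and path in temp_files and path in produced:
                on_disk.discard(path)
                log.append(f"delete {path}")
    return log


# Hypothetical two-sample workflow: dedup -> per-sample calling -> merge.
jobs = [
    Job("dedup_A", ["fastq/A.fq.gz"], ["bams/A_final.bam"]),
    Job("call_A", ["bams/A_final.bam"], ["gvcf/A.g.vcf.gz"]),
    Job("dedup_B", ["fastq/B.fq.gz"], ["bams/B_final.bam"]),
    Job("call_B", ["bams/B_final.bam"], ["gvcf/B.g.vcf.gz"]),
    Job("merge", ["gvcf/A.g.vcf.gz", "gvcf/B.g.vcf.gz"], ["joint.vcf.gz"]),
]
temp_files = {"bams/A_final.bam", "bams/B_final.bam"}

for event in run(jobs, temp_files):
    print(event)
```

Here each BAM is deleted right after its own calling job and before the merge. In this model the BAMs never coexist only because the jobs run sample by sample; if every dedup job ran before any calling job, all BAMs would be on disk at the same time, so how much space temp() saves at peak depends on job ordering.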

For example, you can mark the deduplicated BAMs as temporary like so:

# snpArcher/workflow/rules/fastq2bam.smk
rule dedup:
    input:
        unpack(dedup_input)
    output:
        # Original, permanent outputs:
        # dedupBam = "results/{refGenome}/bams/{sample}_final.bam",
        # dedupBai = "results/{refGenome}/bams/{sample}_final.bam.bai",
        dedupBam = temp("results/{refGenome}/bams/{sample}_final.bam"), # marked temp
        dedupBai = temp("results/{refGenome}/bams/{sample}_final.bam.bai"), # marked temp
    conda:
        "../envs/sambamba.yml"
    resources:
        threads = resources['dedup']['threads'],
        mem_mb = lambda wildcards, attempt: attempt * resources['dedup']['mem']
    log:
        "logs/{refGenome}/sambamba_dedup/{sample}.txt"
    benchmark:
        "benchmarks/{refGenome}/sambamba_dedup/{sample}.txt"
    shell:
        "sambamba markdup -t {threads} {input.bam} {output.dedupBam} 2> {log}"

You can apply temp() in the same way to any other intermediate output files of the workflow that you don't need to keep.
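The `mem_mb` entry in the `dedup` rule above scales memory with Snakemake's `attempt` counter: when a job fails and Snakemake retries it (with `--restart-times`, named `--retries` in newer releases), the next attempt requests more memory. A minimal sketch of how that lambda evaluates, using a hypothetical `resources` dict with made-up numbers in place of snpArcher's real resource configuration:

```python
# Hypothetical stand-in for snpArcher's resource configuration;
# the numbers are made up for illustration.
resources = {"dedup": {"threads": 4, "mem": 8000}}

# Same expression as in the dedup rule: memory grows linearly with the attempt number.
mem_mb = lambda wildcards, attempt: attempt * resources["dedup"]["mem"]

# Snakemake numbers attempts from 1; wildcards are unused here, so None stands in.
for attempt in (1, 2, 3):
    print(f"attempt {attempt}: mem_mb={mem_mb(None, attempt)}")
```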

max-hence commented 3 months ago

Hi Cade, I'll try that. Thanks for the quick answer!