ngs-docs / 2021-august-remote-computing

Remote computing workshops in August 2021
https://ngs-docs.github.io/2021-august-remote-computing

workshop 9: Automating your analyses with the snakemake workflow system #9

Open ctb opened 3 years ago

ctb commented 3 years ago

Wednesday, August 25, from 9 am to 11:30 am PDT

Instructors: Abhijna Parigi and Titus Brown

Moderator: Marisa

Helpers:

Zoom link:

Description:

This two-hour workshop will introduce attendees to the snakemake workflow system for executing large-scale automated analyses.

draft lesson: snakemake for workflows. https://github.com/ngs-docs/2021-GGG298/tree/latest/Week4-snakemake-for-workflows

owner: ???

abhijna commented 3 years ago

I can't assign myself yet, but happy to co-instruct this workshop :)

ctb commented 3 years ago

I cannot lead (may be a bit late to start). Pamela cannot make it.

abhijna commented 3 years ago

I will do the first half and Titus will do the second. I'm thinking I can end before the "Running Salmon quant" section. That way Titus can teach how to add one last rule and then talk about tips, best practices, etc.

Draft lesson is up -- feel free to leave comments here or on the PR!

marisalim commented 3 years ago

took a look at the notes so far, a few ideas :)

abhijna commented 3 years ago

Suggested survey questions:

jeremywalter commented 3 years ago

Pre-Survey: https://forms.gle/1wUXycHSPA4Gajt27 Post-Survey: https://forms.gle/jiSvrdHYUFBQdoBA9

abhijna commented 3 years ago

2021-remote-computing-9pptx.pptx

nick-ulle commented 3 years ago

drake's successor is called targets now.

marisalim commented 3 years ago

suggestions for the future :)

abhijna commented 3 years ago

add multiple inputs to lesson

marisalim commented 3 years ago

this is the final snakefile with multiple inputs & expand()

SAMPLES = ["ERR458493", "ERR458501", "ERR458494", "ERR458500"]
print('samples are:', SAMPLES)

rule all:
    input:
        expand("{sample}_fastqc.html", sample=SAMPLES),
        "orf_coding.fasta.gz",
        "yeast_orfs",
        expand("{sample}.quant", sample=SAMPLES),

rule make_fastqc:
    input:
        "{sample}.fastq.gz",
    output:
        "{sample}_fastqc.html",
        "{sample}_fastqc.zip"
    shell:
        "fastqc {input}"

rule download_reference:
    output:
        "orf_coding.fasta.gz"
    shell:
        "curl -L -O https://downloads.yeastgenome.org/sequence/S288C_reference/orf_dna/orf_coding.fasta.gz"

rule index_reference:
    input:
        "orf_coding.fasta.gz"
    output:
        directory("yeast_orfs")
    shell:
        "salmon index --index yeast_orfs --transcripts {input}"

rule salmon_quant:
    input:
        fastq = "{sample}.fastq.gz",
        index = "yeast_orfs"
    output:
        directory("{sample}.quant")
    shell:
        "salmon quant -i {input.index} --libType U -r {input.fastq} -o {output} --seqBias --gcBias"
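
For anyone wondering what the two expand() calls in rule all turn into, here is a rough plain-Python sketch. It does not use snakemake itself: `expand_sketch` is a hypothetical stand-in written only for illustration, and it assumes the default behavior of filling the pattern with every combination of the supplied values. Inside a real Snakefile you would simply call expand() directly.

```python
from itertools import product


def expand_sketch(pattern, **wildcards):
    """Hypothetical stand-in for expand(): fill `pattern` once per
    combination of the supplied wildcard values."""
    names = list(wildcards)
    # Treat a single string as a one-element list so it is not split into characters.
    value_lists = [[v] if isinstance(v, str) else list(v) for v in wildcards.values()]
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in product(*value_lists)
    ]


SAMPLES = ["ERR458493", "ERR458501", "ERR458494", "ERR458500"]

# The two lists that rule all asks snakemake to build.
fastqc_targets = expand_sketch("{sample}_fastqc.html", sample=SAMPLES)
quant_targets = expand_sketch("{sample}.quant", sample=SAMPLES)

for target in fastqc_targets + quant_targets:
    print(target)
```

Each name in those lists is then matched against the `{sample}` patterns in the outputs of make_fastqc and salmon_quant, which is how one generic rule ends up running once per sample.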