mdelcorvo / TOSCA

Snakemake workflow for somatic mutation detection without matched normal samples
MIT License

Download issue for reference genome fasta and gtf from ensembl ftp #12

Open nukaemon opened 1 month ago

nukaemon commented 1 month ago

The Ensembl URLs starting with 'ftp' are no longer accessible, which breaks the get_genome and get_annotation rules in genome.smk. These wrappers accept a 'url' param that can point them at the URL starting with 'http' instead of 'ftp'. However, the wrapper version must be raised from v1.23.3 to a newer release for this param to take effect.

rule get_genome:
    output:
        expand("resources/reference_genome/{ref}/homo_sapiens.fasta",ref=config["ref"]["build"])
    params:
+       url="http://ftp.ensembl.org/pub",
        species="homo_sapiens",
        datatype="dna",
        build=config["ref"]["build"],
        release=config["ref"]["release"]
    log:
        outputdir + "logs/ensembl/get_genome.log"
    cache: "omit-software"
    wrapper:
-       "v1.23.3/bio/reference/ensembl-sequence"
+       "v3.13.8/bio/reference/ensembl-sequence"

rule get_annotation:
    output:
        expand("resources/reference_genome/{ref}/homo_sapiens.gtf",ref=config["ref"]["build"])
    params:
+       url="http://ftp.ensembl.org/pub",
        species="homo_sapiens",
        release=config["ref"]["release"] if config["ref"]["release"]=='GRCh38' else 87,
        build=config["ref"]["build"]
    cache: "omit-software"
    wrapper:
-       "v1.23.3/bio/reference/ensembl-annotation"
+       "v3.13.8/bio/reference/ensembl-annotation"
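
As a small illustration of the scheme swap described above, the sketch below rewrites an 'ftp' base URL into the 'http' form that is passed as the 'url' param. The helper name to_http and the constant ENSEMBL_URL are hypothetical and not part of the workflow or the wrappers; only the two URLs come from this thread.

```python
from urllib.parse import urlsplit, urlunsplit


def to_http(base_url: str) -> str:
    """Return base_url with an ftp scheme swapped for http (hypothetical helper)."""
    parts = urlsplit(base_url)
    if parts.scheme == "ftp":
        # Only the scheme changes; host and path are kept as they are.
        parts = parts._replace(scheme="http")
    return urlunsplit(parts)


# Value that could then be given to the wrappers through the url param.
ENSEMBL_URL = to_http("ftp://ftp.ensembl.org/pub")
print(ENSEMBL_URL)  # http://ftp.ensembl.org/pub
```

In practice it is probably simpler to hard-code the 'http' URL in the rule, as in the diff above; the helper only makes the difference between the two forms explicit.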

nukaemon commented 1 month ago

The get_known_variation rule also needs to be modified as shown below. The line that specifies the fai file path raises a TypeError in Python because of a change in v3.7.0. To avoid the error, append [0] to the expand() call so that a single path is passed instead of a list.

rule get_known_variation:
    input:
        # use fai to annotate contig lengths for GATK BQSR
-       fai=expand("resources/reference_genome/{ref}/homo_sapiens.fasta.fai",ref=config["ref"]["build"])
+       fai=expand("resources/reference_genome/{ref}/homo_sapiens.fasta.fai",ref=config["ref"]["build"])[0]
    output:
        vcf=expand("resources/database/{ref}/variation.vcf.gz",ref=config["ref"]["build"])
    params:
+       url="http://ftp.ensembl.org/pub",
        species="homo_sapiens",
        build=config["ref"]["build"],
        release=config["ref"]["release"],
        type="all"
    cache: "omit-software"  # save space and time with between workflow caching (see docs)
    wrapper:
-       "v1.23.3/bio/reference/ensembl-variation"
+       "v3.13.8/bio/reference/ensembl-variation"
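
To show why the trailing [0] matters, here is a minimal sketch of the list-versus-string mismatch. It does not reproduce the wrapper's internals, which are not shown in this thread. expand_like is a hypothetical, simplified stand-in for Snakemake's expand(), and "GRCh38" is only an example build value.

```python
import os
from typing import List


def expand_like(pattern: str, **wildcards: str) -> List[str]:
    """Simplified stand-in for expand(): always returns a list, even for one path."""
    return [pattern.format(**wildcards)]


paths = expand_like(
    "resources/reference_genome/{ref}/homo_sapiens.fasta.fai", ref="GRCh38"
)

try:
    # Code that expects a single path rejects a list with a TypeError.
    os.fspath(paths)
except TypeError as err:
    print("list rejected:", type(err).__name__)

# Indexing with [0] yields the plain string path.
fai = paths[0]
print(os.fspath(fai))
```

Because the rule has a single build, the list always holds exactly one element, so taking [0] loses nothing.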