mdelcorvo / TOSCA

Snakemake workflow for somatic mutation detection without matched normal samples
MIT License

genome.smk needs to be fixed to allow specifying an arbitrary Ensembl release number #13

Open nukaemon opened 2 months ago

nukaemon commented 2 months ago

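Besides issue #12, genome.smk needs to be fixed at the line that obtains the Ensembl release number, as shown below. The current condition compares config["ref"]["release"] (a release number) with 'GRCh38' (a build name), so it is never true and the release always falls back to 87, whatever release is specified in config.yaml. As a result, a very old GTF file (built in 2016-06) is used.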

rule get_genome:
    output:
        expand("resources/reference_genome/{ref}/homo_sapiens.fasta",ref=config["ref"]["build"])
    params:
+       url="http://ftp.ensembl.org/pub",
        species="homo_sapiens",
        datatype="dna",
        build=config["ref"]["build"],
        release=config["ref"]["release"]
    log:
        outputdir + "logs/ensembl/get_genome.log"
    cache: "omit-software"
    wrapper:
-       "v1.23.3/bio/reference/ensembl-sequence"
+       "v3.13.8/bio/reference/ensembl-sequence"

rule get_annotation:
    output:
        expand("resources/reference_genome/{ref}/homo_sapiens.gtf",ref=config["ref"]["build"])
    params:
+       url="http://ftp.ensembl.org/pub",
        species="homo_sapiens",
-       release=config["ref"]["release"] if config["ref"]["release"]=='GRCh38' else 87,
+       release=config["ref"]["release"] if config["ref"]["build"]=='GRCh38' else 87,
        build=config["ref"]["build"]
    cache: "omit-software"
    wrapper:
-       "v1.23.3/bio/reference/ensembl-annotation"
+       "v3.13.8/bio/reference/ensembl-annotation"
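The effect of the one-line change in get_annotation can be checked in plain Python, outside Snakemake. This is only a minimal sketch: the function names `release_before` and `release_after` and the sample config values are hypothetical, and the real values come from config.yaml.

```python
def release_before(config):
    # Original expression: compares the *release* value with a build name,
    # so the condition is false for any numeric release.
    return config["ref"]["release"] if config["ref"]["release"] == "GRCh38" else 87


def release_after(config):
    # Proposed expression: checks the *build* and, for GRCh38,
    # passes through the release requested in the config.
    return config["ref"]["release"] if config["ref"]["build"] == "GRCh38" else 87


# Hypothetical config, standing in for the parsed config.yaml.
config = {"ref": {"build": "GRCh38", "release": 105}}

print(release_before(config))  # 87: the requested release is ignored
print(release_after(config))   # 105: the requested release is used
```

With the proposed expression, any build other than GRCh38 still falls back to release 87, which matches the `else 87` branch in the rule.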

In addition, the snpEff database can be updated from GRCh38.99 to GRCh38.105 if the snpEff version is set to 5.1 in snp_eff.yaml. GRCh38.105 appears to be the latest Ensembl-based SnpEff database available for human, although the latest Ensembl release as of this writing is 112.