nf-core / sarek

Analysis pipeline to detect germline or somatic variants (pre-processing, variant calling and annotation) from WGS / targeted sequencing
https://nf-co.re/sarek
MIT License

Fail to annotate with VEP and snpEff (merge tool) #287

Closed by yocra3 3 years ago

yocra3 commented 3 years ago

Good morning,

I am testing your sarek pipeline and I am having problems with the annotation step. So far, I have only used the test dataset.

After the whole pipeline failed to annotate my variants, I tried to only run the annotation steps, using the commands from annotation.md. First, I downloaded the snpEff and VEP cache files. Then I tried to annotate a VCF (one of the files generated with the test dataset) using different tools. If I annotate with snpEff or VEP alone, the pipeline works without any error:

nextflow run nf-core/sarek -profile docker --step annotate --tools snpEff --input results/VariantCalling/1234N/HaplotypeCallerGVCF/HaplotypeCaller_1234N.g.vcf.gz --genome GRCh37 --snpeff_cache ./references/snpEff/cache/ --vep_cache ./references/vep/cache/ --annotation_cache

nextflow run nf-core/sarek -profile docker --step annotate --tools VEP --input results/VariantCalling/1234N/HaplotypeCallerGVCF/HaplotypeCaller_1234N.g.vcf.gz --genome GRCh37 --snpeff_cache ./references/snpEff/cache/ --vep_cache ./references/vep/cache/ --annotation_cache

However, if I try to use the merge option, the snpEff process fails to execute:

nextflow run nf-core/sarek -profile docker --step annotate --tools merge --input results/VariantCalling/1234N/HaplotypeCallerGVCF/HaplotypeCaller_1234N.g.vcf.gz --genome GRCh37 --snpeff_cache ./references/snpEff/cache/ --vep_cache ./references/vep/cache/ --annotation_cache

When examining the .command.log:

WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
00:00:00        SnpEff version SnpEff 4.3t (build 2017-11-24 10:18), by Pablo Cingolani
00:00:00        Command: 'ann'
00:00:00        Reading configuration file 'snpEff.config'. Genome: 'null'
00:00:00        Reading config file: /home/genetica/Carlos/sarek_test/work/a6/7d025354942ad6cf19a46ce18d28b2/snpEff.config
00:00:00        Reading config file: /opt/conda/envs/nf-core-sarek-2.6.1/share/snpeff-4.3.1t-3/snpEff.config
java.lang.RuntimeException: Property: 'null.genome' not found
        at org.snpeff.interval.Genome.<init>(Genome.java:106)
        at org.snpeff.snpEffect.Config.readGenomeConfig(Config.java:681)
        at org.snpeff.snpEffect.Config.readConfig(Config.java:649)
        at org.snpeff.snpEffect.Config.init(Config.java:480)
        at org.snpeff.snpEffect.Config.<init>(Config.java:117)
        at org.snpeff.SnpEff.loadConfig(SnpEff.java:451)
        at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.run(SnpEffCmdEff.java:1000)
        at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.run(SnpEffCmdEff.java:984)
        at org.snpeff.SnpEff.run(SnpEff.java:1183)
        at org.snpeff.SnpEff.main(SnpEff.java:162)
00:00:01        Logging
00:00:02        Done.

This error does not happen when running snpEff alone. Do you have any idea how this can be solved?

Thanks,

maxulysse commented 3 years ago

@yocra3 I can see that you're trying things out. I'm sorry that you're running into issues, but I'll help you out.

I'll try the same command to see if I can reproduce the issue.

Which sarek version did you run?

yocra3 commented 3 years ago

I am using nf-core/sarek v2.6.1.

maxulysse commented 3 years ago

Can you give me the full .command.err and .command.sh from the work directory in which this process failed?

yocra3 commented 3 years ago

Here they are:

.command.err

WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
00:00:00    SnpEff version SnpEff 4.3t (build 2017-11-24 10:18), by Pablo Cingolani
00:00:00    Command: 'ann'
00:00:00    Reading configuration file 'snpEff.config'. Genome: 'null'
00:00:00    Reading config file: /home/genetica/Carlos/sarek_test/work/8e/415451e58c50279088367b8233bbc4/snpEff.config
00:00:00    Reading config file: /opt/conda/envs/nf-core-sarek-2.6.1/share/snpeff-4.3.1t-3/snpEff.config
java.lang.RuntimeException: Property: 'null.genome' not found
    at org.snpeff.interval.Genome.<init>(Genome.java:106)
    at org.snpeff.snpEffect.Config.readGenomeConfig(Config.java:681)
    at org.snpeff.snpEffect.Config.readConfig(Config.java:649)
    at org.snpeff.snpEffect.Config.init(Config.java:480)
    at org.snpeff.snpEffect.Config.<init>(Config.java:117)
    at org.snpeff.SnpEff.loadConfig(SnpEff.java:451)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.run(SnpEffCmdEff.java:1000)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.run(SnpEffCmdEff.java:984)
    at org.snpeff.SnpEff.run(SnpEff.java:1183)
    at org.snpeff.SnpEff.main(SnpEff.java:162)
00:00:01    Logging
00:00:03    Done.

.command.sh

#!/bin/bash -euo pipefail
snpEff -Xmx7g         null         -csvStats HaplotypeCaller_1234N.g_snpEff.csv         -nodownload         -dataDir ${PWD}/cache         -canon         -v         HaplotypeCaller_1234N.g.vcf.gz         > HaplotypeCaller_1234N.g_snpEff.ann.vcf

mv snpEff_summary.html HaplotypeCaller_1234N.g_snpEff.html

maxulysse commented 3 years ago

OK, this is very weird. It seems that the snpeff_db param is not set when you're using --tools merge. I definitely did not manage to reproduce that. Can you try the dev branch (-r dev) to see if the issue remains?

I'll try to do more digging to understand where this could be coming from.
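
One way to check whether other tasks hit the same problem is to search the work directory for the Genome: 'null' line that snpEff prints in the logs above. A minimal sketch, assuming GNU or BSD grep and the default Nextflow work directory; the function name find_null_genome_tasks is made up for illustration:

```shell
#!/bin/sh
# Print the path of every .command.log under the given directory in which
# snpEff reported a 'null' genome, i.e. the database name was not set.
# Hypothetical helper, not part of sarek or Nextflow.
find_null_genome_tasks() {
    grep -rl --include='.command.log' "Genome: 'null'" "$1" || true
}

# Only scan if a work directory exists in the current folder.
if [ -d work ]; then
    find_null_genome_tasks work
fi
```

Each path printed sits in a task directory whose .command.sh can then be inspected for the literal null argument.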

Another idea would be to add this param to the command line: --snpeff_db GRCh37.75

maxulysse commented 3 years ago

Another idea: can you try --tools snpeff,vep,merge?

yocra3 commented 3 years ago

I have tried both things.

  1. Passing --snpeff_db GRCh37.75 runs snpEff successfully but not VEP, so the pipeline fails.
  2. Passing --tools snpeff,vep,merge runs both tools and the merging step, so the pipeline runs successfully.

Therefore, I am closing the issue.
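
For reference, the invocation that worked is the original annotate command with only the --tools value changed. A sketch assuming the same input VCF and cache paths as in the commands above; the tools and cmd variables are only illustrative, and the command is printed rather than run:

```shell
#!/bin/sh
# List every tool explicitly instead of passing only "merge".
tools="snpeff,vep,merge"

# Same flags as the original command; only --tools differs.
cmd="nextflow run nf-core/sarek -profile docker --step annotate --tools $tools \
--input results/VariantCalling/1234N/HaplotypeCallerGVCF/HaplotypeCaller_1234N.g.vcf.gz \
--genome GRCh37 --snpeff_cache ./references/snpEff/cache/ \
--vep_cache ./references/vep/cache/ --annotation_cache"

# Print the command for inspection; run it with: eval "$cmd"
echo "$cmd"
```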