nf-cmgg / structural

A bioinformatics best-practice analysis pipeline for calling structural variants (SVs), copy number variants (CNVs) and repeat region expansions (RREs) from short DNA reads
https://nf-cmgg.github.io/structural/
MIT License

MSG: ERROR: Cache directory vep/homo_sapiens not found #78

Closed: mvheetve closed this issue 5 months ago

mvheetve commented 5 months ago

Description of the bug

For some reason the pipeline can't find the VEP cache. I had a similar problem a while back and fixed it with a custom config. I'm testing whether that helps now as well; if so, we had better fix this before the release. I'll keep you posted.

Command used and terminal output

## command
nextflow \
    -log ${OUTDIR}/.nextflow.log \
    run nf-cmgg/structural \
    -r dev \
    -work-dir ${WORKDIR} \
    --input ${samplesheet} \
    --outdir ${OUTDIR} \
    -profile vsc_ugent,$SLURM_CLUSTERS \
    --genomes_base ${genomes_base} \
    -resume \
    -latest \
    -c /kyukon/data/gent/vo/000/gvo00082/research/ICT/VAL/vcfanno_custom.config \
    --callers manta,delly,smoove,qdnaseq,wisecondorx,expansionhunter \
    --output_callers \
    --igenomes_ignore true \
    --annotate \
    --vep_version 110.0 \
    --vep_cache_version 110 \
    --annotsv_annotations "${genomes_base}/Hsapiens/GRCh38.p14/variation/AnnotSV/AnnotSV-3.3.4.tar.gz" 

## terminal
Command error:
  INFO:    Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
  Smartmatch is experimental at /usr/local/share/ensembl-vep-110.0-0/modules/Bio/EnsEMBL/VEP/AnnotationSource/File.pm line 472.

  -------------------- EXCEPTION --------------------
  MSG: ERROR: Cache directory vep/homo_sapiens not found

  STACK Bio::EnsEMBL::VEP::CacheDir::dir /usr/local/share/ensembl-vep-110.0-0/modules/Bio/EnsEMBL/VEP/CacheDir.pm:305
  STACK Bio::EnsEMBL::VEP::CacheDir::init /usr/local/share/ensembl-vep-110.0-0/modules/Bio/EnsEMBL/VEP/CacheDir.pm:219
  STACK Bio::EnsEMBL::VEP::CacheDir::new /usr/local/share/ensembl-vep-110.0-0/modules/Bio/EnsEMBL/VEP/CacheDir.pm:111
  STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all_from_cache /usr/local/share/ensembl-vep-110.0-0/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:116
  STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all /usr/local/share/ensembl-vep-110.0-0/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:92
  STACK Bio::EnsEMBL::VEP::BaseRunner::get_all_AnnotationSources /usr/local/share/ensembl-vep-110.0-0/modules/Bio/EnsEMBL/VEP/BaseRunner.pm:170
  STACK Bio::EnsEMBL::VEP::Runner::init /usr/local/share/ensembl-vep-110.0-0/modules/Bio/EnsEMBL/VEP/Runner.pm:128
  STACK Bio::EnsEMBL::VEP::Runner::run /usr/local/share/ensembl-vep-110.0-0/modules/Bio/EnsEMBL/VEP/Runner.pm:200
  STACK toplevel /usr/local/bin/vep:46
  Date (localtime)    = Sun Mar 31 03:31:07 2024
  Ensembl API version = 110
  ---------------------------------------------------

Relevant files

nextflow_vepError.log

System information

HPC Doduo cluster

nvnieuwk commented 5 months ago

After looking through the working directory, I found that the cache only contains the merged cache, homo_sapiens_merged. I'll see if I can write something that detects this automatically.
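
For illustration, here is a minimal shell sketch of what such a detection could look like. It is not the pipeline's implementation. It assumes the usual VEP cache layout, where the plain cache lives in `<cache>/homo_sapiens` and the merged Ensembl/RefSeq cache in `<cache>/homo_sapiens_merged`, and that VEP's `--merged` flag selects the latter. The function name `detect_vep_cache_flag` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: decide whether VEP needs --merged, based on the cache contents.
# detect_vep_cache_flag is a hypothetical helper, not part of nf-cmgg/structural.
detect_vep_cache_flag() {
    local cache_dir="$1"                 # e.g. the staged "vep" directory
    local species="${2:-homo_sapiens}"   # species directory prefix

    if [ -d "${cache_dir}/${species}" ]; then
        # Plain Ensembl cache is present: no extra flag needed
        echo ""
    elif [ -d "${cache_dir}/${species}_merged" ]; then
        # Only the merged Ensembl/RefSeq cache is present
        echo "--merged"
    else
        echo "no ${species} cache found in ${cache_dir}" >&2
        return 1
    fi
}
```

The printed flag could then be appended to the VEP arguments. A cache that contains neither directory makes the function return a non-zero status, so the problem surfaces before VEP runs.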

matthdsm commented 5 months ago

For cmgg, the cache SHOULD be the merged one; otherwise we miss annotations for seqplorer.

mvheetve commented 5 months ago

As I thought, I got past the VEP problem by adding vep_cache = null to the config. I know the documentation says not to include parameters in the config, but I had been running with this line in the config for months and wondered whether it would fix this too, which I think it does.
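
For anyone hitting the same error, the workaround can be sketched roughly as follows. This is a hedged example, not the exact config used here. It assumes `vep_cache` is a pipeline parameter and therefore belongs in the `params` scope. The helper `write_vep_override` and the file name `vep_override.config` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: write a small custom config that clears the inherited VEP cache.
# Assumption: vep_cache is a pipeline parameter, so it sits in the params scope.
write_vep_override() {
    local config_path="$1"
    cat > "${config_path}" <<'EOF'
params {
    vep_cache = null
}
EOF
}

# Usage (vep_override.config is a hypothetical file name):
#   write_vep_override vep_override.config
#   nextflow run nf-cmgg/structural -r dev -c vep_override.config <other options>
```

Passing the file with `-c` layers it on top of the profile configs, which is how the custom config in the original command is supplied as well.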

Now I'm stuck on a vcfanno error, but I included a custom annotation and the indices are not what they are supposed to be.

nvnieuwk commented 5 months ago

Ah yes, the genomes config has a VEP cache in it. That's probably why the wrong one is used. It's indeed better to exclude it with a config :)

@matthdsm this is for structural variants so this will never go into seqplorer :)

nvnieuwk commented 5 months ago

Everything worked perfectly. The vep_cache was populated using a config that pointed to a merged cache.