nf-core / raredisease

Call and score variants from WGS/WES of rare disease patients.
https://nf-co.re/raredisease
MIT License

Test+Docker profile fails: Can't locate object method "cdna_coords" #611

Closed Oliversinn closed 2 weeks ago

Oliversinn commented 1 month ago

Description of the bug

ERROR ~ Error executing process > 'NFCORE_RAREDISEASE:RAREDISEASE:ANNOTATE_STRUCTURAL_VARIANTS:ENSEMBLVEP_SV (justhusky)'

Caused by:
  Process `NFCORE_RAREDISEASE:RAREDISEASE:ANNOTATE_STRUCTURAL_VARIANTS:ENSEMBLVEP_SV (justhusky)` terminated with an error exit status (255)

Command executed:

  vep \
      -i justhusky_view.vcf \
      -o justhusky_svdbquery_vep.vcf.gz \
      --dir_cache vep_cache --dir_plugins vep_cache/Plugins --plugin pLI,pLI_values.txt --appris --biotype --buffer_size 100 --canonical --cache --ccds --compress_output bgzip --distance 5000 --domains --exclude_predicted --force_overwrite --format vcf --fork 4 --hgvs --humdiv --max_sv_size 248956422 --merged --no_progress --numbers --per_gene --polyphen p --protein --offline --regulatory --sift p --symbol --tsl --uniprot --vcf \
       \
      --fasta reference.fasta \
      --assembly GRCh37 \
      --species homo_sapiens \
      --cache \
      --cache_version 107 \
      --dir_cache ${PWD}/vep_cache \
      --fork 2

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_RAREDISEASE:RAREDISEASE:ANNOTATE_STRUCTURAL_VARIANTS:ENSEMBLVEP_SV":
      ensemblvep: $( echo $(vep --help 2>&1) | sed 's/^.*Versions:.*ensembl-vep : //;s/ .*$//')
  END_VERSIONS

Command exit status:
  255

Command output:
  2024-09-13 21:02:40 - INFO: BAM-edited cache detected, enabling --use_transcript_ref; use --use_given_ref to override this

Command error:
  2024-09-13 21:02:40 - INFO: BAM-edited cache detected, enabling --use_transcript_ref; use --use_given_ref to override this
  Can't locate object method "cdna_coords" via package "Bio::EnsEMBL::Variation::StructuralVariationOverlap" at /usr/local/share/ensembl-vep-112.0-0/Bio/EnsEMBL/Variation/Utils/VariationEffect.pm line 673.
  Died in forked process 2112

Work dir:
  /home/tecniscan/Data/work/08/6ec046f9e7bea0a665d4ba8b2ae23c

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting

 -- Check '.nextflow.log' file for details

Command used and terminal output

nextflow run nf-core/raredisease -profile test,docker --outdir results

Relevant files

nextflow.log

System information

Nextflow version: 24.04.4 build 5917
Hardware: Desktop
Executor: local
Container engine: Docker
OS: Linux Mint 22
Version of nf-core/raredisease: v2.2.0-gfa61a65

jemten commented 1 month ago

Thanks for reporting, @Oliversinn. This looks like an issue with annotating breakpoint variants in VEP: https://github.com/Ensembl/ensembl-vep/issues/1694#issuecomment-2278191833. It looks like it will be patched in the next release of VEP (113). For now, the workaround seems to be to remove the --regulatory flag from the ENSEMBLVEP_SV process config. We will test that change and patch the pipeline.

jemten commented 1 month ago

Yes, you would have to remove the --regulatory flag from the ENSEMBLVEP_SV process in the file conf/modules/annotate_structural_variants.config, even when running with your own data.

Oliversinn commented 1 month ago

Now I am getting the following:

-[nf-core/raredisease] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_RAREDISEASE:RAREDISEASE:ANN_CSQ_PLI_SV:ADD_MOST_SEVERE_CSQ (justhusky)'

Caused by:
  Process `NFCORE_RAREDISEASE:RAREDISEASE:ANN_CSQ_PLI_SV:ADD_MOST_SEVERE_CSQ (justhusky)` terminated with an error exit status (1)

Command executed:

  add_most_severe_consequence.py --file_in justhusky_svdbquery_vep.vcf.gz --file_out justhusky_sv_csq_research.vcf --variant_csq variant_consequences_v2.txt

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_RAREDISEASE:RAREDISEASE:ANN_CSQ_PLI_SV:ADD_MOST_SEVERE_CSQ":
      add_most_severe_consequence: v1.0
      python: $(python --version | sed 's/Python //g')
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  Traceback (most recent call last):
    File "/home/tecniscan/.nextflow/assets/nf-core/raredisease/bin/add_most_severe_consequence.py", line 192, in <module>
      sys.exit(main())
    File "/home/tecniscan/.nextflow/assets/nf-core/raredisease/bin/add_most_severe_consequence.py", line 188, in main
      write_csq_annotated_vcf(in_vcf, out_vcf, var_csq)
    File "/home/tecniscan/.nextflow/assets/nf-core/raredisease/bin/add_most_severe_consequence.py", line 142, in write_csq_annotated_vcf
      mscsq = construct_most_severe_consequence_info(line, allele_ind, csq_ind, hgnc_ind, var_csq)
    File "/home/tecniscan/.nextflow/assets/nf-core/raredisease/bin/add_most_severe_consequence.py", line 75, in construct_most_severe_consequence_info
      transcripts, allele_ind, csq_ind, hgnc_ind, var_csq
  UnboundLocalError: local variable 'transcripts' referenced before assignment

Work dir:
  /home/tecniscan/Data/raredisease_test/work/3f/72d00dc5d134c5022c49a5a54582b9

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting

 -- Check '.nextflow.log' file for details

the .command.sh file is

#!/bin/bash -euo pipefail
add_most_severe_consequence.py --file_in justhusky_svdbquery_vep.vcf.gz --file_out justhusky_sv_csq_research.vcf --variant_csq variant_consequences_v2.txt

cat <<-END_VERSIONS > versions.yml
"NFCORE_RAREDISEASE:RAREDISEASE:ANN_CSQ_PLI_SV:ADD_MOST_SEVERE_CSQ":
    add_most_severe_consequence: v1.0
    python: $(python --version | sed 's/Python //g')
END_VERSIONS

And when I run it manually, it throws:

bin/bash: line 0: /bin/bash: ./.command.sh: Invalid option

I found this possible solution: https://unix.stackexchange.com/questions/533415/invalid-option-name-error-with-shebang-bin-bash-o-pipefail-in-script

The weird thing is that -euo pipefail (which seems to be causing the problem) looks like it is set globally for all the scripts, so why is this only happening here? If I come up with a solution I'll keep you posted, but nothing so far.
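A possible explanation for why this only shows up on a manual run: when a script is executed directly (./.command.sh), Linux treats everything after the interpreter path on the shebang line as one single argument, so bash receives "-euo pipefail" as one word instead of "-euo" and "pipefail". Nextflow normally launches the script through its .command.run wrapper rather than through the shebang, which would explain why the same line is harmless inside the pipeline. The sketch below only mimics that splitting rule; shebang_argv is a hypothetical helper written for illustration and is not part of Nextflow or bash.

```python
def shebang_argv(first_line, script_path):
    """Mimic how Linux builds argv from a shebang line.

    The kernel takes the first whitespace-delimited token as the
    interpreter and passes the REST of the line as ONE argument,
    followed by the script path. (Hypothetical helper for illustration.)
    """
    if not first_line.startswith("#!"):
        raise ValueError("not a shebang line")
    rest = first_line[2:].strip()
    parts = rest.split(None, 1)  # split off the interpreter only
    argv = [parts[0]]
    if len(parts) == 2:
        # Everything after the interpreter stays together as one argument.
        argv.append(parts[1].strip())
    argv.append(script_path)
    return argv


if __name__ == "__main__":
    print(shebang_argv("#!/bin/bash -euo pipefail", "./.command.sh"))
```

If this is what is happening, running the task with bash .command.run from the work directory, or calling bash -euo pipefail .command.sh explicitly, should avoid the problem because the shell then splits the options itself.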

jemten commented 1 month ago

@ramprasadn, do you recognise this error?

ramprasadn commented 1 month ago

Hi @Oliversinn, did you encounter this error while running the test profile? If so, could you please share the command you used to start Nextflow, as well as the .nextflow.log file from that run?

Oliversinn commented 1 month ago

Yes, I was running the test profile. But the error went away on its own during my weekend tests; it just stopped coming up. Now I am facing another issue. It is important to mention that I ran the test following the recommendation to

remove the --regulatory flag in the ENSEMBLVEP_SV process config

My command

nextflow run nf-core/raredisease -profile test,docker --outdir results

The error

ERROR ~ Error executing process > 'NFCORE_RAREDISEASE:RAREDISEASE:CALL_SNV:CALL_SNV_DEEPVARIANT:DEEPVARIANT (hugelymodelbat)'

Caused by:
  Missing output file(s) `hugelymodelbat_deepvar.vcf.gz` expected by process `NFCORE_RAREDISEASE:RAREDISEASE:CALL_SNV:CALL_SNV_DEEPVARIANT:DEEPVARIANT (hugelymodelbat)`

Command executed:

  /opt/deepvariant/bin/run_deepvariant \
      --ref=reference.fasta \
      --reads=hugelymodelbat_sorted_md.bam \
      --output_vcf=hugelymodelbat_deepvar.vcf.gz \
      --output_gvcf=hugelymodelbat_deepvar.g.vcf.gz \
      --model_type=WGS --haploid_contigs="X,Y" \
       \
       \
      --intermediate_results_dir=tmp \
      --num_shards=2

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_RAREDISEASE:RAREDISEASE:CALL_SNV:CALL_SNV_DEEPVARIANT:DEEPVARIANT":
      deepvariant: 1.6.1
  END_VERSIONS

Command exit status:
  0

Command output:
  I0926 04:32:57.053048 137776620853056 make_examples_core.py:301] Task 0/2: 4025 candidates (4354 examples) [14.00s elapsed]
  I0926 04:33:17.656884 133199120856896 make_examples_core.py:301] Task 1/2: 6000 candidates (6543 examples) [21.92s elapsed]
  I0926 04:33:18.616353 137776620853056 make_examples_core.py:301] Task 0/2: 6022 candidates (6563 examples) [21.56s elapsed]
  I0926 04:36:29.082540 133199120856896 make_examples_core.py:301] Task 1/2: Writing example info to tmp/make_examples.tfrecord-00001-of-00002.gz.example_info.json
  I0926 04:36:29.082702 133199120856896 make_examples_core.py:2958] example_shape = [100, 221, 7]
  I0926 04:36:29.082921 133199120856896 make_examples_core.py:2959] example_channels = [1, 2, 3, 4, 5, 6, 19]
  I0926 04:36:29.083358 133199120856896 make_examples_core.py:301] Task 1/2: Found 7230 candidate variants
  I0926 04:36:29.083420 133199120856896 make_examples_core.py:301] Task 1/2: Created 7809 examples
  I0926 04:36:45.969243 137776620853056 make_examples_core.py:301] Task 0/2: Writing example info to tmp/make_examples.tfrecord-00000-of-00002.gz.example_info.json
  I0926 04:36:45.969397 137776620853056 make_examples_core.py:2958] example_shape = [100, 221, 7]
  I0926 04:36:45.969614 137776620853056 make_examples_core.py:2959] example_channels = [1, 2, 3, 4, 5, 6, 19]
  I0926 04:36:45.969995 137776620853056 make_examples_core.py:301] Task 0/2: Found 7015 candidate variants
  I0926 04:36:45.970056 137776620853056 make_examples_core.py:301] Task 0/2: Created 7595 examples

  real  4m48.688s
  user  9m19.604s
  sys   0m6.300s

  ***** Running the command:*****
  time /opt/deepvariant/bin/call_variants --outfile "tmp/call_variants_output.tfrecord.gz" --examples "tmp/make_examples.tfrecord@2.gz" --checkpoint "/opt/models/wgs"

  /usr/local/lib/python3.8/dist-packages/tensorflow_addons/utils/tfa_eol_msg.py:23: UserWarning: 

  TensorFlow Addons (TFA) has ended development and introduction of new features.
  TFA has entered a minimal maintenance and release mode until a planned end of life in May 2024.
  Please modify downstream libraries to take dependencies from other repositories in our TensorFlow community (e.g. Keras, Keras-CV, and Keras-NLP). 

  For more information see: https://github.com/tensorflow/addons/issues/2807 

    warnings.warn(
  I0926 04:36:54.521170 137629337036608 call_variants.py:563] Total 1 writing processes started.
  I0926 04:36:54.523357 137629337036608 dv_utils.py:370] From tmp/make_examples.tfrecord-00000-of-00002.gz.example_info.json: Shape of input examples: [100, 221, 7], Channels of input examples: [1, 2, 3, 4, 5, 6, 19].
  I0926 04:36:54.523444 137629337036608 call_variants.py:588] Shape of input examples: [100, 221, 7]
  I0926 04:36:54.523688 137629337036608 call_variants.py:592] Use saved model: True
  I0926 04:37:02.490913 137629337036608 dv_utils.py:370] From /opt/models/wgs/example_info.json: Shape of input examples: [100, 221, 7], Channels of input examples: [1, 2, 3, 4, 5, 6, 19].
  I0926 04:37:02.491155 137629337036608 dv_utils.py:370] From tmp/make_examples.tfrecord-00000-of-00002.gz.example_info.json: Shape of input examples: [100, 221, 7], 

Log file:

nextflow.log

System information

Nextflow version: 24.04.4 build 5917
Hardware: Desktop
Executor: local
Container engine: Docker
OS: Linux Mint 22
Version of nf-core/raredisease: v2.2.0-gfa61a65

fa2k commented 1 month ago

Now I am getting the following:

-[nf-core/raredisease] Pipeline completed with errors- ERROR ~ Error executing process > 'NFCORE_RAREDISEASE:RAREDISEASE:ANN_CSQ_PLI_SV:ADD_MOST_SEVERE_CSQ (justhusky)'

... UnboundLocalError: local variable 'transcripts' referenced before assignment

I also get this error when I use the work-around of removing --regulatory.

jemten commented 4 weeks ago

Yeah, @ramprasadn has looked into it, and the solution here is sadly to downgrade VEP until VEP 113 is released. Without the --regulatory flag, the VCF entry for a BND lacks a CSQ field, which causes issues. The Ensembl team is due to release the new version of VEP in October and we will update promptly. In the meantime I found that downgrading VEP works. I've added this to our config for all our VEP processes.

    withName: '.*ANNOTATE_STRUCTURAL_VARIANTS:ENSEMBLVEP_SV' {
        container = 'https://depot.galaxyproject.org/singularity/ensembl-vep:110.0--pl5321h2a3209d_0'
    }

I also had to set vep_cache and vep_cache_version in the params file to reflect the downgrade, as they default to version 112.
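To illustrate how a missing CSQ field can turn into the UnboundLocalError reported above, here is a minimal sketch. It is not the code from add_most_severe_consequence.py; the function names and the INFO strings are made up for illustration. The assumption is only that a variable is assigned inside an "if CSQ is present" branch and then used unconditionally.

```python
def transcripts_unguarded(info_field):
    """Hypothetical pattern: 'transcripts' is only bound when CSQ exists."""
    for entry in info_field.split(";"):
        if entry.startswith("CSQ="):
            transcripts = entry[len("CSQ="):].split(",")
    # Raises UnboundLocalError when no entry started with "CSQ=".
    return transcripts


def transcripts_guarded(info_field):
    """Same parsing, but returns an empty list when CSQ is absent."""
    for entry in info_field.split(";"):
        if entry.startswith("CSQ="):
            return entry[len("CSQ="):].split(",")
    return []


if __name__ == "__main__":
    with_csq = "SVTYPE=DEL;CSQ=A|x,T|y"   # made-up INFO string
    without_csq = "SVTYPE=BND;END=100"    # made-up BND record, no CSQ

    print(transcripts_guarded(with_csq))
    print(transcripts_guarded(without_csq))
    try:
        transcripts_unguarded(without_csq)
    except UnboundLocalError as err:
        print("unguarded version failed:", err)
```

A guard like this would only stop the crash; whether BND records without a CSQ should be skipped or passed through unannotated is a separate decision for the pipeline.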

ramprasadn commented 2 weeks ago

Hi @Oliversinn,

We have updated VEP to version 113, but unfortunately, the issue has only been partially fixed. Please refer to the details in this GitHub issue: #640. For now, as @jemten suggested, we recommend continuing to use VEP 110. We are on top of this, as we are also eager to benefit from the improvements made after version 110.

We will update the pipeline as soon as fixes become available. I will close this issue for now, but feel free to reopen it if needed.