nf-core / rnafusion

RNA-seq analysis pipeline for detection of gene-fusions
https://nf-co.re/rnafusion
MIT License

Error in NFCORE_RNAFUSION:RNAFUSION:FUSIONINSPECTOR_WORKFLOW:VCF_COLLECT #460

Closed by vwucher 6 months ago

vwucher commented 8 months ago

Description of the bug

Hi, I am currently trying to run rnafusion 3.0.1 on patient samples, but it crashes during the NFCORE_RNAFUSION:RNAFUSION:FUSIONINSPECTOR_WORKFLOW:VCF_COLLECT process. I can make it work if I use the --skip_vis option, but I would like to have at least the visualisation of Arriba.

Since this is for diagnosis, the data are targeted RNA-seq, and it is possible that no fusion is detected for some samples. I don't know whether that could be the cause.

I don't know if it is possible either to fix this or to skip only this process (and not the Arriba visualisation).

Thanks in advance for your help!

Command used and terminal output

bash ./nextflow-23.10.0-all run nf-core/rnafusion \
    -r 3.0.1 \
    -latest \
    -c HCL.conf \
    -profile singularity \
    -resume \
    --input input/$ticket/samplesheet.csv \
    --outdir output/$ticket \
    --genomes_base /srv/scratch/rnafusion_3/build/references \
    --all

[...]

-[nf-core/rnafusion] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_RNAFUSION:RNAFUSION:FUSIONINSPECTOR_WORKFLOW:VCF_COLLECT (C469T-23EH16313-0-RUBAN-LIG1H)'

Caused by:
  Process `NFCORE_RNAFUSION:RNAFUSION:FUSIONINSPECTOR_WORKFLOW:VCF_COLLECT (C469T-23EH16313-0-RUBAN-LIG1H)` terminated with an error exit status (1)

Command executed:

  vcf_collect.py --fusioninspector C469T-23EH16313-0-RUBAN-LIG1H.FusionInspector.fusions.abridged.tsv.annotated.coding_effect --fusionreport C469T-23EH16313-0-RUBAN-LIG1H_fusionreport_index.html --fusioninspector_gtf C469T-23EH16313-0-RUBAN-LIG1H.tsv --fusionreport_csv C469T-23EH16313-0-RUBAN-LIG1H.fusions.csv --hgnc hgnc_complete_set.txt --sample C469T-23EH16313-0-RUBAN-LIG1H --out C469T-23EH16313-0-RUBAN-LIG1H_fusion_data.vcf
  gzip C469T-23EH16313-0-RUBAN-LIG1H_fusion_data.vcf

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_RNAFUSION:RNAFUSION:FUSIONINSPECTOR_WORKFLOW:VCF_COLLECT":
      python: $(python --version | sed 's/Python //g')
      HGNC DB retrieval: $(cat HGNC-DB-timestamp.txt)
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  Traceback (most recent call last):
    File "/home/chu-lyon.fr/wucherex/.nextflow/assets/nf-core/rnafusion/bin/vcf_collect.py", line 505, in <module>
      sys.exit(main())
               ^^^^^^
    File "/home/chu-lyon.fr/wucherex/.nextflow/assets/nf-core/rnafusion/bin/vcf_collect.py", line 493, in main
      vcf_collect(
    File "/home/chu-lyon.fr/wucherex/.nextflow/assets/nf-core/rnafusion/bin/vcf_collect.py", line 41, in vcf_collect
      build_fusioninspector_dataframe(fusioninspector_in_file)
    File "/home/chu-lyon.fr/wucherex/.nextflow/assets/nf-core/rnafusion/bin/vcf_collect.py", line 275, in build_fusioninspector_dataframe
      df[["ChromosomeA", "PosA", "Strand1"]] = df["LeftBreakpoint"].str.split(":", expand=True)
      ~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/pandas/core/frame.py", line 3966, in __setitem__
      self._setitem_array(key, value)
    File "/usr/local/lib/python3.11/site-packages/pandas/core/frame.py", line 4008, in _setitem_array
      check_key_length(self.columns, key, value)
    File "/usr/local/lib/python3.11/site-packages/pandas/core/indexers/utils.py", line 401, in check_key_length
      raise ValueError("Columns must be same length as key")
  ValueError: Columns must be same length as key

Work dir:
  /srv/scratch/rnafusion_3/work/8c/6fd61a80d27d92b5bb9e288efaf8ae

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details
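
The traceback above fails where `LeftBreakpoint` is split on `:` into three columns. The following is a minimal standard-library sketch of a defensive version of that step. It is not taken from `vcf_collect.py`, it does not use pandas, and the function names and the two-column example table are hypothetical. It assumes breakpoints have the form `chrom:pos:strand`, returns an empty result for a table with no fusion rows, and reports a malformed value explicitly.

```python
import csv
import io


def split_breakpoint(value):
    """Split a 'chrom:pos:strand' string into (chrom, pos, strand)."""
    parts = value.split(":")
    if len(parts) != 3:
        # Fail with a message that names the offending value.
        raise ValueError(f"expected 'chrom:pos:strand', got {value!r}")
    chrom, pos, strand = parts
    return chrom, int(pos), strand


def parse_left_breakpoints(handle):
    """Read a tab-separated table and split its LeftBreakpoint column.

    Returns an empty list when the table has a header but no rows,
    which is the assumed shape of a sample without detected fusions.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    records = []
    for row in reader:
        chrom, pos, strand = split_breakpoint(row["LeftBreakpoint"])
        records.append({"ChromosomeA": chrom, "PosA": pos, "Strand1": strand})
    return records


# Hypothetical example tables (not real FusionInspector output).
TABLE_WITH_FUSION = "FusionName\tLeftBreakpoint\nGENE1--GENE2\tchr1:12345:+\n"
TABLE_WITHOUT_FUSION = "FusionName\tLeftBreakpoint\n"

if __name__ == "__main__":
    print(parse_left_breakpoints(io.StringIO(TABLE_WITH_FUSION)))
    print(parse_left_breakpoints(io.StringIO(TABLE_WITHOUT_FUSION)))
```

Whether an empty table is actually what triggers the error for these samples is not confirmed by the log; checking the number of data rows in the `.coding_effect` file in the work directory would show it.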

Relevant files

No response

System information

No response

rannick commented 7 months ago

OK, so I cannot reproduce the error. Would it be possible to send me your files, or at least a summary of the content of each of them? I am talking about:

vwucher commented 7 months ago

Hi,

Thanks a lot for your answer. I am not sure I can share the data, since it is patient data, but I will check. In the meantime, I managed to make it run by ignoring errors from the process:

process {
  // Ignore the error
  withName: VCF_COLLECT {
    errorStrategy = 'ignore'
  }
}

By doing so, rnafusion can finish:

executor >  local (51)
[f8/be7f05] process > NFCORE_RNAFUSION:RNAFUSION:INPUT_CHECK:SAMPLESHEET_CHECK (samplesheet.csv)                                 [100%] 1 of 1, cached: 1 ✔
[-        ] process > NFCORE_RNAFUSION:RNAFUSION:CAT_FASTQ                                                                       -
[48/be59c4] process > NFCORE_RNAFUSION:RNAFUSION:FASTQC (E14658TPOOL-POOL-CTRL-HD784-54363-LIG1H)                                [100%] 23 of 23, cached: 23 ✔
[e5/5a4d78] process > NFCORE_RNAFUSION:RNAFUSION:ARRIBA_WORKFLOW:STAR_FOR_ARRIBA (E15254T-23EH16378-0-RB-LIG1H)                  [100%] 23 of 23, cached: 23 ✔
[e9/4b325d] process > NFCORE_RNAFUSION:RNAFUSION:ARRIBA_WORKFLOW:ARRIBA (E14658TPOOL-POOL-CTRL-HD784-54363)                      [100%] 23 of 23, cached: 23 ✔
[49/2eb42c] process > NFCORE_RNAFUSION:RNAFUSION:STARFUSION_WORKFLOW:STAR_FOR_STARFUSION (C467T-23EH16496-3)                     [100%] 23 of 23, cached: 23 ✔
[ea/3a9dbf] process > NFCORE_RNAFUSION:RNAFUSION:STARFUSION_WORKFLOW:SAMTOOLS_INDEX_FOR_STARFUSION (E15258T-23EH15799-01G-LIG1H) [100%] 23 of 23, cached: 23 ✔
[d9/57b2fe] process > NFCORE_RNAFUSION:RNAFUSION:STARFUSION_WORKFLOW:STARFUSION (E15258T-23EH15799-01G-LIG1H)                    [100%] 23 of 23, cached: 23 ✔
[10/99dee4] process > NFCORE_RNAFUSION:RNAFUSION:FUSIONCATCHER_WORKFLOW:FUSIONCATCHER (C469T-23EH16313-0-RUBAN)                  [100%] 23 of 23, cached: 23 ✔
[90/1b5128] process > NFCORE_RNAFUSION:RNAFUSION:STRINGTIE_WORKFLOW:STRINGTIE_STRINGTIE (E15258T-23EH15799-01G-LIG1H)            [100%] 23 of 23, cached: 23 ✔
[d4/7f9602] process > NFCORE_RNAFUSION:RNAFUSION:STRINGTIE_WORKFLOW:STRINGTIE_MERGE (23)                                         [100%] 23 of 23, cached: 23 ✔
[39/6e81fd] process > NFCORE_RNAFUSION:RNAFUSION:FUSIONREPORT_WORKFLOW:FUSIONREPORT (E14994T-23EH11369-08H)                      [100%] 23 of 23, cached: 23 ✔
[d4/2d508d] process > NFCORE_RNAFUSION:RNAFUSION:FUSIONINSPECTOR_WORKFLOW:FUSIONINSPECTOR (E15258T-23EH15799-01G)                [100%] 16 of 16, cached: 11 ✔
[44/25653d] process > NFCORE_RNAFUSION:RNAFUSION:FUSIONINSPECTOR_WORKFLOW:AGAT_CONVERTSPGFF2TSV (E15258T-23EH15799-01G)          [100%] 16 of 16, cached: 2 ✔
[8d/c63819] process > NFCORE_RNAFUSION:RNAFUSION:FUSIONINSPECTOR_WORKFLOW:VCF_COLLECT (E15258T-23EH15799-01G)                    [100%] 16 of 16, failed: 7 ✔
[ac/f84e64] process > NFCORE_RNAFUSION:RNAFUSION:FUSIONINSPECTOR_WORKFLOW:ARRIBA_VISUALISATION (E15258T-23EH15799-01G)           [100%] 16 of 16, cached: 2 ✔
[3c/161c72] process > NFCORE_RNAFUSION:RNAFUSION:QC_WORKFLOW:PICARD_COLLECTRNASEQMETRICS (E15258T-23EH15799-01G-LIG1H)           [100%] 23 of 23, cached: 23 ✔
[46/f281d8] process > NFCORE_RNAFUSION:RNAFUSION:QC_WORKFLOW:GATK4_MARKDUPLICATES (C468T-23EH16475-1)                            [100%] 23 of 23, cached: 23 ✔
[67/95b0e2] process > NFCORE_RNAFUSION:RNAFUSION:QC_WORKFLOW:PICARD_COLLECTINSERTSIZEMETRICS (C468T-23EH16475-1)                 [100%] 23 of 23, cached: 23 ✔
[57/e1d4a4] process > NFCORE_RNAFUSION:RNAFUSION:CUSTOM_DUMPSOFTWAREVERSIONS (1)                                                 [100%] 1 of 1 ✔
[dd/7576ed] process > NFCORE_RNAFUSION:RNAFUSION:MULTIQC                                                                         [100%] 1 of 1 ✔
-[nf-core/rnafusion] Pipeline completed successfully, but with errored process(es) -
Completed at: 31-Jan-2024 11:03:40
Duration    : 19m 46s
CPU hours   : 89.1 (84.1% cached, 0% failed)
Succeeded   : 44
Cached      : 315
Ignored     : 7
Failed      : 7

vwucher commented 7 months ago

Hi again,

It should be OK to share these processed RNA-seq data. I have attached the four files, with their names changed (sample1.zip).

Thanks again for your help, Valentin