nf-core / rnaseq

RNA sequencing analysis pipeline using STAR, RSEM, HISAT2 or Salmon with gene/isoform counts and extensive quality control.
https://nf-co.re/rnaseq
MIT License

Error running process `NFCORE_RNASEQ:RNASEQ:BAM_RSEQC:RSEQC_JUNCTIONANNOTATION` #1315

Open charlesdavid opened 4 months ago

charlesdavid commented 4 months ago

Description of the bug

When the RSeQC analysis is included, an error is generated for most of the samples. Below is an example of the log written to the work directory for a failed sample (SAMPLE.junction_annotation.log):

[E::idx_find_and_load] Could not retrieve index file for 'SAMPLE_X.sorted.bam'
Reading reference bed file:  assembly.filtered.bed  ...  Done
Load BAM file ...  Done

===================================================================
Total splicing  Events: 7365670
Known Splicing Events:  6859752
Partial Novel Splicing Events:  59762
Novel Splicing Events:  443604
Filtered Splicing Events:   2552

Traceback (most recent call last):
  File "/usr/local/bin/junction_annotation.py", line 171, in <module>
    main()
  File "/usr/local/bin/junction_annotation.py", line 149, in main
    obj.annotate_junction(outfile=options.output_prefix,refgene=options.ref_gene_model,min_intron=options.min_intron, q_cut = options.map_qual)
  File "/usr/local/lib/python3.9/site-packages/qcmodule/SAM.py", line 3832, in annotate_junction
    (chrom, i_st, i_end) = i.split(":")
ValueError: too many values to unpack (expected 3)

And here is the same log file from a successful sample:

[E::idx_find_and_load] Could not retrieve index file for 'SAMPLE_Y.sorted.bam'
Reading reference bed file:  assembly.filtered.bed  ...  Done
Load BAM file ...  Done

===================================================================
Total splicing  Events: 6648248
Known Splicing Events:  6200384
Partial Novel Splicing Events:  54665
Novel Splicing Events:  388928
Filtered Splicing Events:   4271

Total splicing  Junctions:  119874
Known Splicing Junctions:   86067
Partial Novel Splicing Junctions:   11631
Novel Splicing Junctions:   22176

===================================================================
Create BED file ...
Create Interact file ...

The .command.log file for the failed process contains only the line "total = 7365670", while for the successful process it contains:

null device 
          1 
null device 
          1 
total = 6648248

The error is puzzling as the process has completed for some of the input samples.

Moreover, the failure is occurring with files generated by the pipeline itself.

The only workaround I have found is to skip the problematic module, but this is not ideal.

Hoping someone may have insight into why this is occurring and how to resolve it :-)
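For anyone triaging this: the traceback shows that the failing line, `(chrom, i_st, i_end) = i.split(":")`, expects a junction key with exactly three colon-separated fields. One possible cause, which is an assumption and not confirmed anywhere in this thread, is a reference sequence name that itself contains a colon. That would only affect samples with a junction on such a sequence. The sketch below reproduces that failure mode in isolation and scans BED lines for such names. The function names and the sequence name `scaffold:7` are hypothetical and are not part of RSeQC or the pipeline.

```python
def naive_split(key):
    # Same unpacking pattern as the line in the traceback:
    # raises ValueError unless the key contains exactly two colons.
    chrom, start, end = key.split(":")
    return chrom, int(start), int(end)


def split_junction_key(key):
    # Splitting from the right keeps any colons inside the sequence name.
    chrom, start, end = key.rsplit(":", 2)
    return chrom, int(start), int(end)


def contigs_with_colon(bed_lines):
    """Return the sorted, unique names in BED column 1 that contain ':'."""
    names = set()
    for line in bed_lines:
        # Skip blank lines and BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom = line.split()[0]
        if ":" in chrom:
            names.add(chrom)
    return sorted(names)


if __name__ == "__main__":
    print(naive_split("chr1:100:200"))
    try:
        naive_split("scaffold:7:1000:2000")  # hypothetical sequence name "scaffold:7"
    except ValueError as err:
        print("naive split fails:", err)
    print(split_junction_key("scaffold:7:1000:2000"))
```

To check the annotation from the log above, something like `contigs_with_colon(open("assembly.filtered.bed"))` would list any affected sequence names. An empty list would rule this explanation out.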

Command used and terminal output

No response

Relevant files

No response

System information

Nextflow version 23.04.4, nf-core/rnaseq v3.14.0-gb89fac3, HPC, Linux RHE11, Apptainer, SLURM

pinin4fjords commented 2 days ago

Could you provide a reproducible example for this please, including some inputs (or just using the test profile if you can reproduce it there) and the exact command line and parameters used?