gpertea / stringtie

Transcript assembly and quantification for RNA-Seq
MIT License

GffNameList Error: invalid index #297

Open sruddle opened 3 years ago

sruddle commented 3 years ago

Hi, I'm running stringtie as part of the nf-core rna-seq pipeline with bacterial transcripts and this reference. Many samples finish successfully, but several throw errors at the stringtieFPKM step.

These samples come from a metatranscriptomic study, so many of them have numerous reads that are not assigned to the linked reference. However, I haven't noticed any clear association between sample characteristics and the errors.

Any idea what's happening here? I've noticed similar errors posted elsewhere, but no clear fix.

Thanks!

Error executing process > 'stringtieFPKM (B11_CKDL200161468-1a_HK3KHDSXY_L3_1)'

Caused by:
  Process `stringtieFPKM (B11_CKDL200161468-1a_HK3KHDSXY_L3_1)` terminated with an error exit status (1)

Command executed:

  stringtie B11_CKDL200161468-1a_HK3KHDSXY_L3_1.sorted.bam \
       \
      -o B11_CKDL200161468-1a_HK3KHDSXY_L3_1.sorted_transcripts.gtf \
      -v \
      -G GCF_000210855.2_ASM21085v2_genomic.gtf \
      -A B11_CKDL200161468-1a_HK3KHDSXY_L3_1.sorted.gene_abund.txt \
      -C B11_CKDL200161468-1a_HK3KHDSXY_L3_1.sorted.bam.cov_refs.gtf \
      -b B11_CKDL200161468-1a_HK3KHDSXY_L3_1.sorted_ballgown \
      -e

Command exit status:
  1

Command output:
  (empty)

Command error:
  Running StringTie 2.0. Command line:
  stringtie B11_CKDL200161468-1a_HK3KHDSXY_L3_1.sorted.bam -o B11_CKDL200161468-1a_HK3KHDSXY_L3_1.sorted_transcripts.gtf -v -G GCF_000210855.2_ASM21085v2_genomic.gtf -A B11_CKDL200161468-1a_HK3KHDSXY_L3_1.sorted.gene_abund.txt -C B11_CKDL200161468-1a_HK3KHDSXY_L3_1.sorted.bam.cov_refs.gtf -b B11_CKDL200161468-1a_HK3KHDSXY_L3_1.sorted_ballgown -e
  [10/02 14:09:48] Loading reference annotation (guides)..
  Warning: merging adjacent/overlapping segments (distance=1) of gene-SL1344_RS15730 on NC_016810.1 (3221466-3222488, 3222490-3222564)
  [10/02 14:09:48] 4944 reference transcripts loaded.
  Default stack size for threads: 2097152 (increased to 8388608)
  [10/02 14:10:33]>bundle NC_016810.1:29-4826067 [15077834 alignments (2086210 distinct), 947 junctions, 4667 guides] begins processing...
  GffNameList Error: invalid index (3)

Work dir:
  ./work/28/3ad627cc3075370e2ce46d6325830a

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

gpertea commented 3 years ago

That looks like a bug in an old version of StringTie 2: the version shown as running there is the very first v2.0 release, which had a few issues that were fixed in subsequent releases. Are you using the pipeline with the conda configuration? That seems to be the one pulling the old 2.0 version of StringTie, even though v2.1.2 is available with conda and should have that bug fixed. Perhaps you can reinstall the pipeline after changing the rnaseq/environment.yml file to point to the latest conda version of stringtie (the output format should not change) instead of stringtie=2.0 as it is now? (https://github.com/nf-core/rnaseq/blob/3b6df9bd104927298fcdf69e97cca7ff1f80527c/environment.yml#L34)
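
A minimal sketch of the change suggested above, assuming the pin appears in environment.yml as a line containing `stringtie=2.0`. The helper name `bump_stringtie_pin` and the exact layout of the file are assumptions for illustration; 2.1.2 is simply the version mentioned in this thread. It rewrites the pin in place, keeps a `.bak` backup, and prints the resulting line so the change can be checked before reinstalling.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of nf-core or StringTie): bump the StringTie
# version pin in a conda environment file.
# Usage: bump_stringtie_pin <environment.yml> [new_version]
bump_stringtie_pin() {
  env_file="$1"
  new_version="${2:-2.1.2}"   # default taken from the version named in the thread

  # Replace whatever version follows "stringtie=" with the new one.
  # "-i.bak" keeps a backup copy and is accepted by both GNU and BSD sed.
  sed -i.bak "s/stringtie=[0-9][0-9.]*/stringtie=${new_version}/" "$env_file"

  # Print the resulting pin (with its line number) for a quick sanity check.
  grep -n "stringtie=" "$env_file"
}
```

After rebuilding the environment, `stringtie --version` (the same call the pipeline uses later in this thread) should report the new version.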

mattias-erhardsson commented 3 years ago

I also have this issue, but I'm running the nf-core rna-seq pipeline v3 with StringTie 2.1.4. It's also bacterial transcript data, but with the Helicobacter pylori P12 reference genome. I had to make one manual fix to the annotation, because someone has made a strange custom annotation of the gene HPP12_0218 that causes other problems if left alone. I fixed this gene by simply merging the two parts it's split into. I'm running it on UPPMAX.
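
For readers who want to script the kind of merge described above, here is a rough sketch. How the merge was actually done is not stated in this thread, so everything here is an assumption: the helper name `merge_gene_segments` is hypothetical, and it assumes the split gene appears in the GTF as several records of the same feature type that share one `gene_id` attribute. It collapses those records into a single record spanning the smallest start to the largest end and leaves all other lines untouched.

```shell
#!/usr/bin/env bash
# Hypothetical helper: collapse all records of one feature type for one gene
# into a single record spanning the smallest start to the largest end.
# Usage: merge_gene_segments <gene_id> <in.gtf> [feature]   (writes to stdout)
merge_gene_segments() {
  gene="$1"; gtf="$2"; feature="${3:-exon}"
  # The file is read twice: pass 1 measures the span, pass 2 rewrites it.
  awk -F'\t' -v OFS='\t' -v gene="$gene" -v feature="$feature" '
    # True for records of the chosen feature type that belong to the gene.
    function hit() {
      return $3 == feature && index($9, "gene_id \"" gene "\"") > 0
    }
    # Pass 1: find the overall span of the matching records.
    NR == FNR {
      if (hit()) {
        if (lo == "" || $4 + 0 < lo) lo = $4 + 0
        if ($5 + 0 > hi) hi = $5 + 0
      }
      next
    }
    # Pass 2: emit one merged record in place of the first match, drop the rest.
    hit() {
      if (!done) { $4 = lo; $5 = hi; print; done = 1 }
      next
    }
    # Everything else (other genes, other features, comments) passes through.
    { print }
  ' "$gtf" "$gtf"
}
```

A call might look like `merge_gene_segments HPP12_0218 in.gtf > fixed.gtf`, where both file names are placeholders. Other feature types for the same gene (for example CDS records) may need the same treatment, and the result is worth inspecting by eye before it is passed to `-G`.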

I actually encountered this error with version 2 of the pipeline too, and this exact post is the reason I tried switching to version 3.

I'm a newbie and don't know what to do about this.

Error executing process > 'RNASEQ:STRINGTIE (Mutant_LEB_HSA_R1)'

Caused by:
  Process `RNASEQ:STRINGTIE (Mutant_LEB_HSA_R1)` terminated with an error exit status (1)

Command executed:

  stringtie \
      Mutant_LEB_HSA_R1.markdup.sorted.bam \
      --rf \
      -G GCA_000021465.1_ASM2146v1_genomic_gffread_merged_HPP12_0218.gtf \
      -o Mutant_LEB_HSA_R1.transcripts.gtf \
      -A Mutant_LEB_HSA_R1.gene_abundance.txt \
      -C Mutant_LEB_HSA_R1.coverage.gtf \
      -b Mutant_LEB_HSA_R1.ballgown \
      -v -e

  stringtie --version > stringtie.version.txt

Command exit status:
  1

Command output:
  (empty)

Command error:
  WARNING: Skipping mount /var/singularity/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
  Running StringTie 2.1.4. Command line:
  stringtie Mutant_LEB_HSA_R1.markdup.sorted.bam --rf -G GCA_000021465.1_ASM2146v1_genomic_gffread_merged_HPP12_0218.gtf -o Mutant_LEB_HSA_R1.transcripts.gtf -A Mutant_LEB_HSA_R1.gene_abundance.txt -C Mutant_LEB_HSA_R1.coverage.gtf -b Mutant_LEB_HSA_R1.ballgown -v -e
  [01/22 17:46:48] Loading reference annotation (guides)..
  [01/22 17:46:48] 1621 reference transcripts loaded.
  Default stack size for threads: 2097152 (increased to 8388608)
  Warning: invalid mapping found for read A00687:38:HG3LCDRXX:1:1227:8088:18474 (position=1140036, mapped length=9)
  Warning: invalid mapping found for read A00687:38:HG3LCDRXX:1:1227:8088:18474 (position=1147650, mapped length=9)
  Warning: invalid mapping found for read A00687:38:HG3LCDRXX:2:1145:5150:26475 (position=1219065, mapped length=9)
  Warning: invalid mapping found for read A00687:38:HG3LCDRXX:1:1227:8088:18474 (position=1286111, mapped length=9)
  Warning: invalid mapping found for read A00687:38:HG3LCDRXX:1:1227:8088:18474 (position=1287579, mapped length=9)
  Warning: invalid mapping found for read A00687:38:HG3LCDRXX:1:2112:1280:4085 (position=1308834, mapped length=8)
  Warning: invalid mapping found for read A00687:38:HG3LCDRXX:2:1145:5150:26475 (position=1336600, mapped length=9)
  Warning: invalid mapping found for read A00687:38:HG3LCDRXX:1:2112:1280:4085 (position=1338550, mapped length=8)
  Warning: invalid mapping found for read A00687:38:HG3LCDRXX:1:2112:1280:4085 (position=1358121, mapped length=7)
  Warning: invalid mapping found for read A00687:38:HG3LCDRXX:1:1227:8088:18474 (position=1528046, mapped length=9)
  [01/22 17:47:15]>bundle CP001217.1:13-1673513 [7117277 alignments (77556 distinct), 16660 junctions, 1610 guides] begins processing...
  GffNameList Error: invalid index (3)

Work dir:
  /crex/proj/snic2020-16-165/HpRNAseq/Mattias/NCBI_genome_nf_core_rnaseq_pipeline/work/42/76d3cabe6cbc52997a984b364e2685

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`