hartleys / QoRTs

Quality of RNA-Seq Toolset
52 stars 14 forks source link

"IMMPOSSIBLE STATE! FATAL ERROR!" with Hisat2 data - Bioconda installation #79

Open J-Calvelo opened 4 years ago

J-Calvelo commented 4 years ago

Hello, I was trying to get splice junction counts from a Hisat2 alignment with the following line: qorts QC --maxReadLength 101 --nameSorted --minMAPQ 50 test.bam wombase.gtf Count_Outputs

And got the following error:

Finished` reading SAM. Read: 11186385 reads/read-pairs.
Finished reading SAM. Used: 11039996 reads/read-pairs.
[Time: 2020-04-23 19:48:26] [Mem usage: [723MB / 1062MB]] [Elapsed Time: 00:22:50.0545]
> Read Stats:
>   READ_PAIR_OK                   11039996
>   TOTAL_READ_PAIRS               11186385
>   DROPPED_NOT_PROPER_PAIR        146389
>   DROPPED_READ_FAILS_VENDOR_QC   0
>   DROPPED_MARKED_NOT_VALID       0
>   DROPPED_CHROMS_MISMATCH        0
>   DROPPED_PAIR_STRANDS_MISMATCH  0
>   DROPPED_IGNORED_CHROMOSOME     0
>   DROPPED_NOT_UNIQUE_ALIGNMENT   0
>   DROPPED_NO_ALN_BLOCKS   0
>   DROPPED_NOT_MARKED_RG   -1
Pre-alignment read count unknown (Set --seqReadCt or --rawfastq)
Writing Output...
<====== FATAL ERROR! ======>
----------------------------
     Error message: "IMMPOSSIBLE STATE! FATAL ERROR! qcJunctionCounts.writeOutput, writing forSpliceSeq"
     Stack Trace:
        java.base/java.lang.Thread.getStackTrace(Thread.java:1606)
        internalUtils.Reporter$.error(Reporter.scala:294)
        qcUtils.qcJunctionCounts.$anonfun$writeOutput$3(qcJunctionCounts.scala:203)
        qcUtils.qcJunctionCounts.$anonfun$writeOutput$3$adapted(qcJunctionCounts.scala:194)
        scala.collection.Iterator.foreach(Iterator.scala:929)
        scala.collection.Iterator.foreach$(Iterator.scala:929)
        scala.collection.AbstractIterator.foreach(Iterator.scala:1417)
        scala.collection.IterableLike.foreach(IterableLike.scala:71)
        scala.collection.IterableLike.foreach$(IterableLike.scala:70)
        scala.collection.AbstractIterable.foreach(Iterable.scala:54)
        qcUtils.qcJunctionCounts.writeOutput(qcJunctionCounts.scala:194)
        qcUtils.runAllQC$.$anonfun$runOnSeqFile$8(runAllQC.scala:1410)
        qcUtils.runAllQC$.$anonfun$runOnSeqFile$8$adapted(runAllQC.scala:1410)
        scala.collection.Iterator.foreach(Iterator.scala:929)
        scala.collection.Iterator.foreach$(Iterator.scala:929)
        scala.collection.AbstractIterator.foreach(Iterator.scala:1417)
        scala.collection.IterableLike.foreach(IterableLike.scala:71)
        scala.collection.IterableLike.foreach$(IterableLike.scala:70)
        scala.collection.AbstractIterable.foreach(Iterable.scala:54)
        qcUtils.runAllQC$.runOnSeqFile(runAllQC.scala:1410)
        qcUtils.runAllQC$.run(runAllQC.scala:960)
        qcUtils.runAllQC$allQC_runner.run(runAllQC.scala:672)
        runner.runner$.main(runner.scala:97)
        runner.runner.main(runner.scala)
<==========================>
Exception in thread "main" java.lang.Exception: IMMPOSSIBLE STATE! FATAL ERROR! qcJunctionCounts.writeOutput, writing forSpliceSeq
    at internalUtils.Reporter$.error(Reporter.scala:299)
    at qcUtils.qcJunctionCounts.$anonfun$writeOutput$3(qcJunctionCounts.scala:203)
    at qcUtils.qcJunctionCounts.$anonfun$writeOutput$3$adapted(qcJunctionCounts.scala:194)
    at scala.collection.Iterator.foreach(Iterator.scala:929)
    at scala.collection.Iterator.foreach$(Iterator.scala:929)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1417)
    at scala.collection.IterableLike.foreach(IterableLike.scala:71)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:70)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at qcUtils.qcJunctionCounts.writeOutput(qcJunctionCounts.scala:194)
    at qcUtils.runAllQC$.$anonfun$runOnSeqFile$8(runAllQC.scala:1410)
    at qcUtils.runAllQC$.$anonfun$runOnSeqFile$8$adapted(runAllQC.scala:1410)
    at scala.collection.Iterator.foreach(Iterator.scala:929)
    at scala.collection.Iterator.foreach$(Iterator.scala:929)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1417)
    at scala.collection.IterableLike.foreach(IterableLike.scala:71)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:70)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at qcUtils.runAllQC$.runOnSeqFile(runAllQC.scala:1410)
    at qcUtils.runAllQC$.run(runAllQC.scala:960)
    at qcUtils.runAllQC$allQC_runner.run(runAllQC.scala:672)
    at runner.runner$.main(runner.scala:97)
    at runner.runner.main(runner.scala)

Not sure of what could have caused it. If its informative, I'm studying SL trans-splicing and for this alignments I removed the splicer leader sequences prior to the alignment so reads were hard trimmed.

Thanks

hartleys commented 4 years ago

Woah. So usually when I put an "impossible state" error it definitely means that there is something really wrong that I didnt think was possible.

On first glance my guess would be that maybe some sort of IO issue?

Is it consistant? If you run it again exactly the same do you get the same result?

Also useful would be the full log. That will include the version information and the jvm version info and stuff like that, which can be useful to narro wes things down...

On Thu, Apr 23, 2020, 7:24 PM J-Calvelo notifications@github.com wrote:

Hello, I was trying to get splice junction counts from a Hisat2 alignment with the following line: qorts QC --maxReadLength 101 --nameSorted --minMAPQ 50 test.bam wombase.gtf Count_Outputs

And got the following error:

Finished` reading SAM. Read: 11186385 reads/read-pairs. Finished reading SAM. Used: 11039996 reads/read-pairs. [Time: 2020-04-23 19:48:26] [Mem usage: [723MB / 1062MB]] [Elapsed Time: 00:22:50.0545]

Read Stats: READ_PAIR_OK 11039996 TOTAL_READ_PAIRS 11186385 DROPPED_NOT_PROPER_PAIR 146389 DROPPED_READ_FAILS_VENDOR_QC 0 DROPPED_MARKED_NOT_VALID 0 DROPPED_CHROMS_MISMATCH 0 DROPPED_PAIR_STRANDS_MISMATCH 0 DROPPED_IGNORED_CHROMOSOME 0 DROPPED_NOT_UNIQUE_ALIGNMENT 0 DROPPED_NO_ALN_BLOCKS 0 DROPPED_NOT_MARKED_RG -1 Pre-alignment read count unknown (Set --seqReadCt or --rawfastq) Writing Output... <====== FATAL ERROR! ======>

Error message: "IMMPOSSIBLE STATE! FATAL ERROR! qcJunctionCounts.writeOutput, writing forSpliceSeq" Stack Trace: java.base/java.lang.Thread.getStackTrace(Thread.java:1606) internalUtils.Reporter$.error(Reporter.scala:294) qcUtils.qcJunctionCounts.$anonfun$writeOutput$3(qcJunctionCounts.scala:203) qcUtils.qcJunctionCounts.$anonfun$writeOutput$3$adapted(qcJunctionCounts.scala:194) scala.collection.Iterator.foreach(Iterator.scala:929) scala.collection.Iterator.foreach$(Iterator.scala:929) scala.collection.AbstractIterator.foreach(Iterator.scala:1417) scala.collection.IterableLike.foreach(IterableLike.scala:71) scala.collection.IterableLike.foreach$(IterableLike.scala:70) scala.collection.AbstractIterable.foreach(Iterable.scala:54) qcUtils.qcJunctionCounts.writeOutput(qcJunctionCounts.scala:194) qcUtils.runAllQC$.$anonfun$runOnSeqFile$8(runAllQC.scala:1410) qcUtils.runAllQC$.$anonfun$runOnSeqFile$8$adapted(runAllQC.scala:1410) scala.collection.Iterator.foreach(Iterator.scala:929) scala.collection.Iterator.foreach$(Iterator.scala:929) scala.collection.AbstractIterator.foreach(Iterator.scala:1417) scala.collection.IterableLike.foreach(IterableLike.scala:71) scala.collection.IterableLike.foreach$(IterableLike.scala:70) scala.collection.AbstractIterable.foreach(Iterable.scala:54) qcUtils.runAllQC$.runOnSeqFile(runAllQC.scala:1410) qcUtils.runAllQC$.run(runAllQC.scala:960) qcUtils.runAllQC$allQC_runner.run(runAllQC.scala:672) runner.runner$.main(runner.scala:97) runner.runner.main(runner.scala) <==========================> Exception in thread "main" java.lang.Exception: IMMPOSSIBLE STATE! FATAL ERROR! qcJunctionCounts.writeOutput, writing forSpliceSeq at internalUtils.Reporter$.error(Reporter.scala:299) at qcUtils.qcJunctionCounts.$anonfun$writeOutput$3(qcJunctionCounts.scala:203) at qcUtils.qcJunctionCounts.$anonfun$writeOutput$3$adapted(qcJunctionCounts.scala:194) at scala.collection.Iterator.foreach(Iterator.scala:929) at scala.collection.Iterator.foreach$(Iterator.scala:929) at scala.collection.AbstractIterator.foreach(Iterator.scala:1417) at scala.collection.IterableLike.foreach(IterableLike.scala:71) at scala.collection.IterableLike.foreach$(IterableLike.scala:70) at scala.collection.AbstractIterable.foreach(Iterable.scala:54) at qcUtils.qcJunctionCounts.writeOutput(qcJunctionCounts.scala:194) at qcUtils.runAllQC$.$anonfun$runOnSeqFile$8(runAllQC.scala:1410) at qcUtils.runAllQC$.$anonfun$runOnSeqFile$8$adapted(runAllQC.scala:1410) at scala.collection.Iterator.foreach(Iterator.scala:929) at scala.collection.Iterator.foreach$(Iterator.scala:929) at scala.collection.AbstractIterator.foreach(Iterator.scala:1417) at scala.collection.IterableLike.foreach(IterableLike.scala:71) at scala.collection.IterableLike.foreach$(IterableLike.scala:70) at scala.collection.AbstractIterable.foreach(Iterable.scala:54) at qcUtils.runAllQC$.runOnSeqFile(runAllQC.scala:1410) at qcUtils.runAllQC$.run(runAllQC.scala:960) at qcUtils.runAllQC$allQC_runner.run(runAllQC.scala:672) at runner.runner$.main(runner.scala:97) at runner.runner.main(runner.scala)

Not sure of what could have caused it. If its informative, I'm studying SL trans-splicing and for this alignments I removed the splicer leader sequences prior to the alignment so reads were hard trimmed.

Thanks

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/hartleys/QoRTs/issues/79, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAWC53FB6OYWGSFMO3WLTNLRODE4HANCNFSM4MPUIT5Q .

J-Calvelo commented 4 years ago

Seems to be consistent and happens with other samples. Maybe there is something wrong with the conda version? thanks QC.5nZzr79gsrA4.log

hartleys commented 4 years ago

It looks like your flattened GFF has some sort of malformation.

This can happen if your input GTF has special characters in one or more of the gene IDs. In particular the colon character ":" will break geneIDs.

Could you send the GTF file? Or maybe just check the geneIDs?

J-Calvelo commented 4 years ago

Sorry for the delay, that seems to be the issue. I got the original GFF3 from wormbase and then processed it with gffread.


HMN_01_pilon    WormBase_imported   exon    66428   68427   .   -   .   transcript_id "transcript:HmN_000001200"; gene_id "gene:HmN_000001200"; gene_name "HmN_000001200"; Name "HmN_000001200"; info "method:InterPro accession:IPR004709 description:Na+/H+ exchanger %0Amethod:InterPro accession:IPR006153 description:Cation/H+ exchanger %0Amethod:InterPro accession:IPR018422 description:Cation/H+ exchanger%2C CPA1 family"; biotype "protein_coding";
HMN_01_pilon    WormBase_imported   exon    68655   68661   .   -   .   transcript_id "transcript:HmN_000001200"; gene_id "gene:HmN_000001200"; gene_name "HmN_000001200"; Name "HmN_000001200"; info "method:InterPro accession:IPR004709 description:Na+/H+ exchanger %0Amethod:InterPro accession:IPR006153 description:Cation/H+ exchanger %0Amethod:InterPro accession:IPR018422 description:Cation/H+ exchanger%2C CPA1 family"; biotype "protein_coding";
HMN_01_pilon    WormBase_imported   CDS 66428   68427   .   -   2   transcript_id "transcript:HmN_000001200"; gene_id "gene:HmN_000001200"; gene_name "HmN_000001200"; Name "HmN_000001200"; info "method:InterPro accession:IPR004709 description:Na+/H+ exchanger %0Amethod:InterPro accession:IPR006153 description:Cation/H+ exchanger %0Amethod:InterPro accession:IPR018422 description:Cation/H+ exchanger%2C CPA1 family"; biotype "protein_coding";
HMN_01_pilon    WormBase_imported   CDS 68655   68661   .   -   0   transcript_id "transcript:HmN_000001200"; gene_id "gene:HmN_000001200"; gene_name "HmN_000001200"; Name "HmN_000001200"; info "method:InterPro accession:IPR004709 description:Na+/H+ exchanger %0Amethod:InterPro accession:IPR006153 description:Cation/H+ exchanger %0Amethod:InterPro accession:IPR018422 description:Cation/H+ exchanger%2C CPA1 family"; biotype "protein_coding";
HMN_01_pilon    WormBase_imported   exon    71217   71579   .   -   .   transcript_id "transcript:HmN_000001100"; gene_id "gene:HmN_000001100"; gene_name "HmN_000001100"; Name "HmN_000001100"; info "method:InterPro accession:IPR008949 description:Isoprenoid synthase domain superfamily"; biotype "protein_coding";
HMN_01_pilon    WormBase_imported   exon    71618   71754   .   -   .   transcript_id "transcript:HmN_000001100"; gene_id "gene:HmN_000001100"; gene_name "HmN_000001100"; Name "HmN_000001100"; info "method:InterPro accession:IPR008949 description:Isoprenoid synthase domain superfamily"; biotype "protein_coding";
HMN_01_pilon    WormBase_imported   exon    71848   71950   .   -   .   transcript_id "transcript:HmN_000001100"; gene_id "gene:HmN_000001100"; gene_name "HmN_000001100"; Name "HmN_000001100"; info "method:InterPro accession:IPR008949 description:Isoprenoid synthase domain superfamily"; biotype "protein_coding";
HMN_01_pilon    WormBase_imported   exon    72024   72233   .   -   .   transcript_id "transcript:HmN_000001100"; gene_id "gene:HmN_000001100"; gene_name "HmN_000001100"; Name "HmN_000001100"; info "method:InterPro accession:IPR008949 description:Isoprenoid synthase domain superfamily"; biotype "protein_coding";
HMN_01_pilon    WormBase_imported   exon    72640   72739   .   -   .   transcript_id "transcript:HmN_000001100"; gene_id "gene:HmN_000001100"; gene_name "HmN_000001100"; Name "HmN_000001100"; info "method:InterPro accession:IPR008949 description:Isoprenoid synthase domain superfamily"; biotype "protein_coding";
HMN_01_pilon    WormBase_imported   exon    72878   72883   .   -   .   transcript_id "transcript:HmN_000001100"; gene_id "gene:HmN_000001100"; gene_name "HmN_000001100"; Name "HmN_000001100"; info "method:InterPro accession:IPR008949 description:Isoprenoid synthase domain superfamily"; biotype "protein_coding";

Thanks

hartleys commented 4 years ago

Ah. Ok. The problem is that colons are being used in the transcriptIDs and geneIDs. This problem is a carryover from the way DEXSeq generates UIDs for each entry. I thought I built in a check for this but apparently not.

There are a couple different options. Easiest option would be to do a text-replace for the colons in the GFF3:

zcat mygff.gff.gz | sed 's/[:]/-/g' > myFixedGff.gff.gz