hartleys / QoRTs

Quality of RNA-Seq Toolset

Fatal error #21

Closed bioinfo-dirty-jobs closed 8 years ago

bioinfo-dirty-jobs commented 8 years ago

Dear Doctor, I get this fatal error for some files. I aligned them using STAR.

[sbsuser@compute-00-00 Analisi_agosto2016]$ java -jar ~/software/QoRTs.jar QC --stranded --maxReadLength 50 /illumina/runs/FASTQ/Analisi_agosto2016/ALIGN/390/Aligned.sortedByCoord.out.bam /illumina/software/database/hg38/Homo_sapiens.GRCh38.80.gtf statistics/390
Starting QoRTs v1.1.8 (Compiled Wed Jul 13 13:35:56 EDT 2016)
Starting time: (Tue Sep 13 11:29:51 CEST 2016)
INPUT_COMMAND(QC)
INPUT_ARG(infile)=/illumina/runs/FASTQ/Analisi_agosto2016/ALIGN/390/Aligned.sortedByCoord.out.bam
INPUT_ARG(gtffile)=/illumina/software/database/hg38/Homo_sapiens.GRCh38.80.gtf
INPUT_ARG(outdir)=statistics/390
INPUT_ARG(stranded)=true
INPUT_ARG(maxReadLength)=Some(50)
Creating Directory: statistics/390
Created Log File: statistics/390/QC.HyJiU85CIYJT.log
Starting QC
[Time: 2016-09-13 11:29:51] [Mem usage: [19MB / 758MB]] [Elapsed Time: 00:00:00.0000]
QoRTs is Running in paired-end mode.
QoRTs is Running in any-sorted mode.
Running functions: NVC, GCDistribution, GeneCalcs, QualityScoreDistribution, writeJunctionSeqCounts, writeKnownSplices, writeNovelSplices, writeClippedNVC, CigarOpDistribution, InsertSize, chromCounts, writeSpliceExon, writeGenewiseGeneBody, JunctionCalcs, writeGeneCounts, writeBiotypeCounts, writeDESeq, writeDEXSeq, writeGeneBody, StrandCheck
Pre-alignment read count unknown (Set --seqReadCt or --rawfastq)
Checking first 10000 reads. Checking SAM file for formatting errors...
NOTE: Read length is not consistent.
In the first 10000 reads, read length varies from 32 to 51 (param maxReadLength=50)
Note that using data that is hard-clipped prior to alignment is NOT recommended, because this makes it difficult (or impossible) to determine the sequencer read-cycle of each nucleotide base. This may obfuscate cycle-specific artifacts, trends, or errors, the detection of which is one of the primary purposes of QoRTs!
In addition, hard clipping (whether before or after alignment) removes quality score data, and thus quality score metrics may be misleadingly optimistic.
A MUCH preferable method of removing undesired sequence is to replace such sequence with N's, which preserves the quality score and the sequencer cycle information.
Sorting Note: Reads are not sorted by name (This is OK).
Sorting Note: Reads are sorted by position (This is OK).
Done checking first 10000 reads. No major problems detected.
SAMRecord Reader Generated. Read length: 50.
[Time: 2016-09-13 11:29:53] [Mem usage: [74MB / 1354MB]] [Elapsed Time: 00:00:02.0006]
Compiling flat feature annotation, internally in memory...
Internal flat feature annotation compiled!
QC Utilities Generated!
[Time: 2016-09-13 11:37:11] [Mem usage: [2778MB / 5GB]] [Elapsed Time: 00:07:19.0889]
Fatal error thrown for read: B0P8DQ1:76:HY735BCXX:2:1203:12812:93845
============================FATAL_ERROR============================
QoRTs encountered a FATAL ERROR. For general help, use command:
java -jar path/to/jar/QoRTs.jar --man
============================FATAL_ERROR============================
Error info:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 50
    at qcUtils.qcCigarDistribution$$anonfun$readCigar$2.apply(qcCigarDistribution.scala:90)
    at qcUtils.qcCigarDistribution$$anonfun$readCigar$2.apply(qcCigarDistribution.scala:78)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:778)
    at scala.collection.immutable.List.foreach(List.scala:383)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:777)
    at qcUtils.qcCigarDistribution$.readCigar(qcCigarDistribution.scala:78)
    at qcUtils.qcCigarDistribution.runOnReadPair(qcCigarDistribution.scala:122)
    at qcUtils.qcCigarDistribution.runOnReadPair(qcCigarDistribution.scala:104)
    at qcUtils.runAllQC$$anonfun$runOnSeqFile$2.apply(runAllQC.scala:1070)
    at qcUtils.runAllQC$$anonfun$runOnSeqFile$2.apply(runAllQC.scala:1055)
    at scala.collection.Iterator$class.foreach(Iterator.scala:743)
    at internalUtils.stdUtils$IteratorProgressReporter$$anon$4.foreach(stdUtils.scala:389)
    at qcUtils.runAllQC$.runOnSeqFile(runAllQC.scala:1055)
    at qcUtils.runAllQC$.run(runAllQC.scala:774)
    at qcUtils.runAllQC$allQC_runner.run(runAllQC.scala:502)

Any help!! thanks !!

hartleys commented 8 years ago

You are setting the option "--maxReadLength 50", but as you can see in the log file, QoRTs finds reads that are 51 bases long.

 "In the first 10000 reads, read length varies from 32 to 51 (param maxReadLength=50)"

Set maxReadLength to 51 or higher. You can always go higher for safety.
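
One way to choose a safe value is to find the longest read in the alignment file before running QC. Below is a minimal Python sketch, not part of QoRTs: the helper name `max_read_length` is hypothetical, and it assumes SAM-formatted text lines, where the read sequence (SEQ) is the 10th tab-separated column. Hard-clipped bases are not stored in SEQ, so they are not counted.

```python
def max_read_length(sam_lines):
    """Return the longest SEQ length among SAM alignment lines (0 if none)."""
    longest = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        seq = fields[9]  # SEQ is the 10th column
        if seq != "*":   # "*" means the sequence is not stored
            longest = max(longest, len(seq))
    return longest


# Tiny in-memory example: one header line and two alignment records.
example = [
    "@HD\tVN:1.4\tSO:coordinate",
    "r1\t99\t1\t100\t255\t5M\t=\t200\t105\tACGTA\tIIIII",
    "r2\t147\t1\t200\t255\t7M\t=\t100\t-105\tACGTACG\tIIIIIII",
]
print(max_read_length(example))  # prints 7
```

In practice the same function could be fed `sys.stdin` with the output of `samtools view` piped in, and the result (or anything larger) passed as --maxReadLength.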

bioinfo-dirty-jobs commented 8 years ago

Thanks for your help. I use Trimmomatic before aligning. Is that a problem? In fact, I see some reads that are only 32 bp long, which is the Trimmomatic minimum.

hartleys commented 8 years ago

In general I recommend NOT using trimmomatic prior to quality control with QoRTs, as trimmomatic hard-clips bases from both ends. As a consequence, it is impossible to tell, just based on the reads, the sequencer cycle for each base. This means that any sequencer-cycle-specific artifacts or errors may be concealed in the QoRTs QC plots.

I recommend using untrimmed or soft-clipped reads for quality control purposes at least. Depending on the purpose of your study you may want to trim reads afterwards. For most expression studies this is no longer necessary due to improvements in the current generation of alignment tools (particularly STAR and GSNAP).
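
To illustrate why masking is friendlier to cycle-based QC than hard-clipping, here is a small Python sketch. It is not a Trimmomatic or QoRTs feature: `hard_clip` and `mask_outside` are hypothetical helpers, and the 0-based, end-exclusive `keep_start`/`keep_end` coordinates are an assumption made for the example. Masking replaces the unwanted bases with N, so the read keeps its full length and every remaining base stays at its original sequencer cycle.

```python
def _check_range(seq, keep_start, keep_end):
    # Guard against ranges that fall outside the read.
    if not 0 <= keep_start <= keep_end <= len(seq):
        raise ValueError("keep range must lie within the read")


def hard_clip(seq, keep_start, keep_end):
    """Drop bases outside [keep_start, keep_end); the read gets shorter."""
    _check_range(seq, keep_start, keep_end)
    return seq[keep_start:keep_end]


def mask_outside(seq, keep_start, keep_end):
    """Replace bases outside [keep_start, keep_end) with N; length is unchanged."""
    _check_range(seq, keep_start, keep_end)
    return "N" * keep_start + seq[keep_start:keep_end] + "N" * (len(seq) - keep_end)


read = "ACGTACGTAC"
print(hard_clip(read, 2, 8))     # GTACGT: the first kept base now looks like cycle 1
print(mask_outside(read, 2, 8))  # NNGTACGTNN: the first kept base is still at cycle 3
```

Because the masked read has the same length as the original, its quality string can be left untouched, which is the property the QoRTs log message above refers to.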