hartleys / QoRTs

Quality of RNA-Seq Toolset

Mapped mate should have mate reference name #29

Closed alexpenson closed 7 years ago

alexpenson commented 7 years ago

Running on a STAR alignment from TCGA (TCGA-2J-AAB6-01A-11R-A41B-07) I get the following error:

============================FATAL_ERROR============================
Error info:
Exception in thread "main" net.sf.samtools.SAMFormatException: SAM validation error: ERROR: Record 109368398, Read name UNC11-SN627:380:C58DMACXX:5:1101:4427:2139, Mapped mate should have mate reference name
        at net.sf.samtools.SAMUtils.processValidationErrors(SAMUtils.java:452)
        at net.sf.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:633)
        at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:618)
        at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:588)
        at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:774)
        at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:752)
        at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
        at scala.collection.Iterator$GroupedIterator.takeDestructively(Iterator.scala:930)
        at scala.collection.Iterator$GroupedIterator.go(Iterator.scala:945)
        at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:985)
        at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:988)
        at internalUtils.stdUtils$$anon$1.hasNext(stdUtils.scala:282)
        at scala.collection.Iterator$JoinIterator.hasNext(Iterator.scala:193)
        at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:406)
        at internalUtils.commonSeqUtils$$anon$2.next(commonSeqUtils.scala:708)
        at internalUtils.commonSeqUtils$$anon$2.next(commonSeqUtils.scala:699)
        at internalUtils.stdUtils$IteratorProgressReporter$$anon$4.next(stdUtils.scala:395)
        at scala.collection.Iterator$class.foreach(Iterator.scala:743)
        at internalUtils.stdUtils$IteratorProgressReporter$$anon$4.foreach(stdUtils.scala:389)
        at qcUtils.runAllQC$.runOnSeqFile(runAllQC.scala:1055)
        at qcUtils.runAllQC$.run(runAllQC.scala:774)
        at qcUtils.runAllQC$allQC_runner.run(runAllQC.scala:502)
        at runner.runner$.main(runner.scala:92)
        at runner.runner.main(runner.scala)

The offending read looks like this:

UNC11-SN627:380:C58DMACXX:5:1101:4427:2139      153     chr17   46938790        3       3S45M   *       0       0       GCATCCCACAGCCTGCAAGTGTGTGTGTGTGTGAAAGAGAGAGGGGGG        =C==HCB>HFGGDIJJIIJJJJIIIGJGHHEIGJHDHGHGFDFFFCCC        NH:i:2  HI:i:2  NM:i:0  MD:Z:45  AS:i:44 RG:Z::140908_UNC11-SN627_0380_AC58DMACXX_GGCTAC_L005
UNC11-SN627:380:C58DMACXX:5:1101:4427:2139      345     chr17   46938801        3       47M1S   *       0       0       GCAAGTGTGTGGGTGTGTGAAAGAGAGAGGGGGGCCCAGAGGCCGCCA        ############@;C>=84EB<<?>@F:IIIHGIIHHHHHDDDDD@@@        NH:i:2  HI:i:1  NM:i:1  MD:Z:11T35       AS:i:44 RG:Z::140908_UNC11-SN627_0380_AC58DMACXX_GGCTAC_L005
UNC11-SN627:380:C58DMACXX:5:1101:4427:2139      101     *       0       0       *       *       0       0       TGGCGGCCTCTGGGCCCCCCTCTCTCTTTCACACACCCACACACTTGC        @@@DDDDDHHHHHIIGHIII:F@>?<<BE48=>C;@############        NH:i:0  HI:i:0  AS:i:44 nM:i:0  uT:A:4   RG:Z::140908_UNC11-SN627_0380_AC58DMACXX_GGCTAC_L005

Let me know if you have any suggestions.

Thanks,
Alex

hartleys commented 7 years ago

Hmm. This error is thrown by the SAMtools Java utilities (the net.sf.samtools classes in the stack trace above), not by QoRTs itself. Basically, your BAM file is malformed and does not adhere to the current SAM specification (v1.4).
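
As a rough illustration of the rule behind this message, here is a small bash sketch. It assumes, from the wording of the error rather than from the library source, that a paired record whose mate-unmapped bit (0x8) is clear must carry a mate reference name other than `*`. The function name `mate_check` is hypothetical; the flag values and the `*` mate reference are taken from the three records quoted above.

```bash
# Sketch only: approximates the validation rule suggested by the error text.
# Arguments: FLAG value, RNEXT (mate reference name) column.
mate_check() {
  local flag=$1 rnext=$2
  # 0x1 = read is paired, 0x8 = mate is unmapped.
  # A paired read whose mate is claimed to be mapped needs a mate reference name.
  if (( (flag & 1) != 0 && (flag & 8) == 0 )) && [ "$rnext" = "*" ]; then
    echo "invalid"
  else
    echo "ok"
  fi
}

# All three quoted records have '*' in the mate reference column.
for flag in 153 345 101; do
  echo "$flag $(mate_check "$flag" '*')"
done
```

Under this reading, the two mapped records (flags 153 and 345) pass because they mark the mate as unmapped, while the unmapped record (flag 101) is the one that trips the check: it does not set the mate-unmapped bit, yet it has no mate reference name.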

My bet would be either (1) it was created a long time ago, and adheres to an older (looser) specification, or (2) it's been run through a conversion utility that is designed to conform to an older (looser) specification.

It looks like it has to do with half-mapped read-pairs (read-pairs where one read is mapped, and the other is not). You might want to just filter out these reads using samtools.

samtools view -b -F 4 -F 8 myBamFile.bam > filteredBam.bam

Or you can pipe this directly into QoRTs using a Linux pipe:

samtools view -b -F 4 -F 8 myBamFile.bam | java -jar QoRTs.jar QC [...qorts options...] - gtffile.gtf outputDir

Try this out and let me know. Depending on your version of samtools, this might not work either, since samtools might also crash when it finds the bad read.
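
To see what the filter does to the reads quoted in this issue, the bash sketch below applies the same exclusion logic to their FLAG values. It assumes the two `-F` options combine into a single mask of 12 (4 + 8). Whether repeated `-F` options accumulate may depend on the samtools version, so a single `-F 12` is one way to state the same mask explicitly. The function name `passes_filter` and the flag value 99 (a fully mapped pair) are hypothetical additions for illustration.

```bash
# Sketch only: mimics an exclusion mask of 12 (4 + 8) on the FLAG column.
READ_UNMAPPED=4   # 0x4: this read is unmapped
MATE_UNMAPPED=8   # 0x8: the mate is unmapped

# Prints "drop" if any excluded bit is set in the FLAG, otherwise "keep".
passes_filter() {
  local flag=$1
  if (( flag & (READ_UNMAPPED | MATE_UNMAPPED) )); then
    echo "drop"
  else
    echo "keep"
  fi
}

# 153, 345 and 101 are the quoted records; 99 is a hypothetical fully mapped pair.
for flag in 153 345 101 99; do
  echo "$flag $(passes_filter "$flag")"
done
```

All three quoted records are dropped: 153 and 345 have the mate-unmapped bit set, and 101 has the read-unmapped bit set. Only pairs with both mates mapped remain.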

alexpenson commented 7 years ago

It works for me. Thanks a lot!