larryns / MitoScape

A big-data, machine-learning workflow for aligning mtDNA from NGS data.
Apache License 2.0

SAMFormatException error #3

Closed: aledavini7 closed this issue 2 years ago

aledavini7 commented 2 years ago

Dear Larry, I am sorry to bother you again. I am trying to run MitoScape on matched tumor and normal samples, but something strange is happening.

MitoScape runs without problems on the tumor samples and produces the final _MTDNA.bam output, but on the normal samples it stops with errors about the SAM format. I have copied some lines describing the problem below.

2022-08-01 11:47:26 WARN TaskSetManager:69 - Lost task 0.0 in stage 3.0 (TID 3, cn06.cluster.loc, executor driver): htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Read name A00302:75:HJVVMDMXX:2:2229:2076:28479, Read CIGAR M operator maps off end of reference
    at htsjdk.samtools.SAMUtils.processValidationErrors(SAMUtils.java:455)
    at htsjdk.samtools.BAMRecord.getCigar(BAMRecord.java:284)
    at htsjdk.samtools.SAMRecord.isValid(SAMRecord.java:2092)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:848)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:834)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:802)
    at org.seqdoop.hadoop_bam.BAMRecordReader.nextKeyValue(BAMRecordReader.java:228)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:247)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
... ...
2022-08-01 11:47:26 ERROR TaskSetManager:73 - Task 0 in stage 3.0 failed 1 times; aborting job
2022-08-01 11:47:26 INFO TaskSchedulerImpl:57 - Removed TaskSet 3.0, whose tasks have all completed, from pool
2022-08-01 11:47:26 INFO TaskSchedulerImpl:57 - Cancelling stage 3
2022-08-01 11:47:26 INFO TaskSchedulerImpl:57 - Killing all running tasks in stage 3: Stage cancelled
2022-08-01 11:47:26 INFO DAGScheduler:57 - ShuffleMapStage 3 (isEmpty at MTClassifierModel.scala:77) failed in 25.874 s due to Job aborted due to stage failure: Task 0 in stage 3.0 failed 1 times, most recent failure: Lost task 0.0 in stage 3.

I am new to this kind of analysis, so I apologize if this is not an appropriate issue to open here. I don't know whether the problem is related to my starting data, but I followed the same steps for the tumor samples and obtained a successful output.

Thank you very much for your patience. Alessandro

larryns commented 2 years ago

Hi Alessandro,

The error message "htsjdk.samtools.SAMFormatException: SAM validation error" tells us that the problem is in the format of the BAM file: according to the log, the CIGAR string of the named read maps off the end of the reference. This usually happens with older aligners. I'd suggest using the latest version of GSNAP for your alignments because, to my knowledge, it's the only aligner that handles the circular chrM easily. You can also use bwa if you don't care about the ends of chrM. Alternatively, you can try CleanSam in Picard tools.
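For anyone who wants to see what "CIGAR M operator maps off end of reference" means in practice, here is a minimal Python sketch of that kind of check. It is not htsjdk's actual code. The function names are hypothetical, and the chrM length of 16569 assumes the rCRS reference, so substitute the length of whatever reference you aligned against.

```python
import re

# CIGAR operators that consume reference bases, per the SAM specification.
REF_CONSUMING = set("MDN=X")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(cigar: str) -> int:
    """Return the number of reference bases covered by a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)


def maps_off_end(pos: int, cigar: str, ref_length: int) -> bool:
    """Return True if an alignment starting at 1-based `pos` ends past `ref_length`."""
    end = pos + reference_span(cigar) - 1
    return end > ref_length


# Assumption: rCRS mitochondrial reference; use your own reference length.
CHRM_LENGTH = 16569

# A 100 bp read starting at 16500 would end at 16599, past the end of chrM.
print(maps_off_end(16500, "100M", CHRM_LENGTH))
# The same read starting at 16470 ends exactly on the last base.
print(maps_off_end(16470, "100M", CHRM_LENGTH))
```

Reads near the end of a circular genome are the ones most likely to trip this check, which is why the aligner's handling of chrM matters. Picard's CleanSam addresses the same condition by soft-clipping alignments that run beyond the end of the reference.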

This error isn't a MitoScape issue though.

Best of luck, Larry.

aledavini7 commented 2 years ago

Dear Larry, this is really embarrassing. I was actually using an older version of GSNAP, which caused the format problems in the BAM files. I have updated it and now everything works fine.

Thank you so much for your help! Alessandro

larryns commented 2 years ago

Hi Alessandro,

That's great, glad to hear it!

Larry.