Closed danvk closed 9 years ago
Looks like a read at that position has an N
-base, AGGTGTGCAGAGCCAAAGGAGCAGATGAGCGAGGAGGTGAGNAGGAGCAN
so we say the reference is possibly N
or whatever the other reads say are the possible reference bases.
Full stack trace:
User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 31 in stage 4.0 failed 4 times, most recent failure: Lost task 31.3 in stage 4.0 (TID 517, demeter-csmau08-13.demeter.hpc.mssm.edu): java.lang.IllegalArgumentException: Multiple reference bases found in sample = SM at (chr, pos) = (17, 4455004)
at org.hammerlab.guacamole.commands.GermlineThreshold$Caller$$anonfun$callVariantsAtLocus$1.apply(GermlineThresholdCaller.scala:165)
at org.hammerlab.guacamole.commands.GermlineThreshold$Caller$$anonfun$callVariantsAtLocus$1.apply(GermlineThresholdCaller.scala:98)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
at org.hammerlab.guacamole.commands.GermlineThreshold$Caller$.callVariantsAtLocus(GermlineThresholdCaller.scala:98)
at org.hammerlab.guacamole.commands.GermlineThreshold$Caller$$anonfun$1.apply(GermlineThresholdCaller.scala:76)
at org.hammerlab.guacamole.commands.GermlineThreshold$Caller$$anonfun$1.apply(GermlineThresholdCaller.scala:75)
at org.hammerlab.guacamole.DistributedUtil$$anonfun$pileupFlatMap$1.apply(DistributedUtil.scala:292)
at org.hammerlab.guacamole.DistributedUtil$$anonfun$pileupFlatMap$1.apply(DistributedUtil.scala:289)
at org.hammerlab.guacamole.DistributedUtil$$anonfun$windowFlatMapWithState$1$$anonfun$apply$5.apply(DistributedUtil.scala:368)
at org.hammerlab.guacamole.DistributedUtil$$anonfun$windowFlatMapWithState$1$$anonfun$apply$5.apply(DistributedUtil.scala:363)
at org.hammerlab.guacamole.DistributedUtil$$anonfun$collectByContig$1.apply(DistributedUtil.scala:443)
at org.hammerlab.guacamole.DistributedUtil$$anonfun$collectByContig$1.apply(DistributedUtil.scala:440)
This is specific to the GermlineThreshold
caller, which is specifically looking for specific sets of alternates and references.
} else {
throw new IllegalArgumentException(
"Multiple reference bases found in sample = %s at (chr, pos) = (%s, %d)"
.format(sampleName, samplePileup.referenceName, samplePileup.locus)
)
}
What's curious is we are not hitting the previous condition in it
case (allele1, count1) :: (allele2, count2) :: rest => {
if (allele1.refBases == Seq(Bases.N) || allele2.refBases == Seq(Bases.N)) {
log.warn("Reference base N found and ignored in sample = %s at (chr, pos) = (%s, %d)"
.format(sampleName, samplePileup.referenceName, samplePileup.locus))
// Find the non-N reference
val properReference = if (allele1.refBases == Seq(Bases.N)) allele2.refBases else allele1.refBases
variant(
Allele(properReference, Bases.ALT),
Ref :: Ref :: Nil
) :: Nil
}
I get a similar error for /hdfs/datasets/dream/data/training/normal.chr20.withMDTags.bam
, this time at 20:60823211
. I don't see any N
s there.
I think the root cause is a deletion. When I print out the two alleles for a toy BAM, I see this:
(Allele(A,A),18)
(Allele(,),4)
and this in IGV:
When I try to run
germline-threshold
onbrody_307-bcell-tumor-bwa-mem-gap11-gep4-dedup-indelrealigned-bqsr.bam
, I eventually get the following error:java.lang.IllegalArgumentException: Multiple reference bases found in sample = SM at (chr, pos) = (17, 4455004)
Full command line:
See http://demeter-mgmt1.demeter.hpc.mssm.edu:8088/cluster/app/application_1432740718700_2196
@arahuja says that this shouldn't be happening.