hammerlab / guacamole

Spark-based variant calling, with experimental support for multi-sample somatic calling (including RNA) and local assembly
Apache License 2.0

rm broken .array call #590

Closed: ryan-williams closed this 7 years ago

ryan-williams commented 7 years ago

I was seeing task (then stage) failures in Likelihood due to the .array call I removed here:

java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [Lorg.hammerlab.guacamole.variants.Allele;
        at org.hammerlab.guacamole.likelihood.Likelihood$.likelihoodsOfGenotypes(Likelihood.scala:137)
        at org.hammerlab.guacamole.likelihood.Likelihood$.likelihoodsOfAllPossibleGenotypesFromPileup(Likelihood.scala:78)
        at org.hammerlab.guacamole.commands.SomaticStandard$Caller$.findPotentialVariantAtLocus(SomaticStandardCaller.scala:166)
        at org.hammerlab.guacamole.commands.SomaticStandard$Caller$$anonfun$1.apply(SomaticStandardCaller.scala:80)
        at org.hammerlab.guacamole.commands.SomaticStandard$Caller$$anonfun$1.apply(SomaticStandardCaller.scala:79)
        at org.hammerlab.guacamole.distributed.PileupFlatMapUtils$$anonfun$pileupFlatMapTwoSamples$1.apply(PileupFlatMapUtils.scala:95)
        at org.hammerlab.guacamole.distributed.PileupFlatMapUtils$$anonfun$pileupFlatMapTwoSamples$1.apply(PileupFlatMapUtils.scala:90)
        at org.hammerlab.guacamole.distributed.WindowFlatMapUtils$$anonfun$windowFlatMapWithState$1$$anonfun$apply$1.apply(WindowFlatMapUtils.scala:65)
        at org.hammerlab.guacamole.distributed.WindowFlatMapUtils$$anonfun$windowFlatMapWithState$1$$anonfun$apply$1.apply(WindowFlatMapUtils.scala:55)
        at org.hammerlab.guacamole.distributed.WindowFlatMapUtils$$anonfun$splitPartitionByContigAndMap$2.apply(WindowFlatMapUtils.scala:141)
        at org.hammerlab.guacamole.distributed.WindowFlatMapUtils$$anonfun$splitPartitionByContigAndMap$2.apply(WindowFlatMapUtils.scala:131)
        at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
        at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:284)
        at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171)
        at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Is there a reason it was needed? Any ideas why it would lead to this crash?
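
For reference, here's a minimal, standalone sketch of the failure mode in the trace (this is not the guacamole code path; the local Allele stands in for org.hammerlab.guacamole.variants.Allele, and Scala 2.11 collections are assumed). The JVM checks the runtime class of the array itself, not of its elements, so an Object[] that only holds Alleles still can't be cast to Allele[]:

    import scala.collection.mutable.ArraySeq

    object ArrayCastRepro {
      case class Allele(bases: String)

      def main(args: Array[String]): Unit = {
        // mutable.ArraySeq keeps its elements in an Array[AnyRef], so its
        // backing array has runtime class [Ljava.lang.Object; even though
        // every element is an Allele.
        val alleles = ArraySeq(Allele("A"), Allele("C"))

        try {
          // Same shape of cast as in the stack trace: Object[] -> Allele[].
          val cast = alleles.array.asInstanceOf[Array[Allele]]
          println(cast.length)
        } catch {
          case e: ClassCastException =>
            // [Ljava.lang.Object; cannot be cast to [L...Allele;
            println(s"cast failed: ${e.getMessage}")
        }
      }
    }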

coveralls commented 7 years ago


Coverage decreased (-0.2%) to 78.764% when pulling cfd45b51d2f1139ff6562634617f35b290ffa152 on ryan-williams:ss into f8c121bc38ec9ad07c29202415a66662b2c345bc on hammerlab:master.

ryan-williams commented 7 years ago

Yea, I've also run it in a few contexts without seeing it. I'll dig around a little more to try to understand what's happening.

ryan-williams commented 7 years ago

OK, have it basically figured out:

In any case, we wouldn't want this .array even if that bug didn't exist: .toArray (which Seqs get from TraversableOnce) is the thing to use when that's the conversion you actually want, but in this case we already have an Array, so no conversion is needed and the whole line can go away.
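
A standalone sketch of that point (Scala 2.11 assumed, with a local Allele stand-in rather than the guacamole type): .toArray allocates a properly-typed array via the ClassTag in scope, whereas a collection's backing array may be an Array[AnyRef], and an existing Array[Allele] needs no conversion at all.

    import scala.collection.mutable.ArraySeq

    object ToArrayVsArray {
      case class Allele(bases: String)

      def main(args: Array[String]): Unit = {
        val alleleSeq: Seq[Allele] = ArraySeq(Allele("A"), Allele("T"))

        // .toArray uses the implicit ClassTag[Allele] to build a real Array[Allele]
        val converted: Array[Allele] = alleleSeq.toArray
        println(converted.getClass.getComponentType)                    // class ...Allele

        // ... whereas ArraySeq's backing array is an Array[AnyRef]
        println(ArraySeq(Allele("C")).array.getClass.getComponentType)  // class java.lang.Object

        // and if we already hold an Array[Allele], there is nothing to convert
        val alreadyTyped: Array[Allele] = Array(Allele("G"))
        println(alreadyTyped.getClass.getComponentType)                 // class ...Allele
      }
    }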

ryan-williams commented 7 years ago

As an aside: if it's abnormal or unintended for that alleles array ever to be empty, that could be worth digging into further; lmk if you think that's the case!
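
If an empty alleles array really is abnormal there, one option (purely a sketch, not code from this PR; the object and method names below are made up, only the Allele import is real) would be an explicit guard that fails loudly at the source:

    import org.hammerlab.guacamole.variants.Allele

    object AlleleChecks {
      // Hypothetical helper: fail fast with a clear message if the alleles
      // array is unexpectedly empty, instead of letting a confusing error
      // surface further downstream.
      def requireNonEmpty(alleles: Array[Allele]): Array[Allele] = {
        require(alleles.nonEmpty, "expected at least one allele at this locus")
        alleles
      }
    }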