hammerlab / guacamole

Spark-based variant calling, with experimental support for multi-sample somatic calling (including RNA) and local assembly
Apache License 2.0
84 stars 21 forks source link

Move likelihood code to breeze #568

Closed arahuja closed 8 years ago

arahuja commented 8 years ago

Reviving an old branch here that moved the likelihood calculations to breeze


This change is Reviewable

coveralls commented 8 years ago

Coverage Status

Coverage increased (+0.06%) to 75.256% when pulling b3b78fd94f876c89eabc54275cd5a14815234004 on breeze-likelihood into e3c276d86c79630dbbc8a99939ad12f8817fb3b6 on master.

e5c commented 8 years ago

Reviewed 1 of 2 files at r1. Review status: 1 of 2 files reviewed at latest revision, 3 unresolved discussions.


_pom.xml, line 281 [r1] (raw file):_

      <groupId>org.scalanlp</groupId>
      <artifactId>breeze_${scala.version.prefix}</artifactId>
      <version>0.11.2</version>

OOC any reason we use 0.11.2 over 0.12 (latest stable release)?


src/main/scala/org/hammerlab/guacamole/likelihood/Likelihood.scala, line 61 [r1] (raw file):

    val result = likelihoodsOfGenotypes(
      elements,
      Array(genotype),

per Slack discussion, maybe change these to Vector?


src/main/scala/org/hammerlab/guacamole/likelihood/Likelihood.scala, line 88 [r1] (raw file):

    normalize: Boolean = false): Seq[(Genotype, Double)] = {

    val alleles = pileup.distinctAlleles.filter(allele => allele.altBases.forall((Bases.isStandardBase _)))

Since it's been eliminated, OOC what was the original purpose of this filter fn?


Comments from Reviewable

arahuja commented 8 years ago

Review status: 1 of 2 files reviewed at latest revision, 3 unresolved discussions.


pom.xml, line 281 [r1] (raw file):

Previously, e5c (Eliza Chang) wrote… > OOC any reason we use 0.11.2 over 0.12 (latest stable release)? >

No, I don't think so, just hasn't been updated since. I will update this as well


src/main/scala/org/hammerlab/guacamole/likelihood/Likelihood.scala, line 61 [r1] (raw file):

Previously, e5c (Eliza Chang) wrote… > per Slack discussion, maybe change these to Vector? >

This is just creating a one element array so it shouldn't make a difference really


src/main/scala/org/hammerlab/guacamole/likelihood/Likelihood.scala, line 88 [r1] (raw file):

Previously, e5c (Eliza Chang) wrote… > Since it's been eliminated, OOC what was the original purpose of this filter fn? >

Ah, good question, perhaps I should not remove it. I think it was to only evaluate Genotypes with proper bases, so should be left as-is.


Comments from Reviewable

coveralls commented 8 years ago

Coverage Status

Coverage increased (+0.03%) to 75.231% when pulling 45180fa83f0971193fe35321b656675e5453a6a1 on breeze-likelihood into e3c276d86c79630dbbc8a99939ad12f8817fb3b6 on master.

e5c commented 8 years ago

Review status: 0 of 3 files reviewed at latest revision, 2 unresolved discussions.


_src/main/scala/org/hammerlab/guacamole/likelihood/Likelihood.scala, line 88 [r1] (raw file):_

Previously, arahuja (Arun Ahuja) wrote… > Ah, good question, perhaps I should not remove it. I think it was to only evaluate Genotypes with proper bases, so should be left as-is. >

In what case would a non-proper base make its way into a Genotype?


Comments from Reviewable

arahuja commented 8 years ago

src/main/scala/org/hammerlab/guacamole/likelihood/Likelihood.scala, line 61 [r1] (raw file):

Previously, arahuja (Arun Ahuja) wrote… > This is just creating a one element array so it shouldn't make a difference really >

An N base or an empty string could be there


Comments from Reviewable

e5c commented 8 years ago

Reviewed 3 of 3 files at r2. Review status: all files reviewed at latest revision, 1 unresolved discussion.


Comments from Reviewable

e5c commented 8 years ago
:lgtm:

Review status: all files reviewed at latest revision, 1 unresolved discussion.


Comments from Reviewable

arahuja commented 8 years ago

Thanks for the review @e5c!

ryan-williams commented 8 years ago

sorry for late review, a few questions


Comments from Reviewable

ryan-williams commented 8 years ago

src/main/scala/org/hammerlab/guacamole/likelihood/Likelihood.scala, line 27 [r2] (raw file):

   * This considers only the base quality scores.
   *
   * @param element the [org.hammerlab.guacamole.pileup.PileupElement]] to consider

missing [ here?


Comments from Reviewable

ryan-williams commented 8 years ago

src/main/scala/org/hammerlab/guacamole/likelihood/Likelihood.scala, line 91 [r2] (raw file):


    val genotypes = (for {
      i <- 0 until alleles.size

j/w: this change intentional?


Comments from Reviewable

ryan-williams commented 8 years ago

src/main/scala/org/hammerlab/guacamole/likelihood/Likelihood.scala, line 143 [r2] (raw file):

    val depth = elements.size

    val alleleElementProbabilities = computeAlleleElementProbabilities(elements, alleles.toArray, probabilityCorrect)

seems like you should replace the .toIndexedSeq with .toArray a few lines up, and then you don't have to do this here / copy the collection twice for no reason?


Comments from Reviewable

ryan-williams commented 8 years ago

src/main/scala/org/hammerlab/guacamole/likelihood/Likelihood.scala, line 154 [r2] (raw file):

      val alleleRow1 = alleleElementProbabilities(alleleToIndex(genotype.alleles(0)), ::)
      val alleleRow2 = alleleElementProbabilities(alleleToIndex(genotype.alleles(1)), ::)
      ( //alleleRow1.aggregate(alleleRow2, Functions.plus, Functions.chain(Functions.log, Functions.plus))

any reason to leave this commented line in?


Comments from Reviewable

ryan-williams commented 8 years ago

_src/main/scala/org/hammerlab/guacamole/loci/LociArgs.scala, line 82 [r2] (raw file):_

      val loci =
        LociParser(
          Source.fromInputStream(is).getLines().mkString

weird, having a hard time convincing myself that this works before or after this change.

mkString("")ing a bunch of loci-str lines does not seem like it would work

and anyway we should probably put each line and not read the whole file in at once.

i'll investigate this separately unless you think i'm missing something


Comments from Reviewable

arahuja commented 8 years ago

_src/main/scala/org/hammerlab/guacamole/likelihood/Likelihood.scala, line 27 [r2] (raw file):_

Previously, ryan-williams (Ryan Williams) wrote… > missing `[` here? >

Sure will add in follow-on


Comments from Reviewable

arahuja commented 8 years ago

src/main/scala/org/hammerlab/guacamole/likelihood/Likelihood.scala, line 91 [r2] (raw file):

Previously, ryan-williams (Ryan Williams) wrote… > j/w: this change intentional? >

It was, but should be the same I can revert this


Comments from Reviewable

ryan-williams commented 8 years ago

src/main/scala/org/hammerlab/guacamole/likelihood/Likelihood.scala, line 91 [r2] (raw file):

Previously, arahuja (Arun Ahuja) wrote… > It was, but should be the same I can revert this >

np. intellij suggests using .indices but i can't tell if there's any good reason for it; seems probably better in general though here given the next line the parallel structure is nice so fine to leave as is


Comments from Reviewable

arahuja commented 8 years ago

src/main/scala/org/hammerlab/guacamole/likelihood/Likelihood.scala, line 154 [r2] (raw file):

Previously, ryan-williams (Ryan Williams) wrote… > any reason to leave this commented line in? >

No, removing in follow


Comments from Reviewable