PartitionedRegions improvements

hammerlab / guacamole

Spark-based variant calling, with experimental support for multi-sample somatic calling (including RNA) and local assembly

Apache License 2.0


PartitionedRegions improvements #520

Closed ryan-williams closed 8 years ago

ryan-williams commented 8 years ago

Push the PartitionedRegions abstraction out to callers, so that they can start to separate the loci+read partitioning plumbing from application logic.

Recapitulates some of #502, but keys off SampleIds instead of SampleNames; still collapses the VAFHistogram mixture model into one for all samples in an app, per https://github.com/hammerlab/guacamole/pull/502#discussion_r73574358


timodonnell commented 8 years ago

Wow, there's a lot here. Left some comments. My main question is how this affects runtimes on the cluster, since it involves a lot of Spark partitioning logic whose performance implications I don't have a good understanding of.

Also a style point, fwiw: I find it helpful for each method to have an intro describing what it does, in addition to the argument descriptions.

timodonnell commented 8 years ago

Cool, thanks for the replies, @ryan-williams. I took another pass; LGTM.