add --split-size arg

hammerlab / guacamole

Spark-based variant calling, with experimental support for multi-sample somatic calling (including RNA) and local assembly

Apache License 2.0

add --split-size arg #610

Closed: ryan-williams closed this 8 years ago

ryan-williams commented 8 years ago

I recently observed a BAM in the wild with 3-4x as many reads per unit of disk space as we usually see. As a result, each partition had to do far more work than usual, which led to heavy GC, spilling to disk, and extreme performance degradation.

This is a temporary hook to alleviate the problem, though plumbing some logic deeper into hadoop-bam is probably the more correct long-term solution; I'll be looking into that approach shortly.
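The density problem described in this comment comes down to simple arithmetic: with a fixed split size in bytes, the reads per partition scale linearly with reads per byte, so a 4x denser BAM means 4x the work per task unless the split size shrinks by the same factor. A minimal Scala sketch follows. The helper names (`SplitMath`, `numSplits`, `readsPerSplit`, `splitSizeFor`) are hypothetical and not from guacamole. It also assumes splits are cut purely by byte count, ignoring the adjustment needed to start each split at a valid BAM record.

```scala
// Hypothetical helpers (not part of guacamole) showing how a fixed split
// size interacts with read density in a BAM file. Assumes splits are cut
// purely by byte count; real BAM splits are adjusted to record boundaries.
object SplitMath {
  /** Number of input splits for a file, given a maximum split size in bytes. */
  def numSplits(fileSizeBytes: Long, splitSizeBytes: Long): Long = {
    require(splitSizeBytes > 0, "split size must be positive")
    (fileSizeBytes + splitSizeBytes - 1) / splitSizeBytes // ceiling division
  }

  /** Approximate reads handled by one partition, given reads per byte on disk. */
  def readsPerSplit(readsPerByte: Double, splitSizeBytes: Long): Double =
    readsPerByte * splitSizeBytes

  /** Split size (bytes) that keeps reads per partition near `targetReads`. */
  def splitSizeFor(targetReads: Double, readsPerByte: Double): Long =
    math.max(1L, (targetReads / readsPerByte).toLong)
}

// Example: a 128 MiB split at a "typical" density vs. a 4x denser BAM.
object SplitMathDemo extends App {
  val splitSize = 128L * 1024 * 1024
  val typical   = 1.0 / 256 // assumed reads per byte, for illustration only
  val dense     = typical * 4

  val targetReads = SplitMath.readsPerSplit(typical, splitSize)
  println(s"typical: ${SplitMath.readsPerSplit(typical, splitSize)} reads/split")
  println(s"dense:   ${SplitMath.readsPerSplit(dense, splitSize)} reads/split")
  println(s"split size to restore typical load: ${SplitMath.splitSizeFor(targetReads, dense)} bytes")
}
```

With the illustrative densities above, restoring the typical per-partition load for the 4x denser file requires a split size one quarter as large (32 MiB instead of 128 MiB), which in turn yields roughly four times as many partitions.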

coveralls commented 8 years ago

Coverage increased (+0.03%) to 78.934% when pulling 1de302684ad8c4f7124e66be29aed7d77c75d168 on ryan-williams:ss into 9e09a81b0cca4e6e4609a4e6993d5aea11ce3f02 on hammerlab:master.

arahuja commented 8 years ago

Looks good to me, though I'm not sure InputFilters is the right place for this. ReadFilters is probably a better name for what InputFilters is now, and then there could be a separate group for other input-based CLI parameters.

ryan-williams commented 8 years ago

Yeah, that makes sense. I decided to just rename it InputConfig for now. We are already using Input to mean "one sample's worth of reads from a file", and this is now a slightly broader-than-just-filtering set of configuration parameters relating to how those reads are loaded.
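The naming discussion above amounts to separating per-read predicates from broader loading parameters. A minimal Scala sketch of that split follows; it is not guacamole's actual code. `ReadFilters`, its fields (`nonDuplicate`, `passedVendorQualityChecks`), and the `hadoopSettings` helper are hypothetical. The thread does not say how `--split-size` is wired in. Mapping it onto Hadoop's standard `mapreduce.input.fileinputformat.split.maxsize` property is an assumption made here for illustration.

```scala
// Hypothetical sketch, not guacamole's API: per-read predicates live in
// ReadFilters, while InputConfig groups them with other loading options.
case class ReadFilters(
  nonDuplicate: Boolean = false,             // illustrative filter flag
  passedVendorQualityChecks: Boolean = false // illustrative filter flag
)

case class InputConfig(
  filters: ReadFilters = ReadFilters(),
  splitSize: Option[Long] = None // max bytes per input split; None = default
) {
  // Assumption: the split size is applied through Hadoop's standard
  // FileInputFormat max-split-size property.
  def hadoopSettings: Map[String, String] =
    splitSize.toList
      .map(size => "mapreduce.input.fileinputformat.split.maxsize" -> size.toString)
      .toMap
}
```

Keeping the filters in their own class means `InputConfig` can gain further loading options later without its name implying that everything in it is a filter.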

coveralls commented 8 years ago

Coverage decreased (-0.03%) to 78.866% when pulling 349b2393cba8a360166c701a34831244a3abe3fb on ryan-williams:ss into 9e09a81b0cca4e6e4609a4e6993d5aea11ce3f02 on hammerlab:master.