ryan-williams closed this 8 years ago
Looks good to me, though I'm not sure InputFilters is the right space. ReadFilters is probably a better name for what InputFilters is now, and then there could be other Input-based CLI parameters.
Yeah, that makes sense. I decided to just rename it InputConfig for now. We are already using Input to imply "one sample's worth of reads from a file", and this is now a slightly-broader-than-just-filtering set of configuration parameters relating to how those are loaded.
I recently observed a BAM in the wild with 3-4x the number of reads per unit of disk space that we usually see. Each partition then had to do far more work than usual, which led to GC pressure, spilling, and extreme performance degradation.
This is a temporary hook to alleviate this, though plumbing some logic deeper into hadoop-bam is probably a more correct long-term solution; I'll be looking into that approach shortly.
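For illustration only, here is a rough sketch of the arithmetic behind this kind of hook. It is not the code in this PR: the function name, its parameters, and the idea of capping the split size by an expected read count are all assumptions made for the example. It shows how a file that packs reads 4x more densely needs a proportionally smaller split to keep the per-partition read count in the usual range.

```python
def capped_split_size(default_split_bytes, bytes_per_read, max_reads_per_partition):
    """Hypothetical helper: pick a split size (in bytes) so that the expected
    number of reads in one partition stays at or below max_reads_per_partition.

    bytes_per_read is an assumed average on-disk size of one read; a denser
    BAM has a smaller value.
    """
    if bytes_per_read <= 0 or max_reads_per_partition <= 0:
        raise ValueError("bytes_per_read and max_reads_per_partition must be positive")
    # Largest split that still respects the read-count cap.
    bytes_for_cap = max_reads_per_partition * bytes_per_read
    # Never grow past the default split size; only shrink when the file is dense.
    return min(default_split_bytes, bytes_for_cap)


DEFAULT_SPLIT = 128 * 1024 * 1024  # assumed 128 MiB default split

# Assumed "typical" file at 100 bytes per read: the default split is kept.
typical = capped_split_size(DEFAULT_SPLIT, 100, 2_000_000)

# Assumed dense file at 25 bytes per read (4x the reads per byte): the split shrinks.
dense = capped_split_size(DEFAULT_SPLIT, 25, 2_000_000)
```

With these made-up numbers the typical file keeps the default split, while the dense file gets a split of 50,000,000 bytes, so both stay at or under two million expected reads per partition.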