hammerlab / guacamole

Spark-based variant calling, with experimental support for multi-sample somatic calling (including RNA) and local assembly
Apache License 2.0

--loci argument conflicts with TakeLociIterator #611

Open arahuja opened 7 years ago

arahuja commented 7 years ago

Callers such as SomaticStandardCaller (and others that use pileupFlatMap) assume that --loci controls the loci at which pileups are examined. However, this is not the case when using CappedRegionsPartitioner.

For example, suppose I pass --loci 20:362209-362212,20:362213-362214,20:362215-362216,20:362217-362618, which skips loci 362212, 362214 and 362216.

I get the following partitioning:

20:362209-362226=0,20:362226-362239=1,20:362239-362252=2,20:362252-362265=3,20:362265-362278=4,...

The first range, 20:362209-362226=0, covers the loci that were explicitly excluded by the --loci argument, so pileupFlatMap will then run at those positions.
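
To make the mismatch concrete, here is a minimal Python sketch (not guacamole code) that checks which loci in the first partition range were never requested. The helper names `parse_loci` and `is_requested` are hypothetical, and ranges are treated as half-open [start, end), which is consistent with 362212 being skipped by 20:362209-362212.

```python
# Illustrative sketch only; these helpers are hypothetical, not guacamole APIs.
# Ranges are assumed half-open: [start, end).

def parse_loci(spec):
    """Parse 'contig:start-end,...' into a list of (contig, start, end) tuples."""
    ranges = []
    for part in spec.split(","):
        contig, span = part.split(":")
        start, end = span.split("-")
        ranges.append((contig, int(start), int(end)))
    return ranges

def is_requested(ranges, contig, locus):
    """True if the locus falls inside any requested half-open range."""
    return any(c == contig and s <= locus < e for c, s, e in ranges)

requested = parse_loci(
    "20:362209-362212,20:362213-362214,20:362215-362216,20:362217-362618"
)

# First range from the partitioning reported above: 20:362209-362226=0
contig, start, end = ("20", 362209, 362226)

# Loci that the partition covers but --loci did not ask for.
unrequested = [
    locus for locus in range(start, end)
    if not is_requested(requested, contig, locus)
]
print(unrequested)  # [362212, 362214, 362216]
```

The three loci printed are exactly the ones skipped in the --loci argument, yet they fall inside partition 0.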

arahuja commented 7 years ago

This can be resolved with --trim-ranges, but I think that should then be the default? Otherwise, I'm not sure how TakeLociIterator differentiates between empty loci and excluded loci.
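
As a rough illustration of the empty-versus-excluded distinction, the Python sketch below intersects a partition range with the requested ranges, so that only requested loci remain. This is an assumption about the desired behavior, not a description of how --trim-ranges or TakeLociIterator is implemented; the function name `trim_partition` is hypothetical and ranges are assumed half-open [start, end).

```python
# Illustrative sketch only; trim_partition is a hypothetical helper,
# not part of guacamole. Ranges are assumed half-open: [start, end).

def trim_partition(partition, requested):
    """Intersect one partition range with the requested ranges on its contig."""
    contig, p_start, p_end = partition
    trimmed = []
    for c, start, end in requested:
        if c != contig:
            continue
        lo, hi = max(start, p_start), min(end, p_end)
        if lo < hi:  # keep only non-empty overlaps
            trimmed.append((c, lo, hi))
    return trimmed

# Requested ranges from the --loci argument in this issue.
requested = [
    ("20", 362209, 362212),
    ("20", 362213, 362214),
    ("20", 362215, 362216),
    ("20", 362217, 362618),
]

# First partition range from the reported partitioning.
trimmed = trim_partition(("20", 362209, 362226), requested)
for contig, start, end in trimmed:
    print(f"{contig}:{start}-{end}")
```

After trimming, a locus with no reads inside one of the remaining ranges is "empty" but still requested, while 362212, 362214 and 362216 are absent from every range and so are unambiguously excluded.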

ryan-williams commented 7 years ago

Good catch, working on a fix