opencb / hpg-bigdata

This repository implements converters and tools for working with NGS data on HPC or Hadoop clusters
Apache License 2.0

Implement OR logical behaviour when selecting alignments from multiple regions in the alignment convert CLI #74

Closed jtarraga closed 7 years ago

jtarraga commented 8 years ago

Currently, when the 'alignment convert' command line is executed to select alignments from multiple regions (i.e., using the parameters --regions or --region-file), the AND logical operation is applied between all regions. That is, only the alignments located in the intersection of all regions are returned, which in general means 0 alignments, because regions rarely intersect.

./build/bin/hpg-bigdata-local2.sh alignment convert -i /tmp/test.bam -o /tmp/out.avro --region 1:229454177-229654865,1:151870909-151882308

The default behaviour should be an OR operation, so that alignments located in any of the input regions are returned. In our example, that means alignments located on 1:229454177-229654865 OR on 1:151870909-151882308.
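To make the difference concrete, here is a minimal Java sketch of the two behaviours using the regions from the example above. The `Region` class, `parseRegions`, `matchesAll` and `matchesAny` are hypothetical names introduced for illustration only; they are not taken from the hpg-bigdata code base, and the overlap test assumes 1-based inclusive coordinates.

```java
import java.util.ArrayList;
import java.util.List;

public class RegionFilterSketch {

    /** Hypothetical region type: chromosome plus inclusive start and end. */
    static final class Region {
        final String chrom;
        final long start;
        final long end;

        Region(String chrom, long start, long end) {
            this.chrom = chrom;
            this.start = start;
            this.end = end;
        }

        /** Parses a string such as "1:229454177-229654865". */
        static Region parse(String s) {
            int colon = s.indexOf(':');
            int dash = s.indexOf('-', colon + 1);
            return new Region(
                    s.substring(0, colon),
                    Long.parseLong(s.substring(colon + 1, dash)),
                    Long.parseLong(s.substring(dash + 1)));
        }

        /** True if the interval [start, end] on chrom overlaps this region. */
        boolean overlaps(String chrom, long start, long end) {
            return this.chrom.equals(chrom) && start <= this.end && end >= this.start;
        }
    }

    /** Splits a comma-separated region list into Region objects. */
    static List<Region> parseRegions(String csv) {
        List<Region> regions = new ArrayList<>();
        for (String s : csv.split(",")) {
            regions.add(Region.parse(s.trim()));
        }
        return regions;
    }

    /** AND semantics (the behaviour reported in this issue): every region must match. */
    static boolean matchesAll(List<Region> regions, String chrom, long start, long end) {
        return regions.stream().allMatch(r -> r.overlaps(chrom, start, end));
    }

    /** OR semantics (the requested behaviour): one matching region is enough. */
    static boolean matchesAny(List<Region> regions, String chrom, long start, long end) {
        return regions.stream().anyMatch(r -> r.overlaps(chrom, start, end));
    }

    public static void main(String[] args) {
        List<Region> regions =
                parseRegions("1:229454177-229654865,1:151870909-151882308");
        // A made-up alignment that lies inside the second region only.
        System.out.println(matchesAll(regions, "1", 151870950L, 151871050L)); // false
        System.out.println(matchesAny(regions, "1", 151870950L, 151871050L)); // true
    }
}
```

With AND semantics the made-up alignment is rejected because it does not overlap the first region; with OR semantics it is kept.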

jtarraga commented 8 years ago

We have the same situation with the 'variant convert' command line. We have to find a general solution for both cases: to do so, we have to modify the parent class AvroSerializer and update the child classes AlignmentAvroSerializer and VariantAvroSerializer.
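One way such a shared fix could look is sketched below: the parent class holds the region filters and combines them with OR, so every child class inherits the same behaviour. This is only an illustration under assumptions. `FilteringSerializer`, `addRegionFilter`, `accept`, `select` and `PositionSerializer` are hypothetical names, not the actual AvroSerializer API, and records are reduced to plain positions to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class OrFilterSketch {

    /** Hypothetical stand-in for a shared parent serializer (not the hpg-bigdata class). */
    static abstract class FilteringSerializer<T> {
        private final List<Predicate<T>> regionFilters = new ArrayList<>();

        /** Registers one region filter; filters are combined with OR. */
        void addRegionFilter(Predicate<T> filter) {
            regionFilters.add(filter);
        }

        /** Accepts everything when no filter is set, otherwise any match suffices. */
        boolean accept(T record) {
            if (regionFilters.isEmpty()) {
                return true;
            }
            for (Predicate<T> filter : regionFilters) {
                if (filter.test(record)) {
                    return true;
                }
            }
            return false;
        }

        /** Returns the records that pass the OR-combined filters, in input order. */
        List<T> select(List<T> records) {
            List<T> kept = new ArrayList<>();
            for (T record : records) {
                if (accept(record)) {
                    kept.add(record);
                }
            }
            return kept;
        }
    }

    /** Hypothetical child class; a record here is just a position on one chromosome. */
    static final class PositionSerializer extends FilteringSerializer<Long> {
    }

    public static void main(String[] args) {
        PositionSerializer serializer = new PositionSerializer();
        serializer.addRegionFilter(p -> p >= 229454177L && p <= 229654865L);
        serializer.addRegionFilter(p -> p >= 151870909L && p <= 151882308L);

        List<Long> positions = Arrays.asList(100L, 151871000L, 229500000L);
        System.out.println(serializer.select(positions)); // [151871000, 229500000]
    }
}
```

Keeping the OR logic in the parent means the alignment and variant child classes only need to supply their own region predicates.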