opencb / hpg-bigdata

This repository implements converters and tools for working with NGS data on HPC or Hadoop clusters
Apache License 2.0

Update Avro converters for Variant and Alignment #43

Closed by jtarraga 8 years ago

jtarraga commented 8 years ago

Update the Avro converters and move them to the hpg-bigdata-core package (they are currently in the hpg-bigdata-app package). In addition, the current implementation always converts the whole VCF/BAM/SAM file to Avro. It would be nice to have dedicated serializer classes for Variant and Alignment that support basic filters defined with lambdas:

    VariantAvroSerializer avroSerializer = new VariantAvroSerializer();
    avroSerializer.addRegionFilter(new Region("1", 1, 800000))
            .addRegionFilter(new Region("1", 798801, 222800000))
            .addFilter(v -> v.getStudies().get(0).getFiles().get(0).getAttributes().get("NS").equals("60"));
    avroSerializer.toAvro(inputFilename, outputFilename);
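
To make the proposal more concrete, here is a minimal sketch of how the chained filters could be combined internally. It is not the hpg-bigdata implementation: `SimpleVariant`, `SimpleRegion` and `FilteringSerializerSketch` are hypothetical stand-ins, the Avro writing step is left out, and the combination rule (a variant must fall in at least one region, and must pass every lambda filter) is an assumption rather than something decided in this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical, simplified variant record (not the real Variant model).
final class SimpleVariant {
    final String chromosome;
    final int start;
    final String ns; // stand-in for the "NS" file attribute

    SimpleVariant(String chromosome, int start, String ns) {
        this.chromosome = chromosome;
        this.start = start;
        this.ns = ns;
    }
}

// Hypothetical region with inclusive start and end coordinates.
final class SimpleRegion {
    final String chromosome;
    final int start;
    final int end;

    SimpleRegion(String chromosome, int start, int end) {
        this.chromosome = chromosome;
        this.start = start;
        this.end = end;
    }

    boolean contains(SimpleVariant v) {
        return chromosome.equals(v.chromosome) && v.start >= start && v.start <= end;
    }
}

// Sketch of the fluent filter API; only the selection logic is shown.
final class FilteringSerializerSketch {
    private final List<SimpleRegion> regions = new ArrayList<>();
    private final List<Predicate<SimpleVariant>> filters = new ArrayList<>();

    FilteringSerializerSketch addRegionFilter(SimpleRegion region) {
        regions.add(region);
        return this; // returning this allows call chaining
    }

    FilteringSerializerSketch addFilter(Predicate<SimpleVariant> filter) {
        filters.add(filter);
        return this;
    }

    // Assumed rule: regions are OR-ed together, lambda filters are AND-ed.
    boolean accepts(SimpleVariant v) {
        boolean inRegion = regions.isEmpty() || regions.stream().anyMatch(r -> r.contains(v));
        return inRegion && filters.stream().allMatch(f -> f.test(v));
    }

    // Returns the records that a real serializer would go on to write as Avro.
    List<SimpleVariant> select(List<SimpleVariant> input) {
        return input.stream().filter(this::accepts).collect(Collectors.toList());
    }

    // Small usage example mirroring the snippet in the issue.
    static List<SimpleVariant> example() {
        List<SimpleVariant> input = Arrays.asList(
                new SimpleVariant("1", 500, "60"),        // in a region, NS matches
                new SimpleVariant("1", 900, "59"),        // in a region, NS differs
                new SimpleVariant("2", 500, "60"),        // other chromosome
                new SimpleVariant("1", 300000000, "60")); // beyond both regions
        return new FilteringSerializerSketch()
                .addRegionFilter(new SimpleRegion("1", 1, 800000))
                .addRegionFilter(new SimpleRegion("1", 798801, 222800000))
                .addFilter(v -> "60".equals(v.ns)) // null-safe comparison
                .select(input);
    }
}
```

Writing the comparison as `"60".equals(...)` avoids a `NullPointerException` when the attribute is missing, which the lambda in the snippet above would throw for files without an `NS` value.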