opencb / hpg-bigdata

This repository implements converters and tools for working with NGS data on HPC or Hadoop clusters
Apache License 2.0

Implement Parquet converters for Variant and Alignment #42

Closed imedina closed 8 years ago

imedina commented 8 years ago

The current implementation converts the whole file from Avro to Parquet. It would be nice to have specific converter classes for Variant and Alignment that support basic filters expressed as lambdas:

VariantParquetConverter parquetConverter = new VariantParquetConverter();
parquetConverter.addRegionFilter(new Region("1", 1, 800000))
        .addRegionFilter(new Region("1", 798801, 222800000))
        .addFilter(v -> v.getStudies().get(0).getFiles().get(0).getAttributes().get("NS").equals("60"));
parquetConverter.toParquet(is, variantCommandOptions.convertVariantCommandOptions.output);
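
One way the chained filter API above could be structured is sketched below, using only the Java standard library. Everything in it is an assumption made for illustration: `FilteringConverter`, `VariantConverter`, `SimpleVariant`, and this `Region` are hypothetical stand-ins, not the classes in hpg-bigdata or its data model. The sketch also assumes that a record passes if it falls in any of the added regions and satisfies all of the lambda filters; the issue does not specify those semantics. Writing to Parquet is left out, so `convert` simply returns the records that would be written.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class FilteringConverterSketch {

    /** Hypothetical region: chromosome plus inclusive start and end positions. */
    static final class Region {
        final String chromosome;
        final int start;
        final int end;

        Region(String chromosome, int start, int end) {
            this.chromosome = chromosome;
            this.start = start;
            this.end = end;
        }

        boolean contains(String chrom, int position) {
            return chromosome.equals(chrom) && position >= start && position <= end;
        }
    }

    /** Hypothetical minimal variant; a real variant model carries many more fields. */
    static final class SimpleVariant {
        final String chromosome;
        final int position;
        final Map<String, String> attributes;

        SimpleVariant(String chromosome, int position, Map<String, String> attributes) {
            this.chromosome = chromosome;
            this.position = position;
            this.attributes = attributes;
        }
    }

    /** Generic base: collects lambda filters; a record must satisfy all of them. */
    static class FilteringConverter<T> {
        private final List<Predicate<T>> filters = new ArrayList<>();

        FilteringConverter<T> addFilter(Predicate<T> filter) {
            filters.add(filter);
            return this;
        }

        boolean accept(T record) {
            return filters.stream().allMatch(f -> f.test(record));
        }

        /** Stand-in for toParquet: returns the records that would be written. */
        List<T> convert(List<T> input) {
            return input.stream().filter(this::accept).collect(Collectors.toList());
        }
    }

    /** Variant-specific converter: adds region filters (assumed to be OR-ed together). */
    static class VariantConverter extends FilteringConverter<SimpleVariant> {
        private final List<Region> regions = new ArrayList<>();

        VariantConverter addRegionFilter(Region region) {
            regions.add(region);
            return this;
        }

        @Override
        VariantConverter addFilter(Predicate<SimpleVariant> filter) {
            super.addFilter(filter);
            return this; // covariant return keeps the fluent chain typed
        }

        @Override
        boolean accept(SimpleVariant v) {
            // No regions added means no region restriction.
            boolean inRegion = regions.isEmpty()
                    || regions.stream().anyMatch(r -> r.contains(v.chromosome, v.position));
            return inRegion && super.accept(v);
        }
    }
}
```

With this layout an Alignment converter could extend the same generic base and add its own convenience filters. In the lambda itself, writing `"60".equals(v.attributes.get("NS"))` rather than calling `equals` on the looked-up value avoids a `NullPointerException` when a record has no `NS` attribute.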