opencb / hpg-bigdata

This repository implements converters and tools for working with NGS data in HPC or Hadoop clusters.
Apache License 2.0

Add sampleFilter for VariantDatasets #118

Closed by jtarraga 7 years ago

jtarraga commented 8 years ago

To filter variant datasets by sample information (e.g., genotypes), the VariantDataset API should provide an addSampleFilter function. Example of use:

addSampleFilter("GT", "1:0/0;2:0/1,1/1")

This selects the variants where the genotype (GT) of sample index 1 is 0/0 and the genotype of sample index 2 is either 0/1 or 1/1.
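A minimal sketch of how the filter value above could be parsed and evaluated, assuming (as the example implies) that `;` separates per-sample conditions that must all hold and `,` separates alternative genotypes for one sample. The class `SampleFilterSketch` and its `parse`/`matches` methods are hypothetical illustrations, not part of the hpg-bigdata API; genotypes are assumed to be available per variant as a map keyed by the same sample index used in the filter, so no 0- or 1-based convention is assumed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the "GT" sample filter semantics; not the hpg-bigdata API.
public class SampleFilterSketch {

    // Parses a value such as "1:0/0;2:0/1,1/1" into sample index -> accepted genotypes.
    // ';' separates per-sample conditions (all must hold),
    // ',' separates alternative genotypes for one sample (any one may match).
    static Map<Integer, Set<String>> parse(String filter) {
        Map<Integer, Set<String>> conditions = new HashMap<>();
        for (String clause : filter.split(";")) {
            String[] parts = clause.split(":", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Malformed clause: " + clause);
            }
            int sampleIndex = Integer.parseInt(parts[0].trim());
            Set<String> accepted = new HashSet<>();
            for (String gt : parts[1].split(",")) {
                accepted.add(gt.trim());
            }
            conditions.put(sampleIndex, accepted);
        }
        return conditions;
    }

    // Returns true only if every sample named in the filter has one of its accepted genotypes.
    // A sample missing from the variant's genotypes makes the variant fail the filter.
    static boolean matches(Map<Integer, Set<String>> conditions, Map<Integer, String> genotypes) {
        for (Map.Entry<Integer, Set<String>> entry : conditions.entrySet()) {
            String gt = genotypes.get(entry.getKey());
            if (gt == null || !entry.getValue().contains(gt)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<Integer, Set<String>> conditions = parse("1:0/0;2:0/1,1/1");
        System.out.println(matches(conditions, Map.of(1, "0/0", 2, "1/1"))); // true
        System.out.println(matches(conditions, Map.of(1, "0/0", 2, "0/0"))); // false
        System.out.println(matches(conditions, Map.of(1, "0/1", 2, "0/1"))); // false
    }
}
```

Genotypes are compared as literal strings in this sketch, so a phased call such as 0|1 would not match 0/1; a real implementation would need to decide how to normalize them.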

Future enhancements should allow users to specify:

  1. Sample names instead of sample indexes.
  2. "DP" as the first parameter, to filter by read depth.