samtools / htsjdk

A Java API for high-throughput sequencing data (HTS) formats.
http://samtools.github.io/htsjdk/
277 stars 244 forks source link

htsjdk should parse multiallelic AF/AC #423

Open droazen opened 8 years ago

droazen commented 8 years ago

Requested by @ldgauthier:

It would be wonderful to be able to use SelectVariants with a query like -select "AF > 0.1" on a VCF containing multiallelics and have it filter multiallelics by the allele with the highest AF. (Possibly conversely for "AF < X"queries. Right now it crashes unless you use a crazy JEXL or pull out the multiallelics. Maybe we could make a maxAF/minAF in htsjdk/JEXLmap.java which equals AF for biallelics?

Internally, it might be nice to have a Map with the AF (or AC) for each allele for the SelectVariants issue and to simplify some of the crazy logic already in VariantAnnotator to deal with different allele ordering.

As part of this task, we should also make 100% sure that allele ordering is preserved so that AF/AC array ordering is preserved during VC reading/writing/manipulation.

The following four methods would be a great start: double VariantContext::getAF(Allele a) int VariantContext::getAC(Allele a)

double VariantContext::getMaxAF(void) //returns AF in biallelic case, probably calls getAF(allele) over all alt alleles (from existing getAlternateAlleles() fcn) in multiallelic case; potentially cache the value after we find it int VariantContext::getMaxAC(void) //returns AC in biallelic case Those would be the ones that go into the JEXLMap for SelectVariants.

Maybe similar for minimum AF/AC.

This will probably involve parsing and cacheing the AF and AC, which we don't do now. Would we want to make an array similar to List alleles where the zeroth one is for reference? Then allele indices and AF/AC indices would line up.

Maybe also something like these... ArrayList VariantContext::getAllAFs(void) //returns the cached AF list ArrayList VariantContext::getAllACs(void) //returns the cached AC list

droazen commented 8 years ago

Depends on https://github.com/samtools/htsjdk/issues/424