opencb / opencga

An Open Computational Genomics Analysis platform for big data genomics analysis. OpenCGA is maintained and developed by its parent company Zetta Genomics. Please contact support@zettagenomics.com for bug reports and feature requests.
Apache License 2.0

Accept enriched genotypes at GENOTYPE filter. #750

Closed: j-coll closed this issue 6 years ago

j-coll commented 6 years ago

The genotype filter accepts a list of genotypes per sample. Querying multi-allelic genotypes is not always easy, because there are multiple combinations that depend on the number of secondary alternates (0/1, 0/2, 1/2, 3/5, ...).

This operation can be pretty easy. Since #192, a list with all the loaded genotypes is stored and updated in the StudyConfiguration.

Option A

We can allow wildcards (*) as "any allele different than 0 or missing". Example:
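
A minimal sketch of how such a wildcard could be expanded against the list of loaded genotypes. This is not the OpenCGA implementation: the function names (`expand_wildcard`, `allele_matches`, `split_alleles`) are hypothetical, the list of loaded genotypes is passed in as a plain Python list rather than read from the StudyConfiguration, and alleles are compared position by position, so unphased genotypes are assumed to be stored in a normalized order.

```python
from typing import List


def split_alleles(genotype: str) -> List[str]:
    """Split a genotype such as '0/1' or '0|1' into its allele strings."""
    return genotype.replace("|", "/").split("/")


def allele_matches(pattern: str, allele: str) -> bool:
    """'*' matches any allele other than reference ('0') or missing ('.')."""
    if pattern == "*":
        return allele not in ("0", ".")
    return pattern == allele


def expand_wildcard(pattern: str, loaded_genotypes: List[str]) -> List[str]:
    """Return the loaded genotypes matched by a pattern such as '0/*'.

    Assumption: matching is positional and requires the same ploidy.
    """
    pattern_alleles = split_alleles(pattern)
    matches = []
    for genotype in loaded_genotypes:
        alleles = split_alleles(genotype)
        if len(alleles) == len(pattern_alleles) and all(
            allele_matches(p, a) for p, a in zip(pattern_alleles, alleles)
        ):
            matches.append(genotype)
    return matches


# Hypothetical list of genotypes loaded in a study.
loaded = ["0/0", "0/1", "0/2", "1/1", "1/2", "3/5", "./."]
het_ref = expand_wildcard("0/*", loaded)
any_alt = expand_wildcard("*/*", loaded)
```

Under these assumptions, `0/*` selects the genotypes with one reference allele and one alternate allele, while `*/*` selects every genotype with no reference or missing alleles.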

Option B

An alternative is to accept an n instead of a wildcard *, with the same meaning:

Option C

Another alternative is to accept genotype aliases, which will help with polyploidy and phased genotypes:

Conclusion

We will go for Option C, as it is the most flexible solution.

Tasks

imedina commented 6 years ago

Nice, let's implement the * for now.

jpflorido commented 6 years ago

Although it shouldn't be a problem, be aware that GATK represents spanning deletions as * https://gatkforums.broadinstitute.org/gatk/discussion/6926/spanning-or-overlapping-deletions-allele

imedina commented 6 years ago

That should be a problem (in theory). @j-coll, actually the two options to implement are:

1/* would be part of */*

j-coll commented 6 years ago

Implementing Option C with new genotypes HOM_REF, HOM_ALT, HET, HET_REF, HET_ALT and MISS
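
A rough sketch of how these aliases could be resolved against the list of loaded genotypes. The alias semantics used here are an assumption, not taken from the implementation: HOM_REF means all alleles are reference, HOM_ALT means all alleles are the same alternate, HET means at least two different alleles, HET_REF is a HET that includes the reference, HET_ALT is a HET with only alternates, and MISS means all alleles are missing. Partially missing genotypes such as `./1` are left unclassified. The names `classify` and `resolve_alias` are hypothetical.

```python
from typing import List, Set

ALIASES = ("HOM_REF", "HOM_ALT", "HET", "HET_REF", "HET_ALT", "MISS")


def classify(genotype: str) -> Set[str]:
    """Return the aliases (under the assumed semantics) that a genotype satisfies."""
    alleles = genotype.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        # Assumption: only fully missing genotypes count as MISS.
        return {"MISS"} if all(a == "." for a in alleles) else set()
    distinct = set(alleles)
    if len(distinct) == 1:
        return {"HOM_REF"} if "0" in distinct else {"HOM_ALT"}
    # Two or more different alleles: heterozygous, with or without the reference.
    return {"HET", "HET_REF" if "0" in distinct else "HET_ALT"}


def resolve_alias(alias: str, loaded_genotypes: List[str]) -> List[str]:
    """Expand an alias into the concrete loaded genotypes it covers."""
    if alias not in ALIASES:
        raise ValueError(f"Unknown genotype alias: {alias}")
    return [gt for gt in loaded_genotypes if alias in classify(gt)]


# Hypothetical loaded genotypes, including phased and triploid entries.
loaded = ["0/0", "0/1", "0|2", "1/1", "1/2", "./.", "0/0/1"]
het = resolve_alias("HET", loaded)
het_alt = resolve_alias("HET_ALT", loaded)
```

Because classification works on the set of distinct alleles, the same rules apply to phased genotypes (`0|2`) and to polyploid ones (`0/0/1`) without enumerating every allele combination.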