sgkit-dev / sgkit

Scalable genetics toolkit
https://sgkit-dev.github.io/sgkit
Apache License 2.0

Add `get_allele_type` function #84

Open eric-czech opened 4 years ago

eric-czech commented 4 years ago

We'll need something like this to separate indels, MNPs, and larger structural variants from SNPs in certain analyses. It could start by simply defining an `is_snp` function, which would be enough for most GWAS and could serve as a building block for a larger `get_allele_type` function. The Hail `is_snp` function would be a good guideline, as would `allele_type`.
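As a rough illustration of the starting point described above, here is a minimal sketch of an `is_snp` check on a single pair of allele strings. It is not Hail's implementation and not an existing sgkit API. The function signature, the `VALID_BASES` constant, and the strict rule (both alleles are exactly one base from A/C/G/T and differ) are assumptions made for this example.

```python
# Hypothetical sketch: not part of sgkit or Hail.
# Assumption: alleles are plain strings, and a SNP here means a
# substitution of exactly one unambiguous base for another.
VALID_BASES = frozenset("ACGT")


def is_snp(ref: str, alt: str) -> bool:
    """Return True if ref and alt are single, distinct nucleotides."""
    ref, alt = ref.upper(), alt.upper()
    return (
        len(ref) == 1
        and len(alt) == 1
        and ref in VALID_BASES
        and alt in VALID_BASES
        and ref != alt
    )


print(is_snp("A", "G"))   # single-base substitution
print(is_snp("A", "AT"))  # insertion, so not a SNP under this rule
```

A real implementation would presumably operate on whole arrays of alleles rather than one pair at a time; the scalar version only pins down the rule being proposed.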

hammer commented 4 years ago

Is there a standard nomenclature or ontology we can use for the possible values to return from this function? It's probably fine to just adopt the Hail terminology but I wonder if there's any value for us in aligning with the Sequence Ontology's sequence_alteration terms, or some other controlled vocabulary?

hammer commented 4 years ago

In particular my eye twitches a little bit when people use SNP when they mean SNV or point_mutation.
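To make the vocabulary question concrete, the sketch below shows one way the return values of a `get_allele_type` function could be fixed as an enum whose labels are borrowed from Sequence Ontology term names (`SNV`, `MNV`, `insertion`, `deletion`). The `AlleleType` enum, the `OTHER` catch-all, and the classification rules are all assumptions for illustration; the labels should be checked against the ontology before being adopted, and the rules are deliberately simpler than Hail's.

```python
from enum import Enum


class AlleleType(Enum):
    """Hypothetical return values; labels borrowed from Sequence Ontology names."""

    SNV = "SNV"
    MNV = "MNV"
    INSERTION = "insertion"
    DELETION = "deletion"
    OTHER = "other"  # assumption: catch-all for symbolic, complex, or identical alleles


def get_allele_type(ref: str, alt: str) -> AlleleType:
    """Classify an alt allele relative to ref using simple string rules."""
    ref, alt = ref.upper(), alt.upper()
    # Anything outside A/C/G/T (e.g. "<DEL>", "*", "N") is not classified here.
    if not ref or not alt or set(ref + alt) - set("ACGT"):
        return AlleleType.OTHER
    if len(ref) == len(alt):
        if ref == alt:
            return AlleleType.OTHER
        # Assumption: same-length alleles longer than one base count as MNV.
        return AlleleType.SNV if len(ref) == 1 else AlleleType.MNV
    # Assumption: VCF-style left-anchored indels, where the shorter allele
    # is a prefix of the longer one.
    if len(alt) > len(ref) and alt.startswith(ref):
        return AlleleType.INSERTION
    if len(ref) > len(alt) and ref.startswith(alt):
        return AlleleType.DELETION
    return AlleleType.OTHER


print(get_allele_type("A", "G"))
print(get_allele_type("AT", "A"))
```

Using `SNV` rather than `SNP` as the label sidesteps the frequency connotation raised above, since the function only inspects the alleles and knows nothing about population frequency.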

jeromekelleher commented 4 years ago

I actually didn't know that SNP had a definition tied to frequency - learn something new every day!

I agree it would be good to align with the Sequence Ontology vocabulary, as long as it's not too cumbersome.