When we create a `VariantData` instance, we specify `ancestral_allele` as an array of strings. However, when we return the `sites_ancestral_allele` array, it is a numpy array of indexes into the alleles list. It's a bit confusing to use "ancestral_allele" to mean a string in one context and a numerical index in another.
When devising the `SampleData` class, we took care to describe the ancestral strings as "ancestral states", and used "ancestral allele" to refer to an index into the alleles list (there is some discussion about this on GitHub somewhere, but I can't dig it up). Should we therefore rename the second argument of `VariantData(...)` to `ancestral_state`? This would match the tskit terminology, which is nice (but note that VCF INFO fields tend to use `AA` as an abbreviation for "Ancestral Allele", referring to a string, so perhaps we can't win).
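To make the two meanings concrete, here is a minimal sketch in plain numpy. It does not call tsinfer; the variable names and the three-site data are made up for illustration, with `ancestral_state` holding strings and `ancestral_allele_index` holding the corresponding indexes into each site's alleles list.

```python
import numpy as np

# Hypothetical data for three biallelic sites (not produced by tsinfer).
alleles = np.array([["A", "T"], ["G", "C"], ["T", "A"]])

# "Ancestral state": the ancestral allele given as a string per site.
ancestral_state = np.array(["T", "G", "T"])

# "Ancestral allele" in the index sense: position of that string
# within the site's own alleles list.
ancestral_allele_index = np.array(
    [
        list(site_alleles).index(state)
        for site_alleles, state in zip(alleles, ancestral_state)
    ]
)

# Going back from indexes to strings recovers the ancestral states.
recovered = alleles[np.arange(len(alleles)), ancestral_allele_index]

print(ancestral_allele_index)
print(recovered)
```

The same biological information is carried by both arrays, which is why using one name for both is easy to trip over.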