pirl-unc / mhcgnomes

Parsing MHC nomenclature in the wild
Apache License 2.0

Normalize number of digits per allele field #2

Status: Open. iskandr opened this issue 4 years ago.

iskandr commented 4 years ago

Many (but not all) non-human MHC genes expect three digits in the first field of their allele names.

It would be nice if "DLA-88*001:01" were treated as equivalent to "DLA-88*01:01", but that seems to require curating a database of how many digits each gene expects in each of its first two fields.
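To make the idea concrete, here is a minimal sketch of what a table-driven normalization could look like. It is not part of mhcgnomes. `EXPECTED_WIDTHS` and `normalize_allele` are hypothetical names, and the `(3, 2)` width for DLA-88 is an illustrative assumption rather than curated data. Each of the first two fields is re-padded to the gene's expected width, so both spellings collapse to the same string. Genes missing from the table are returned unchanged rather than guessed.

```python
import re

# Hypothetical table: expected digit counts for the first two fields per gene.
# The widths here are illustrative assumptions, not curated nomenclature data.
EXPECTED_WIDTHS = {
    "DLA-88": (3, 2),
}

# Gene name, then "*", then one or more colon-separated numeric fields.
ALLELE_RE = re.compile(r"^(?P<gene>[A-Za-z0-9-]+)\*(?P<fields>\d+(?::\d+)*)$")


def normalize_allele(name, widths=EXPECTED_WIDTHS):
    """Re-pad the first two fields of an allele name to the gene's expected width."""
    match = ALLELE_RE.match(name)
    if match is None:
        raise ValueError(f"Unrecognized allele name: {name!r}")
    gene = match.group("gene")
    fields = match.group("fields").split(":")
    expected = widths.get(gene)
    if expected is None:
        # Unknown gene: leave the name untouched instead of guessing a width.
        return name
    normalized = []
    for i, field in enumerate(fields):
        if i < len(expected):
            # Drop any leading zeros, then zero-pad to the expected width.
            padded = str(int(field)).zfill(expected[i])
            if len(padded) > expected[i]:
                raise ValueError(f"Field {field!r} is wider than expected for {gene}")
            normalized.append(padded)
        else:
            # Fields beyond the first two are kept as written.
            normalized.append(field)
    return f"{gene}*{':'.join(normalized)}"


print(normalize_allele("DLA-88*01:01"))   # DLA-88*001:01
print(normalize_allele("DLA-88*001:01"))  # DLA-88*001:01
```

The hard part the issue points to is not this function but filling in the table correctly for every gene.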

The number of digits per field seems to range from 2 (common in older allele formats) to 4 (very rare, but it does happen).

iskandr commented 4 years ago

Getting this right will be pretty involved, since some genes default to 2 digits in the first field and others to 3, even within the same species. It depends on the degree of population diversity for the gene.
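Because the expected width varies gene by gene, one way to bootstrap such a database might be to infer it from a list of known allele names. The sketch below is a heuristic, not an mhcgnomes feature. `infer_expected_widths` is a hypothetical helper, and it assumes the most common width observed for a gene's field is the canonical one. The gene names and alleles in the example are made up purely for illustration.

```python
from collections import Counter, defaultdict


def infer_expected_widths(allele_names):
    """Guess each gene's digit count for its first two fields from the most common width seen."""
    # counts[gene][field_index] is a Counter of observed field widths.
    counts = defaultdict(lambda: defaultdict(Counter))
    for name in allele_names:
        gene, _, rest = name.partition("*")
        if not rest:
            # Skip names without a "*"-separated field section.
            continue
        for i, field in enumerate(rest.split(":")[:2]):
            counts[gene][i][len(field)] += 1
    widths = {}
    for gene, per_field in counts.items():
        # Majority vote per field; ties fall back to the width seen first.
        widths[gene] = tuple(per_field[i].most_common(1)[0][0] for i in sorted(per_field))
    return widths


# Made-up names: GENE1 mostly uses 3-digit first fields, GENE2 uses 2.
names = ["GENE1*001:01", "GENE1*002:01", "GENE1*01:01", "GENE2*01:01", "GENE2*02:03"]
print(infer_expected_widths(names))  # {'GENE1': (3, 2), 'GENE2': (2, 2)}
```

A majority vote like this would still need manual review, since a gene whose naming convention changed over time could have its older, narrower format outvote the current one.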