mutalyzer / mutalyzer2

HGVS variant nomenclature checker
https://mutalyzer.nl

Unique constraint on transcript mappings includes gene symbol #386

Open martijnvermaat opened 8 years ago

martijnvermaat commented 8 years ago

Our transcript mappings table has a unique constraint on accession, gene, transcript variant, and chromosome. For the most common case, NM_ transcript mappings, the intent was uniqueness on accession and chromosome (the latter because of PAR genes, if I remember correctly). The gene and transcript variant are included so that NC_ transcript mappings (currently used for mitochondrial genes, e.g., NC_012920(TRNI_v001)), where one accession is used for multiple genes, are unique as well.
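To make the constraint concrete, here is a minimal sketch using an in-memory SQLite database. The table and column names, the `NM_000001` accession, and the `GENEA`/`TRNQ` symbols are made up for illustration and are not taken from the actual Mutalyzer schema; only the set of four constrained columns follows the description above.

```python
import sqlite3

# Hypothetical, simplified stand-in for the transcript mappings table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE transcript_mappings (
        id INTEGER PRIMARY KEY,
        accession TEXT NOT NULL,
        gene TEXT NOT NULL,
        transcript INTEGER NOT NULL,
        chromosome TEXT NOT NULL,
        UNIQUE (accession, gene, transcript, chromosome)
    )
    """
)


def insert(accession, gene, transcript, chromosome):
    """Insert one mapping; return False if the unique constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO transcript_mappings "
                "(accession, gene, transcript, chromosome) VALUES (?, ?, ?, ?)",
                (accession, gene, transcript, chromosome),
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    # NM_ case: same accession on two chromosomes (PAR gene) is allowed.
    insert("NM_000001", "GENEA", 1, "chrX"),
    insert("NM_000001", "GENEA", 1, "chrY"),
    # NC_ case: one accession shared by several mitochondrial genes is allowed.
    insert("NC_012920", "TRNI", 1, "chrM"),
    insert("NC_012920", "TRNQ", 1, "chrM"),
    # Exact duplicate of an existing row is rejected.
    insert("NC_012920", "TRNI", 1, "chrM"),
]
row_count = conn.execute("SELECT COUNT(*) FROM transcript_mappings").fetchone()[0]
```

In this sketch only the last insert is rejected, so the PAR and NC_ cases both fit under the single four-column constraint.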

The inclusion of the gene symbol in particular may have unintended side effects.

One that I can currently think of is that gene symbol changes will not be handled as expected. When importing transcript mappings, the unique constraint is used to find the corresponding existing row to update. If none is found, a new row is inserted. Now suppose the gene symbol for a transcript in our mapping database changes and we do a new import containing the new gene symbol. The existing row will not be matched, so it is kept with the old gene symbol, and a second row for the same transcript is inserted with the new gene symbol. I'm not sure if that's wrong per se, but it may be unexpected (and not handled well by some parts of Mutalyzer).
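The scenario can be reproduced with a small sketch. This is not the actual Mutalyzer import code: the schema, the `import_mapping` helper, the `start` column, and the `OLDSYM`/`NEWSYM` symbols are all hypothetical, and the update-or-insert logic is only an assumption based on the description above.

```python
import sqlite3

# Hypothetical, simplified stand-in for the transcript mappings table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE transcript_mappings (
        id INTEGER PRIMARY KEY,
        accession TEXT NOT NULL,
        gene TEXT NOT NULL,
        transcript INTEGER NOT NULL,
        chromosome TEXT NOT NULL,
        start INTEGER NOT NULL,
        UNIQUE (accession, gene, transcript, chromosome)
    )
    """
)


def import_mapping(accession, gene, transcript, chromosome, start):
    """Update the row matching the unique key, or insert a new one."""
    key = (accession, gene, transcript, chromosome)
    row = conn.execute(
        "SELECT id FROM transcript_mappings "
        "WHERE accession = ? AND gene = ? AND transcript = ? AND chromosome = ?",
        key,
    ).fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO transcript_mappings "
            "(accession, gene, transcript, chromosome, start) "
            "VALUES (?, ?, ?, ?, ?)",
            key + (start,),
        )
        return "inserted"
    conn.execute(
        "UPDATE transcript_mappings SET start = ? WHERE id = ?", (start, row[0])
    )
    return "updated"


# Initial import, then a re-import with the same gene symbol: row is updated.
first = import_mapping("NM_000001", "OLDSYM", 1, "chr1", 100)
second = import_mapping("NM_000001", "OLDSYM", 1, "chr1", 150)
# Re-import after the gene symbol changed: the key no longer matches.
third = import_mapping("NM_000001", "NEWSYM", 1, "chr1", 150)

rows = conn.execute(
    "SELECT gene, start FROM transcript_mappings ORDER BY id"
).fetchall()
```

After the third call, `rows` holds two entries for the same accession, one per gene symbol. If the lookup key excluded the gene symbol, the third call would update the existing row instead, but then the NC_ mappings would need some other way to stay distinct.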