mutalyzer / mutalyzer2

HGVS variant nomenclature checker
https://mutalyzer.nl

Inconsistent NameChecker results for transcript-reference disagree #510

Closed fennerm closed 4 years ago

fennerm commented 4 years ago

I recently encountered some fairly unintuitive NameChecker behavior, which initially led me to generate an incorrect HGVS description.

Background: The ALMS1 transcript NM_015120.4 disagrees with the genomic reference sequence. In NC_000002.11 there are 8 GGA repeats at positions 2:73613032-73613070. In NM_015120.4 there are 9 repeats.

Expected result: The recommended NM_015120.4 HGVS description returned from the following two URLs should be the same:
https://www.mutalyzer.nl/name-checker?description=NC_000002.11%3Ag.73613050_73613070del
https://www.mutalyzer.nl/name-checker?description=NM_015120.4%3Ac.54_74del
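For anyone reproducing this, the two URLs can be built from the plain HGVS descriptions. This is a minimal sketch that only percent-encodes the `description` query parameter exactly as it appears in the URLs above; the helper name `name_checker_url` is hypothetical, and the sketch makes no request to the server.

```python
from urllib.parse import urlencode

# Base URL taken from the links above.
BASE = "https://www.mutalyzer.nl/name-checker"


def name_checker_url(description):
    """Return the name-checker URL for an HGVS description (hypothetical helper)."""
    # urlencode percent-encodes the colon in the description as %3A.
    return f"{BASE}?{urlencode({'description': description})}"


genomic_url = name_checker_url("NC_000002.11:g.73613050_73613070del")
transcript_url = name_checker_url("NM_015120.4:c.54_74del")
```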

Actual result: The genomic query
https://www.mutalyzer.nl/name-checker?description=NC_000002.11%3Ag.73613050_73613070del
recommends NC_000002.11(ALMS1_v001):c.54_74del / NM_015120.4:c.54_74del, whereas the transcript query
https://www.mutalyzer.nl/name-checker?description=NM_015120.4%3Ac.54_74del
correctly identifies the reference disagreement and recommends NM_015120.4:c.57_77del.

jfjlaros commented 4 years ago

This is expected behaviour.

As you correctly point out, the two reference sequences differ; therefore, HGVS descriptions of variants that lie on these reference sequences can also differ.
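A toy illustration of how one extra repeat unit can move a description by three positions. This is a sketch, not Mutalyzer's code: it assumes the difference comes from shifting a deletion to its most 3' position within a repeat tract, as the HGVS recommendations prescribe. The sequences, flanks, and function names are invented for illustration and are not the real ALMS1 sequences.

```python
def shift_3prime(seq, start, end):
    """Shift the deletion seq[start:end] (0-based, half-open) as far 3' as possible.

    Deleting seq[start:end] gives the same result as deleting
    seq[start+1:end+1] whenever seq[start] == seq[end].
    """
    while end < len(seq) and seq[start] == seq[end]:
        start += 1
        end += 1
    return start, end


def describe(start, end):
    """Format a 0-based half-open interval as a 1-based deletion description."""
    return f"{start + 1}_{end}del"


# Hypothetical references: identical flanks, 8 versus 9 GGA repeats.
genomic_like = "ATGCC" + "GGA" * 8 + "TTCAG"
transcript_like = "ATGCC" + "GGA" * 9 + "TTCAG"

# Delete 7 repeat units (21 bases), starting at the first repeat in both.
on_genomic = shift_3prime(genomic_like, 5, 26)
on_transcript = shift_3prime(transcript_like, 5, 26)

print(describe(*on_genomic))
print(describe(*on_transcript))
```

Both deletions remove the same 21 bases and leave the same flanks, but on the longer tract the deletion can slide one repeat unit further, so the two descriptions differ by exactly three positions.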

It is very unfortunate that the accession number of a transcript can be used in multiple ways. This is likely the source of much confusion. In your first example (NC_000002.11(NM_015120.4)), the string NM_015120.4 is used as a selector: it selects the transcript that is labeled NM_015120.4 in the annotation of NC_000002.11. In your second example, the string NM_015120.4 is used as a proper accession number, so the variant is described on the NM_015120.4 sequence itself.

I hope this clarifies it a bit.