mutalyzer / mutalyzer2

HGVS variant nomenclature checker
https://mutalyzer.nl

Warn users of the Position Converter when entering a reference inconsistency as a variant #432

Closed Bratdaking closed 6 years ago

Bratdaking commented 7 years ago

If the Position Converter is used for a variant at a position with a known inconsistency between the chromosomal reference and the transcript reference, it may accidentally double the mutation. If the variant is equal to the difference between the two references, the converter introduces that difference again on the transcript sequence, thereby doubling it. This is very confusing, and laymen might not recognize the error as such. It would be very nice if users got a warning in such cases, or, even better, if the Position Converter said something like "this chromosomal change did not result in changes on a transcript level", but that is probably much harder to implement.
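To make the requested warning concrete, here is a minimal sketch of such a check. It is not Mutalyzer code: the sequences, the function names `apply_dup` and `convert_with_warning`, and the simplification that both references share the same strand and coordinates are all assumptions made for illustration.

```python
def apply_dup(seq: str, pos: int) -> str:
    """Return seq with the base at 1-based position pos duplicated."""
    return seq[:pos] + seq[pos - 1] + seq[pos:]


def convert_with_warning(genomic_ref: str, transcript_ref: str, pos: int):
    """Toy converter (hypothetical): returns (description, warning or None).

    Assumes both references use the same strand and the same coordinates,
    which is not true for real chromosome/transcript pairs.
    """
    description = f"c.{pos}dup{transcript_ref[pos - 1]}"
    warning = None
    # If applying the variant to the genomic reference reproduces the
    # transcript reference, the variant IS the reference inconsistency.
    if apply_dup(genomic_ref, pos) == transcript_ref:
        warning = ("variant equals a known reference inconsistency; "
                   "no change at the transcript level")
    return description, warning


# Toy references: the transcript carries one G more than the chromosome.
genomic_ref = "ATGGCA"
transcript_ref = "ATGGGCA"

description, warning = convert_with_warning(genomic_ref, transcript_ref, 4)

# What happens when the converted name is applied to the transcript anyway:
# the extra G is introduced a second time.
doubled = apply_dup(transcript_ref, 4)
```

In this toy case the duplication on the chromosome yields exactly the transcript reference, so the check fires, while applying the converted description to the transcript produces a sequence with one G too many.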

A nice example is chr2:g.31805882dupG. This will (incorrectly) resolve to NM_000348.3:c.88dupC (it should have been NM_000348.3:c.88, because no variant is introduced on a transcript level). Most people will just continue, thinking that NM_000348.3:c.88dupC is correct. If they check the name in the NameChecker, however, something strange occurs. Besides the correct warning that it should be NM_000348.3:c.90dupC instead, it shows an extra G in the "Overview of the raw variants" compared to the UCSC view (UCSC shows only 2 Gs; Mutalyzer shows 3, plus the G introduced by the variant). Additionally, in UCSC the 90dupC is incorrectly shown below the A (which is position 91, because 90 is not present at all). A close observer will recognize the introduced error, but I think that most people will not notice it.

Another very nice, and slightly more complicated, example is chr5:g.72743299_72743300insGC. The inserted GC is actually already present on a transcript level. However, if we copy the suggested transcript variant over to the NameChecker, it introduces an additional GC, which is even corrected to a duplication a few bases upwards (a completely different variant than the one we started with, even reversing the order of the nucleotides). A close comparison between the UCSC view and the "Overview of the raw variants" reveals the complete mess between the two, but that requires a really close look.

jfjlaros commented 6 years ago

The underlying cause of these problems is that the position converter does not use a reference sequence; it only converts positions.
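As an illustration of what "only converts positions" means, the following sketch maps a genomic duplication to a coding description by coordinate arithmetic alone. It is a simplified assumption, not the actual Mutalyzer implementation: `convert_position_only`, the single-exon layout, and the coordinates used are hypothetical.

```python
# Translation table for complementing bases on the reverse strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def convert_position_only(g_pos: int, bases: str,
                          cds_start_g: int, strand: str) -> str:
    """Map a genomic duplication to a c. description by arithmetic only.

    Hypothetical single-exon model: cds_start_g is the genomic coordinate
    of coding position 1. No reference sequence is read at any point.
    """
    if strand == "+":
        c_pos = g_pos - cds_start_g + 1
    else:
        # Reverse strand: coordinates run backwards, bases are
        # reverse-complemented.
        c_pos = cds_start_g - g_pos + 1
        bases = bases.translate(COMPLEMENT)[::-1]
    return f"c.{c_pos}dup{bases}"


# Made-up coordinates, chosen only to mirror the shape of the example above.
reverse_result = convert_position_only(1000, "G", 1087, "-")
forward_result = convert_position_only(105, "G", 100, "+")
```

Because the function never looks at either sequence, it has no way to notice that the duplicated base is already present in the transcript reference.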

We are currently working on full chromosome support for the name checker, which will make the position converter obsolete.