processing of repeats that cross exon boundaries

This is an interesting one.

The RefSeq transcript contains a tri-nucleotide repeat with two repeat units. The last nucleotide of the second repeat unit resides in a different exon than the rest of the repeat.

Because a RefSeq transcript is used, the application of the 3' rule results in NM_001042492.3:c.7320_7322del. When this description is mapped to a genomic build, the deletion spans an intron.

If, on the other hand, a "genomic transcript" had been used (e.g., GRCh38(NM_001042492.3):c.7317AGC[1]), the reference sequence does not contain a repeat, because the intron now interrupts the second repeat unit. The description is therefore normalised to NC_000017.11(NM_001042492.3):c.=. The desired deletion could instead have been described as GRCh38(NM_001042492.3):c.7317_7319del, which is normalised to NC_000017.11(NM_001042492.3):c.7319_7321del: the 3' shift stops at the exon boundary, two positions further on instead of three.
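To make the difference concrete, here is a minimal sketch of 3' shifting on a toy sequence. The sequences, the coordinates and the `shift_3prime` helper are all made up for illustration; this is not Mutalyzer's code and not the real NM_001042492.3 sequence. The toy transcript has an AGCAGC repeat whose last C lies in the second exon, as in the case above.

```python
def shift_3prime(seq, start, end):
    """Shift the deletion seq[start:end] (0-based, half-open) as far 3' as possible.

    The deletion can move one position to the right as long as the base
    leaving on the left equals the base entering on the right.
    """
    while end < len(seq) and seq[start] == seq[end]:
        start += 1
        end += 1
    return start, end


# Hypothetical toy data: exon 1 ends in "AGCAG", exon 2 starts with "C".
exon1 = "TTAGCAG"
intron = "GTAAAG"
exon2 = "CGT"

transcript = exon1 + exon2          # spliced: ...AGCAGC... is a two-unit repeat
genomic = exon1 + intron + exon2    # unspliced: the intron interrupts the repeat

# Delete the first AGC unit (positions 2..4) in both reference sequences.
print(shift_3prime(transcript, 2, 5))  # (5, 8): shifted by a full repeat unit
print(shift_3prime(genomic, 2, 5))     # (4, 7): shifting stops at the intron
```

On the spliced sequence the deletion moves three positions and ends in exon 2, so it spans the intron once mapped to the genome. On the unspliced sequence it moves only two positions and stays inside exon 1, which mirrors c.7320_7322del versus c.7319_7321del.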

So, this is expected behaviour. However, this example shows that a mapping from a RefSeq transcript to a "genomic transcript" can sometimes be done in multiple ways. I think it would be good if Mutalyzer could at least detect these situations and report on them.
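One possible way to detect such cases, sketched under the assumption that the exon lengths of the transcript are available: after normalising on the RefSeq transcript, check whether the resulting interval straddles an exon junction. The function name and the toy numbers are hypothetical and do not reflect how Mutalyzer is implemented.

```python
from itertools import accumulate


def spans_exon_junction(exon_lengths, start, end):
    """Return True if transcript interval [start, end) crosses an exon junction.

    exon_lengths: exon lengths in transcript order (hypothetical input).
    start, end:   0-based, half-open transcript coordinates.
    """
    # Cumulative exon ends; the last one is the transcript end, not a junction.
    junctions = list(accumulate(exon_lengths))[:-1]
    return any(start < junction < end for junction in junctions)


# Toy transcript with exons of length 7 and 3, so one junction at position 7.
print(spans_exon_junction([7, 3], 5, 8))  # True: the interval crosses the junction
print(spans_exon_junction([7, 3], 2, 5))  # False: the interval lies within exon 1
```

A description flagged this way could be reported to the user as one whose genomic mapping is not unique.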

mutalyzer / website

processing of repeats that cross exon boundaries #19