mutalyzer / website


processing of repeats that cross exon boundaries #19

Open maglott opened 1 year ago

maglott commented 1 year ago

I submitted NM_001042492.3:c.7317AGC[1], and the chromosome descriptions NC_000017.11(NM_001042492.3):c.7320_7322del and NC_000017.11:g.31349250_31350183del were returned on GRCh38. Is the latter representation expected? It seems more likely that only 3 nucleotides were deleted from the genome, resulting in the loss of one repeat unit. Why is the projection across the intron returned?

jfjlaros commented 1 year ago

This is an interesting one.

The RefSeq transcript contains a tri-nucleotide repeat with two repeat units. The last nucleotide of the second repeat unit resides in a different exon than the rest of the repeat.

Because a RefSeq transcript is used, the application of the 3' rule results in NM_001042492.3:c.7320_7322del. When this description is mapped to a genomic build, the deletion spans an intron.

If, on the other hand, a "genomic transcript" had been used (e.g., GRCh38(NM_001042492.3):c.7317AGC[1]), the reference sequence would not contain a repeat (there is now an intron in between), and the description would therefore be normalised to NC_000017.11(NM_001042492.3):c.=. The desired deletion could have been described as GRCh38(NM_001042492.3):c.7317_7319del, which is normalised to NC_000017.11(NM_001042492.3):c.7319_7321del.
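The two routes can be reproduced on a toy sequence. This is a minimal sketch, not Mutalyzer code: the exon and intron sequences and the helpers `shift_3prime` and `transcript_to_genomic` are made up for illustration, and all coordinates are 0-based and half-open rather than HGVS positions.

```python
# Toy gene: the AGCAGC repeat is split as AGCAG | C by an intron.
EXON1 = "TTAGCAG"
INTRON = "GTAAGTCCAG"   # hypothetical intron sequence
EXON2 = "CTT"

transcript = EXON1 + EXON2           # spliced reference (RefSeq-like)
genomic = EXON1 + INTRON + EXON2     # unspliced reference


def shift_3prime(seq, start, end):
    """Shift the deletion seq[start:end] as far 3' as possible."""
    while end < len(seq) and seq[start] == seq[end]:
        start += 1
        end += 1
    return start, end


def transcript_to_genomic(pos):
    """Map a transcript position to the toy genomic coordinate."""
    return pos if pos < len(EXON1) else pos + len(INTRON)


# Route 1: normalise on the transcript, then project onto the genome.
t_start, t_end = shift_3prime(transcript, 2, 5)
g_projected = (transcript_to_genomic(t_start),
               transcript_to_genomic(t_end - 1) + 1)

# Route 2: normalise the same 3-nucleotide deletion on the genome directly.
g_direct = shift_3prime(genomic, 2, 5)

print(t_start, t_end)   # deletion now covers the second repeat unit
print(g_projected)      # projected span includes the whole intron
print(g_direct)         # shifting stops at the exon/intron boundary
```

In this toy case the transcript route deletes 3 nucleotides plus the 10-nucleotide intron on the genome, while the genomic route shifts by only 2 positions and stays within the first exon, mirroring c.7320_7322del versus c.7319_7321del above.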

So, this is expected behaviour. However, this example shows that a mapping from a RefSeq transcript to a "genomic transcript" can sometimes be done in multiple ways. I think it would be good if Mutalyzer could at least detect these situations and report on them.
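One possible way to detect such cases, sketched under assumptions: compute the full range over which the deletion can be shifted on the transcript and check whether an exon boundary falls inside it. The function names and the toy sequence below are hypothetical and do not reflect how Mutalyzer is implemented; coordinates are 0-based and half-open.

```python
def shift_3prime(seq, start, end):
    """Shift the deletion seq[start:end] as far 3' as possible."""
    while end < len(seq) and seq[start] == seq[end]:
        start += 1
        end += 1
    return start, end


def shift_5prime(seq, start, end):
    """Shift the deletion seq[start:end] as far 5' as possible."""
    while start > 0 and seq[start - 1] == seq[end - 1]:
        start -= 1
        end -= 1
    return start, end


def crosses_exon_boundary(seq, start, end, boundaries):
    """Return True if the shift window of the deletion contains a boundary.

    `boundaries` holds transcript positions at which a new exon starts.
    """
    window_start, _ = shift_5prime(seq, start, end)
    _, window_end = shift_3prime(seq, start, end)
    return any(window_start < b < window_end for b in boundaries)


# Toy transcript: exon 1 is "TTAGCAG", exon 2 is "CTT", boundary at 7.
toy_transcript = "TTAGCAG" + "CTT"
toy_boundaries = [7]

ambiguous = crosses_exon_boundary(toy_transcript, 2, 5, toy_boundaries)
unambiguous = crosses_exon_boundary(toy_transcript, 0, 1, toy_boundaries)
print(ambiguous, unambiguous)
```

A description flagged this way could be reported with a warning that the genomic mapping is not unique.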