uclahs-cds / package-moPepGen

Multi-Omics Peptide Generator
https://uclahs-cds.github.io/package-moPepGen/
GNU General Public License v2.0

Deletion from start codon to exon end #793

Closed: zhuchcn closed this issue 1 year ago

zhuchcn commented 1 year ago

This is a very interesting edge case caught by fuzz testing. We always represent a variant in the form of a substitution. For a deletion, we usually include the nucleotide immediately before the deleted sequence. For example, in the transcript sequence below, the deletion of AAAAAAA is represented as the substitution GAAAAAAA -> G.

ATGAAAAAAATTTT
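A minimal Python sketch of this convention, assuming 0-based, half-open transcript coordinates. `deletion_as_substitution` is a hypothetical helper written for illustration, not moPepGen's API.

```python
from typing import Tuple


def deletion_as_substitution(seq: str, start: int, end: int) -> Tuple[int, str, str]:
    """Represent deleting seq[start:end] as a (position, ref, alt) substitution
    that keeps the preceding nucleotide.

    Hypothetical helper; coordinates are 0-based and half-open.
    """
    if start < 1:
        raise ValueError("no nucleotide before the deletion to anchor on")
    ref = seq[start - 1:end]  # preceding nucleotide + deleted bases
    alt = seq[start - 1]      # only the preceding nucleotide remains
    return start - 1, ref, alt


seq = "ATGAAAAAAATTTT"
print(deletion_as_substitution(seq, 3, 10))  # (2, 'GAAAAAAA', 'G')
```

Applying the substitution (`seq[:2] + "G" + seq[10:]`) gives `ATGTTTT`, the same result as deleting the seven As directly.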

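However, if the deletion starts (at the first A) on the nucleotide immediately after the start codon, the preceding G belongs to the start codon. In that case we use the end-inclusive format instead and include the nucleotide after the deleted sequence: AAAAAAAT -> T.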
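The rule can be sketched as follows, again assuming 0-based, half-open coordinates. `represent_deletion` and its `cds_start` parameter (the index of the A in ATG) are hypothetical names for illustration; moPepGen's actual code may be structured differently.

```python
from typing import Tuple


def represent_deletion(seq: str, start: int, end: int, cds_start: int) -> Tuple[int, str, str]:
    """Represent deleting seq[start:end] as a (position, ref, alt) substitution.

    Hypothetical helper: coordinates are 0-based and half-open, and cds_start
    is the index of the A in the ATG start codon.
    """
    if start == cds_start + 3:
        # The preceding nucleotide is the last base of the start codon, so
        # anchor on the nucleotide after the deletion (end-inclusive format).
        if end >= len(seq):
            raise ValueError("no nucleotide after the deletion to anchor on")
        return start, seq[start:end + 1], seq[end]
    if start < 1:
        raise ValueError("no nucleotide before the deletion to anchor on")
    # Default: anchor on the preceding nucleotide.
    return start - 1, seq[start - 1:end], seq[start - 1]


seq = "ATGAAAAAAATTTT"
print(represent_deletion(seq, 3, 10, cds_start=0))  # (3, 'AAAAAAAT', 'T')
```

A deletion that starts one nucleotide later (`start=4`) falls back to the default form and is anchored on the preceding A.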

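The edge case arises when the last deleted A is also the last nucleotide of an exon. The T is then the first nucleotide of the next exon, so the reference allele AAAAAAAT spans an exon-exon junction. Variant coordinates were not handled properly in this case, but we should still be able to process this variant.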
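To illustrate why the junction matters, here is a sketch of mapping a transcript interval onto the genome. It assumes a plus-strand transcript with hypothetical exon coordinates: exon 1 (ATGAAAAAAA) at genomic [100, 110) and exon 2 (TTTT) at [200, 204). `transcript_to_genomic` is a hypothetical helper, not moPepGen's implementation.

```python
from typing import List, Tuple

Interval = Tuple[int, int]


def transcript_to_genomic(exons: List[Interval], tx_start: int, tx_end: int) -> List[Interval]:
    """Map the transcript interval [tx_start, tx_end) to genomic intervals.

    exons: sorted, half-open genomic (start, end) pairs on the plus strand.
    Returns one genomic interval per exon the range overlaps, so a range that
    crosses an exon-exon junction yields more than one interval.
    """
    segments = []
    offset = 0  # transcript position of the current exon's first nucleotide
    for g_start, g_end in exons:
        length = g_end - g_start
        lo = max(tx_start, offset)
        hi = min(tx_end, offset + length)
        if lo < hi:
            segments.append((g_start + lo - offset, g_start + hi - offset))
        offset += length
    if sum(e - s for s, e in segments) != tx_end - tx_start:
        raise ValueError("interval extends past the transcript")
    return segments


exons = [(100, 110), (200, 204)]
print(transcript_to_genomic(exons, 3, 11))  # [(103, 110), (200, 201)]
print(transcript_to_genomic(exons, 3, 10))  # [(103, 110)]
```

The deleted As alone (transcript [3, 10)) map to a single genomic interval; only the end-inclusive anchor T lands in the next exon. Code that assumes a reference allele always maps to one contiguous genomic interval would therefore mis-handle this variant.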