s-andrews / nexons

A pipeline for quantitating transcript level abundances from nanopore sequence data
GNU General Public License v3.0

Jumping coordinates #11

Open ozgegizlenci opened 1 year ago

ozgegizlenci commented 1 year ago

Some of the splice coordinates jump back towards the beginning of the gene, regardless of the gene's strand. We need to check that the extracted coordinates always increase for + strand genes and always decrease for - strand genes.
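A minimal sketch of this check in Python. It assumes a splice string can be split on ':' and '-' into a flat list of genomic positions, as in the variant lines in this issue, and that the gene's strand is known. The helper names `parse_coords` and `first_jump` are hypothetical and are not part of nexons.

```python
import re
from typing import List, Optional


def parse_coords(splice_string: str) -> List[int]:
    """Split a splice string such as '100:200-250:300' into [100, 200, 250, 300].

    Assumes ':' and '-' are the only separators between coordinates.
    """
    return [int(x) for x in re.split(r"[:-]", splice_string) if x]


def first_jump(coords: List[int], strand: str) -> Optional[int]:
    """Return the index of the first coordinate that moves against the strand, or None."""
    for i in range(1, len(coords)):
        # On + strand coordinates should never decrease; on - strand never increase
        if strand == "+" and coords[i] < coords[i - 1]:
            return i
        if strand == "-" and coords[i] > coords[i - 1]:
            return i
    return None
```

Treating Pkm as a + strand gene (as its increasing coordinates suggest), `first_jump(parse_coords(...), "+")` on Variant103 points at index 21, the final coordinate.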

Pkm example: [image not available]

Variant103 of Pkm jumps back only once, at the final coordinate, so we can simply remove the last coordinate/exon.

Variant103 ENSMUSG00000032294.17 59656649:59665200-59665366:59665863-59665954:59668549-59668680:59668913-59669099:59670467-59670737:59671575-59671729:59671921-59672073:59675568-59675734:59678044-59678225:59678728-59679373:59659611
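The trimming step for the Variant103 case could look like the following sketch. It assumes the same ':'/'-' separated coordinate format and a known strand; `trim_trailing_jump` is a hypothetical helper, not an existing nexons function.

```python
import re
from typing import List


def trim_trailing_jump(splice_string: str, strand: str) -> List[int]:
    """Drop the final coordinate when it is the only one that jumps against the strand.

    Returns the coordinates unchanged otherwise (no jump at all, or a jump that
    occurs before the final coordinate; those cases are left to another filter).
    """
    coords = [int(x) for x in re.split(r"[:-]", splice_string) if x]
    step = 1 if strand == "+" else -1
    # Indices where a coordinate moves backwards relative to the strand direction
    jumps = [i for i in range(1, len(coords)) if (coords[i] - coords[i - 1]) * step < 0]
    if jumps == [len(coords) - 1]:
        return coords[:-1]
    return coords
```

Applied to the Variant103 coordinates with strand "+", this drops 59659611 and leaves 21 coordinates ending at 59679373.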

In Variant120, the splice junction coordinate jumps back and then carries on for another exon. We can discard such variants altogether:

Variant120 ENSMUSG00000032294.17 59656649:59665200-59665366:59665863-59665954:59668549-59668680:59668913-59669099:59670467-59670737:59671575-59671729:59671921-59672073:59675568-59675734:59678044-59678225:59678728-59679373:59659611-59659575:59659611
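Combining both rules, a per-variant filter might keep variants with no jump, trim a single trailing jump (the Variant103 case), and discard variants whose jump is followed by further coordinates (the Variant120 case). This is again a sketch assuming the ':'/'-' coordinate format and a known strand; `clean_variant` and `filter_variants` are illustrative names only.

```python
import re
from typing import Dict, List, Optional


def clean_variant(splice_string: str, strand: str) -> Optional[List[int]]:
    """Return cleaned coordinates, or None if the variant should be discarded."""
    coords = [int(x) for x in re.split(r"[:-]", splice_string) if x]
    step = 1 if strand == "+" else -1
    jumps = [i for i in range(1, len(coords)) if (coords[i] - coords[i - 1]) * step < 0]
    if not jumps:
        return coords          # already consistent with the strand
    if jumps == [len(coords) - 1]:
        return coords[:-1]     # single jump at the very end: trim it
    return None                # jump followed by more coordinates: discard


def filter_variants(variants: Dict[str, str], strand: str) -> Dict[str, List[int]]:
    """Apply clean_variant to every variant and keep only those not discarded."""
    cleaned = {}
    for name, splice_string in variants.items():
        coords = clean_variant(splice_string, strand)
        if coords is not None:
            cleaned[name] = coords
    return cleaned
```

With strand "+", Variant120 has backward jumps at indices 21 and 22, so `clean_variant` returns None and the variant is dropped.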

ozgegizlenci commented 11 months ago

Louise's solution: the last exons identified by chexons will be checked backward through the exon. Nexons will check whether each identified exon has an unannotated splice acceptor within a flexibility range. For that, a polyA/T stretch will be used: if more than 60% of the sequence consists of this pattern, it will be considered a polyA sequence and the identified exon will be removed. The remaining exons will be checked for whether they have an annotated splice acceptor or do not look like a polyA sequence, and those that pass will be accepted as part of the transcript structure.
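A rough sketch of the proposed last-exon check, under assumptions not stated in the comment: the exon sequence and the genomic position of its splice acceptor are already available, "polyA/T" is measured as the larger of the A and T base fractions, and the flexibility range is a hypothetical `flex` parameter with an arbitrary default of 10 bp. Only the 60% threshold comes from the comment; all function names are illustrative.

```python
from typing import Iterable


def looks_like_polya(seq: str, threshold: float = 0.6) -> bool:
    """True if more than `threshold` of the bases are A, or more than `threshold` are T."""
    seq = seq.upper()
    if not seq:
        return False
    a_frac = seq.count("A") / len(seq)
    t_frac = seq.count("T") / len(seq)
    return max(a_frac, t_frac) > threshold


def has_annotated_acceptor(acceptor_pos: int, annotated_acceptors: Iterable[int], flex: int) -> bool:
    """True if an annotated splice acceptor lies within `flex` bp of acceptor_pos."""
    return any(abs(acceptor_pos - a) <= flex for a in annotated_acceptors)


def keep_last_exon(
    acceptor_pos: int,
    exon_seq: str,
    annotated_acceptors: Iterable[int],
    flex: int = 10,          # hypothetical flexibility range in bp
    threshold: float = 0.6,  # 60% polyA/T cut-off from the comment
) -> bool:
    """Keep the exon if its acceptor is annotated, or if it does not look like polyA."""
    if has_annotated_acceptor(acceptor_pos, annotated_acceptors, flex):
        return True
    return not looks_like_polya(exon_seq, threshold)
```

An exon with an unannotated acceptor and a polyA-like sequence is the only case that gets removed, which matches the "annotated acceptor or not polyA" rule above.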

ozgegizlenci commented 11 months ago