s-andrews / nexons

A pipeline for quantitating transcript level abundances from nanopore sequence data
GNU General Public License v3.0

Use theoretical transcripts to seed transcript clusters #4

Closed s-andrews closed 2 years ago

s-andrews commented 3 years ago

At the moment nexons operates purely theoretically - it doesn't use the structures defined in the input GTF. The GTF is only used to find which parts of the genome are covered by the gene(s) we want to analyse.

Can we adapt the code so that we extract the annotated transcripts from the input GTF and use these to seed the collated nexons structures? That way, if there is a match between a theoretical sequence and an annotated one, we'll use that first and only start a new structure if there isn't a match.
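As a rough sketch of the seeding idea (not the actual nexons code): every name below (`Cluster`, `seed_clusters`, `add_observation`) is hypothetical, a structure is assumed to be a tuple of splice junction coordinates, and matching is assumed to be exact equality, which the real collating code may well relax.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Assumption: a structure is a tuple of (start, end) splice junctions.
Structure = Tuple[Tuple[int, int], ...]


@dataclass
class Cluster:
    structure: Structure
    transcript_id: Optional[str] = None  # only set for clusters seeded from the GTF
    observations: int = 0                # seeds start with no observations


def seed_clusters(annotated: Dict[str, Structure]) -> Dict[Structure, Cluster]:
    """Create one empty cluster per distinct annotated structure."""
    clusters: Dict[Structure, Cluster] = {}
    for transcript_id, structure in annotated.items():
        # If two annotations share a structure, keep the first transcript ID.
        clusters.setdefault(structure, Cluster(structure, transcript_id))
    return clusters


def add_observation(clusters: Dict[Structure, Cluster], structure: Structure) -> Cluster:
    """Count an observed structure, reusing a seeded cluster when one matches."""
    cluster = clusters.get(structure)
    if cluster is None:
        # No annotated or previously observed match: start a new structure.
        cluster = Cluster(structure)
        clusters[structure] = cluster
    cluster.observations += 1
    return cluster
```

Keying the clusters on the structure itself means a seeded transcript and a novel one go through the same lookup, and the seed contributes a transcript ID without contributing to the count.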

We'll need to modify the parsing code to collect transcripts as well as genes. We'll also need to modify the collating code, firstly to record the transcript ID of the seeded transcript, and secondly to make sure we don't count the theoretical transcript as a real observation.
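For the parsing side, something along these lines might do. It is only a sketch: `parse_gtf_transcripts` is a hypothetical name, it assumes the standard nine tab-separated GTF columns with quoted attribute values, and it ignores chromosome and strand, which the real parser would need to keep.

```python
import re
from collections import defaultdict

# Matches attribute pairs such as: transcript_id "ENST00000001";
ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def parse_gtf_transcripts(lines):
    """Return {transcript_id: sorted list of (start, end) exon coordinates}."""
    exons = defaultdict(list)
    for line in lines:
        if line.startswith("#"):
            continue  # header or comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon features define the transcript structure
        attributes = dict(ATTRIBUTE.findall(fields[8]))
        transcript_id = attributes.get("transcript_id")
        if transcript_id is None:
            continue
        exons[transcript_id].append((int(fields[3]), int(fields[4])))
    return {tid: sorted(coords) for tid, coords in exons.items()}
```

Grouping exon lines by `transcript_id` gives one exon list per annotated transcript, which can then be turned into whatever structure the collating code compares against.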

Finally, since we can now end up with clusters with no observations, we'll need to add a filter to the reporting to remove these again.
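The reporting filter could be as small as the sketch below. It assumes each cluster is a dict with an `observations` count, and both that layout and the `observed_only` name are made up for illustration.

```python
def observed_only(clusters):
    """Drop clusters that were seeded from the annotation but never observed."""
    return [cluster for cluster in clusters if cluster["observations"] > 0]


# Hypothetical clusters: one seeded and observed, one seeded only, one novel.
clusters = [
    {"transcript_id": "T1", "observations": 12},
    {"transcript_id": "T2", "observations": 0},
    {"transcript_id": None, "observations": 3},
]
reported = observed_only(clusters)
```

Here only the unobserved seed `T2` is removed before reporting.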

laurabiggins commented 2 years ago

This has been implemented.