s-andrews / nexons

A pipeline for quantitating transcript level abundances from nanopore sequence data
GNU General Public License v3.0

Some genes don't complete processing #1

Open s-andrews opened 4 years ago

s-andrews commented 4 years ago

In my test dataset, Cd74 and Gm20388 were still running after 14 hours, whereas everything else had finished long before, so something is causing them to get stuck.

s-andrews commented 4 years ago

Both of these genes end up with huge numbers of reads in them. Gm20388 is huge (4.4Mbp) and spans multiple other genes, so it's probably rubbish. Cd74 is one of the most highly expressed genes (2^14 reads over it). Maybe we should set some kind of cap on reads per gene to avoid wasting lots of time analysing these few very highly expressed genes. In the end we're only looking for a shift in proportion, so discarding some of the reads for these genes is not the end of the world.
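
One possible way to implement a cap is to subsample each gene's reads uniformly at random, so that the relative proportions of transcripts should be preserved on average. The following is only a minimal sketch of that idea using reservoir sampling. The function name `cap_reads`, the `max_reads` default of 10000 and the fixed `seed` are all assumptions made for illustration and are not part of nexons.

```python
import random


def cap_reads(reads, max_reads=10000, seed=0):
    """Return at most max_reads reads, sampled uniformly from an iterable.

    Hypothetical helper, not part of nexons. Uses reservoir sampling
    (Algorithm R), so the reads can be streamed without knowing in
    advance how many there are for the gene.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same subset
    reservoir = []
    for i, read in enumerate(reads):
        if i < max_reads:
            # Fill the reservoir with the first max_reads reads.
            reservoir.append(read)
        else:
            # Keep the new read with probability max_reads / (i + 1),
            # replacing a uniformly chosen earlier entry.
            j = rng.randint(0, i)
            if j < max_reads:
                reservoir[j] = read
    return reservoir


# Example: a gene with 2^14 reads is cut down to 1000;
# a gene with fewer reads than the cap is left untouched.
capped = cap_reads(range(2 ** 14), max_reads=1000)
small = cap_reads(["read1", "read2", "read3"], max_reads=1000)
```

Random subsampling is suggested here rather than simply taking the first N reads because reads may be ordered by position in the input, in which case truncating could bias the proportions towards particular transcripts.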