s-andrews opened 4 years ago
Both of these genes (Cd74 and Gm20388) end up with huge numbers of reads. Gm20388 is huge (4.4 Mbp) and spans multiple other genes, so the annotation is probably rubbish. Cd74 is one of the most highly expressed genes (2^14 ≈ 16,000 reads over it). Maybe we should set some kind of cap on reads per gene so we don't waste tons of time analysing these few very highly expressed genes. In the end we're only looking for a shift in proportion, so working on a random subset of the reads for these genes wouldn't be the end of the world.
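A minimal sketch of what a per-gene cap could look like, assuming the reads for a gene can be streamed one at a time and that each read contributes a yes/no observation to the proportion being tested. Both are assumptions, since the tool's actual data model isn't described here, and `cap_reads`, `proportion` and `MAX_READS_PER_GENE` are hypothetical names. Reservoir sampling keeps a uniform random subset without needing to know the read count in advance, so the expected proportion is unchanged and only its precision drops.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")

# Hypothetical cap; a sensible value would need tuning on real data.
MAX_READS_PER_GENE = 10_000


def cap_reads(reads: Iterable[T], max_reads: int = MAX_READS_PER_GENE, seed: int = 0) -> List[T]:
    """Return a uniform random subsample of at most max_reads reads.

    Uses reservoir sampling (Algorithm R): the reads are streamed once and
    genes with <= max_reads reads are returned unchanged.
    """
    rng = random.Random(seed)  # seeded so reruns pick the same reads
    reservoir: List[T] = []
    for i, read in enumerate(reads):
        if i < max_reads:
            reservoir.append(read)
        else:
            # Keep read i with probability max_reads / (i + 1),
            # replacing a uniformly chosen slot in the reservoir.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < max_reads:
                reservoir[j] = read
    return reservoir


def proportion(reads: List[bool]) -> float:
    """Fraction of reads flagged True (hypothetical per-read outcome)."""
    return sum(reads) / len(reads) if reads else 0.0


if __name__ == "__main__":
    # Simulated gene with 2**14 reads, ~30% of which show the outcome of interest.
    sim = random.Random(1)
    reads = [sim.random() < 0.3 for _ in range(2**14)]
    capped = cap_reads(reads, max_reads=1000)
    print(len(reads), len(capped))  # 16384 1000
    print(round(proportion(reads), 3), round(proportion(capped), 3))
```

For a sense of the cost: with n reads the standard error of a proportion p is √(p(1−p)/n), so at p = 0.3 a cap of 1,000 reads still gives a standard error of about 0.014, which is why capping should lose little when we only care about a shift in proportion.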
In my test dataset, Cd74 and Gm20388 were still running after 14 hours, whereas everything else had finished long before, so something is causing them to get stuck.