Open jc271828 opened 1 year ago
Yeah, you can choose those options (see kb count, which supports both). I haven't really seen a benefit from them though: with every read coming from the 3' end, you can't really resolve ambiguities the way you can with bulk data.
As for your question about whether the EM algorithm can take read position into account (3' end of Gene A vs. 5' end of Gene B), no, that is not supported. There are many things to consider in order for such a model to work (internal polyA tracts, mapping location distribution, modeling fragments, etc.), and we're unsure how much value we'd actually gain from fitting such models. We hope to look into it at some point though.
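To make the difference between the two multimapping options concrete, here is a minimal toy sketch in Python. It is not kb's or bustools' actual implementation; the function names (`uniform_split`, `em_split`) and the input format (a dict mapping a tuple of gene names to a read count) are assumptions made up for illustration. Note that neither version uses read position, which is the limitation discussed above.

```python
from collections import defaultdict


def uniform_split(ec_counts):
    """Distribute each count evenly across the genes it is compatible with."""
    totals = defaultdict(float)
    for genes, count in ec_counts.items():
        share = count / len(genes)
        for gene in genes:
            totals[gene] += share
    return dict(totals)


def em_split(ec_counts, n_iter=100):
    """Distribute ambiguous counts in proportion to estimated gene abundance.

    Toy EM: abundances start uniform, then are re-estimated from the
    counts assigned in each iteration.
    """
    all_genes = sorted({g for genes in ec_counts for g in genes})
    abundance = {g: 1.0 / len(all_genes) for g in all_genes}
    totals = dict.fromkeys(all_genes, 0.0)
    for _ in range(n_iter):
        # E-step: split each count by the current relative abundances.
        totals = dict.fromkeys(all_genes, 0.0)
        for genes, count in ec_counts.items():
            denom = sum(abundance[g] for g in genes)
            if denom == 0:
                continue  # nothing to assign for this gene set
            for g in genes:
                totals[g] += count * abundance[g] / denom
        # M-step: re-estimate abundances from the assigned counts.
        n = sum(totals.values())
        abundance = {g: totals[g] / n for g in all_genes}
    return totals


# Hypothetical counts: 90 reads unique to A, 10 unique to B, 20 ambiguous.
ec_counts = {("A",): 90, ("B",): 10, ("A", "B"): 20}
print(uniform_split(ec_counts))  # {'A': 100.0, 'B': 20.0}
print(em_split(ec_counts))       # approximately A = 108, B = 12
```

In this toy case the uniform option gives the 20 ambiguous reads to A and B as 10 each, while the EM option gives most of them to A because A has more unique evidence. If two overlapping genes have no uniquely mapping reads at all, the EM has nothing to learn from and falls back to an even split, which is one way to read the point that 3'-end data leaves little to resolve.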
Thank you for such a timely response! That makes sense. I guess how much benefit can be gained from developing a better-fitting model may partially depend on how "overlapping"/"adjacent" genes are in the reference genome. I'm working with C. elegans, and in the annotation version I'm using, over 10% of genes overlap. Hmm, so I guess I'll probably not worry about this too much for now, but I really look forward to seeing future workarounds for this!
Hi,
I was wondering how/if I can choose the EM algorithm or the "simpler" multimapping option that distributes reads evenly across genes when using kb to count reads. Also, because my experiment was done using 10x Genomics technology (which captures sequences adjacent to the polyA tail), are the reads expected to be strongly 3' end biased? If so, I also wonder whether the EM algorithm can accurately distribute reads that map to both Gene A's 3' end and Gene B's 5' end. As I imagine it, those reads are more "likely" to come from Gene A transcripts? Thanks for your time!
Jingxian