Theo-Nelson closed this issue 3 months ago
Hi Theo,
Sorry for the confusion. The size column is the raw read count for that read, cluster is the UMI count after UMI clustering, and count is the final UMI count after filtering out scattered UMIs. So count is the final UMI count you should use. I keep size and cluster for diagnosis and will remove those two columns later. The polyA column indicates whether that read has a polyA tail. Because each read in the output is collapsed from a UMI cluster containing multiple reads, polyA is the average across the reads in that cluster, which is why it can be fractional. In downstream analysis I use 0.5 as the threshold to decide whether a read has polyA.
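To make the polyA averaging and the 0.5 threshold concrete, here is a minimal Python sketch. It is not LongcellPre code: the function names, the 0/1 encoding of per-read polyA detection, the example UMI labels, and the choice of a strict `>` comparison are all assumptions made for illustration.

```python
from statistics import mean

# Threshold mentioned in the reply above.
POLYA_THRESHOLD = 0.5


def collapse_polya(read_flags):
    """Average per-read polyA flags over one UMI cluster.

    Assumes each read is encoded as 1 (polyA tail detected) or 0 (not
    detected); this encoding is hypothetical.
    """
    return mean(read_flags)


def has_polya(polya_fraction, threshold=POLYA_THRESHOLD):
    # Assumption: strict comparison. Whether the pipeline's downstream
    # analysis uses > or >= is not stated in the thread.
    return polya_fraction > threshold


# Hypothetical UMI clusters, each holding per-read polyA flags.
clusters = {
    "umi_A": [1, 1, 0, 1],
    "umi_B": [0, 0, 1],
}

for name, flags in clusters.items():
    fraction = collapse_polya(flags)
    print(name, round(fraction, 2), has_polya(fraction))
```

In this sketch, a cluster in which three of four reads carry a polyA tail gets a polyA value of 0.75 and is called polyA-positive, while a cluster with one of three reads falls below the threshold.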
Thanks for the reminder. I will also add the explanation above to the GitHub README.
Dear LongcellPre developers,
Thank you for your pipeline. I was curious whether you have published, or could provide, definitions of the individual output columns and how each relates to your mapping algorithm. The program outputs size, cluster, count, and polyA, which seem to be overlapping categories of increasing 'expansiveness.' It also seems that there can be fractional counts, while the size and cluster values appear to be integers.
Thank you very much!
Sincerely, Theo