tleonardi / nanocompore

RNA modifications detection from Nanopore dRNA-Seq data
https://nanocompore.rna.rocks
GNU General Public License v3.0

cluster_counts interpretation #210

Closed sunsetyerin closed 1 year ago

sunsetyerin commented 1 year ago

I have a question about the cluster_counts column in the nanocompore sampcomp results. I saw in previous issues that cluster_counts is the number of reads assigned to each cluster, but I am not sure which value is the number of reads and which is the number of clusters.

control_1:32/12__control_2:21/9__test_1:60/23__test_2:38/16

For example, in control_1:32/12, is 32 the number of reads and 12 the number of clusters?

lmulroney commented 1 year ago

Hi @sunsetyerin,

Nanocompore is limited to 2 clusters for the GMM. If 1 cluster fits the data better than 2, no further processing is done and the site is considered unmodified. If 2 clusters fit the data better than 1, the statistical test (usually the logistic regression test) is performed.

From this example there are 32 reads from control 1 assigned to cluster 1 (c1) and 12 reads assigned to cluster 2 (c2). Control 2 has 21 reads assigned to c1 and 9 reads assigned to c2. Test 1 has 60 reads assigned to c1 and 23 reads assigned to c2, and test 2 has 38 reads assigned to c1 and 16 reads assigned to c2.

The basic way to read those lines is: [sample name]:[number of reads assigned to cluster 1]/[number of reads assigned to cluster 2], repeated for each further sample with __ as the separator.
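If it helps, the same reading can be sketched in a few lines of Python. This is only an illustration, not part of Nanocompore: the function name parse_cluster_counts is made up here, and it assumes that samples are separated by a double underscore and that each entry has exactly two counts, as in the format described above.

```python
def parse_cluster_counts(field):
    """Split a cluster_counts string into {sample: (reads_c1, reads_c2)}.

    Hypothetical helper, not a Nanocompore function. Assumes "__" separates
    samples and each entry looks like "<sample>:<c1 reads>/<c2 reads>".
    """
    counts = {}
    for entry in field.split("__"):
        # Split on the last ":" so the sample name is kept whole.
        sample, _, pair = entry.rpartition(":")
        c1, c2 = pair.split("/")
        counts[sample] = (int(c1), int(c2))
    return counts


example = "control_1:32/12__control_2:21/9__test_1:60/23__test_2:38/16"
counts = parse_cluster_counts(example)

for sample, (c1, c2) in counts.items():
    print(f"{sample}: {c1} reads in cluster 1, {c2} reads in cluster 2")
```

With the example string, control_1 maps to 32 reads in cluster 1 and 12 reads in cluster 2, which matches the reading given above.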

I hope this explanation helps.