Hi @fgfrost,
Your question about the cluster counts is closely related to the already-answered issue #210, but here is an excerpt from that answer to help with reading the cluster counts line: "The basic way to read those lines is: [sample name]:[number of reads assigned to cluster 1]/[number of reads assigned to cluster 2]__repeated for each further sample." - #210
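Here is a minimal Python sketch of reading that format. It assumes the field is a single string in which per-sample entries of the form sample:count1/count2 are joined by a double underscore, as the quoted description suggests. The function name, sample names and counts below are hypothetical. They are not real sampcomp output, and the parser is not part of nanocompore.

```python
def parse_cluster_counts(field: str) -> dict[str, tuple[int, int]]:
    """Split a cluster_counts string into {sample: (cluster1_reads, cluster2_reads)}.

    Assumes entries look like "sample:n1/n2" and are separated by "__".
    """
    counts = {}
    for entry in field.strip().split("__"):
        if not entry:
            continue  # tolerate a trailing or doubled separator
        # rpartition splits on the last ":", so a ":" inside a sample name survives
        sample, _, pair = entry.rpartition(":")
        n1, n2 = pair.split("/")
        counts[sample] = (int(n1), int(n2))
    return counts


# Hypothetical example value; real sample labels come from your own sampcomp run.
example = "WT_1:120/35__WT_2:98/41__KO_1:60/90__KO_2:55/102"
for sample, (c1, c2) in parse_cluster_counts(example).items():
    print(f"{sample}: {c1} reads in cluster 1, {c2} reads in cluster 2")
```

Neither number counts modified bases. Each one is simply how many of that sample's reads fell into that cluster, so the first number can legitimately be larger than the second.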
The LOR is indeed the log odds ratio. Here is a quote from our recent protocol paper that explains it: "The absolute value of LOR 0.5 means that the probability of a read from one condition is ∼3 times more likely in one cluster than a read from the other condition. A larger absolute value of LOR means that reads from one condition are more heavily enriched in one cluster, and an absolute value of LOR closer to 0 means that the two sample labels are more evenly distributed between the two clusters." Mulroney et al., Current Protocols (2023) https://doi.org/10.1002/cpz1.683
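To make the quoted definition concrete, here is a minimal Python sketch. It builds a 2x2 table of condition vs. cluster from pooled read counts and takes the natural log of its odds ratio. This illustrates a log odds ratio in general and does not reproduce how sampcomp computes Logit_LOR. The function name, the pooling of replicates, the natural-log base and the refusal to handle zero counts are all assumptions made for illustration.

```python
import math


def log_odds_ratio(cond_a: tuple[int, int], cond_b: tuple[int, int]) -> float:
    """Natural-log odds ratio for a 2x2 table of reads per cluster.

    cond_a, cond_b: (reads in cluster 1, reads in cluster 2) for each condition,
    e.g. summed over replicates. Positive values mean condition A reads are
    relatively enriched in cluster 1; negative values mean they are relatively
    enriched in cluster 2.
    """
    a1, a2 = cond_a
    b1, b2 = cond_b
    if 0 in (a1, a2, b1, b2):
        # A zero cell makes the plain odds ratio undefined (this sketch applies no correction)
        raise ValueError("zero count in the 2x2 table")
    return math.log((a1 / a2) / (b1 / b2))


# Hypothetical pooled counts: condition A mostly in cluster 1, B mostly in cluster 2.
wt = (120 + 98, 35 + 41)
ko = (60 + 55, 90 + 102)
lor = log_odds_ratio(wt, ko)
print(f"LOR = {lor:.2f}, odds ratio = {math.exp(lor):.2f}")

# Cluster numbers are arbitrary, so swapping the labels only flips the sign;
# the absolute value is what carries the enrichment information.
swapped = log_odds_ratio(wt[::-1], ko[::-1])
print(f"LOR with clusters swapped = {swapped:.2f}")
```

A value near 0 means both conditions split their reads between the clusters in about the same proportions. It says nothing by itself about which cluster holds modified reads.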
Thank you! That's very helpful. So my understanding is that there is no inherent link between clusters and modification status, i.e., cluster 1 does not inherently mean modified or unmodified. Is that correct? And if so, does cluster 1 (whether unmodified or modified) correspond to the same modification state in every sample?
Hi @fgfrost,
The cluster number is completely random from position to position and does not correspond to the modified or unmodified state.
Cheers, Logan
Got it, thank you so much for answering all of my questions!!
Hi @fgfrost,
Glad I could help, Logan
Hi, I ran sampcomp on some example data I got from GEO, and I have a few questions about the output that I can't find answered in the documentation.
Here's some sample output for reference:
My questions are about the cluster_counts and Logit_LOR fields: what specifically do they describe? More specifically:
1. What does the cluster_counts field represent? I want to get an idea of the modified and unmodified bases at a given position, and I think this is the field that conveys that, but sometimes the numerator is greater than the denominator. So does that mean the numerator is modified bases and the denominator is unmodified bases?
2. I assume Logit_LOR stands for log odds ratio. What is this odds ratio specifically? Does it convey the ratio of modification, or is it simply another confidence statistic similar to the p-value?