saeyslab / CytoNorm

R library to normalize cytometry data

Not a real issue, but rather I just don't understand how to read the testCV function results. Help? #39

Open Diana-88a opened 1 year ago

Diana-88a commented 1 year ago

A (somewhat) related question regarding the testCV function graphical results. I don't understand what I'm looking at, quite frankly.

I run it as such:

cvs <- CytoNorm::testCV(fsom, cluster_values = c(a, b), plot = TRUE, verbose = TRUE)

where a is (I assume) the start cluster number and b is the end cluster number (from a ridiculously low value of 3 to an equally ridiculously high value of 20). It generates two plots, and I'm not sure I understand what they are telling me.

On the first plot, the top part shows what looks like downward moving "stairs", from 3 to 20 in length (from left to right); they are all shades of blue. If I assume this means that there are no batch effects causing issues with my data, then I really don't understand the following part of the plot! This is followed on the same plot by a box of varying shades of yellow (and red) to blue with values (a 20x20 grid) and with "Original clustering" on the right. I'm not sure how to read this. I understand that anything over 1.5 (or between 1 and 1.5) shows batch effects causing potential issues?

The second plot is an 11 x 20 grid, again with numbers (there are 11 technical controls in the group tested), 20 across and 11 deep. What does this plot mean? Again, the colors range from blue to red...

A third plot is simply a combination of the two previous plots, all on the same page. This one can obviously be read if I knew how to read the other ones. I hate to be ignorant, but... :) I'm having to explain this to others at work and my google-fu appears weak. I would be grateful for any help anyone could give.

Thanks,

Diana

lsdewberry commented 5 months ago

I had this question too, so I'm answering even though it's from a while ago. The bottom plot is the percentage of each control sample's cells assigned to each of the clusters (clusters are columns, control samples are rows), so the values in each row add up to 100. It is shown for the number of clusters chosen in fsom as the "nClus" entry to FlowSOM.params.

CytoNorm uses different splines for each cluster to normalize. "By applying clustering on the original data first, we made the assumption that while the measurements might have shifted between the different samples, the differences between the cell types are still bigger than the shifts caused by the batch effects. If this is the case, the percentage of cells assigned to each of the clusters should be similar across all control samples. We evaluated this by computing the percentage of cells assigned to each of the clusters for each of the samples, and computing the coefficient of variation for each of the clusters." (from the paper)
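To make the quoted description concrete, here is a minimal base-R sketch of that calculation on made-up counts. It does not call CytoNorm at all: the `counts` matrix, the sample and cluster names, and the numbers are hypothetical, and the CV is taken as the usual standard deviation divided by the mean, which I am assuming is what the package computes.

```r
# Hypothetical toy data: 3 control samples (rows) x 4 clusters (columns).
# counts[i, j] = number of cells from control sample i assigned to cluster j.
counts <- matrix(
  c(500, 300, 150, 50,
    480, 320, 140, 60,
    200, 600, 150, 50),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("control_", 1:3), paste0("cluster_", 1:4))
)

# Percentage of each sample's cells that fall in each cluster.
# Dividing by rowSums() recycles down the columns, so every row sums to 100.
pctgs <- 100 * counts / rowSums(counts)

# Coefficient of variation per cluster (assumed definition: sd / mean),
# computed across the control samples, i.e. down each column.
cv_per_cluster <- apply(pctgs, 2, sd) / colMeans(pctgs)

print(round(pctgs, 1))
print(round(cv_per_cluster, 2))
```

In this toy example control_3 puts far more of its cells in cluster_2 than the other two controls do, so cluster_2 ends up with a much higher CV than cluster_3, whose percentage is nearly the same in all three samples. That is the pattern to look for in the plots: a cluster whose percentages differ a lot between control samples (high CV) suggests the clustering itself is being affected by the batch effect.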