Comparison to Wilcox - Githubissues

EST09 commented 7 months ago

Hi,

Thank you so much for the CyTOF workflow. I've been following it for my CyTOF analysis. However, my PI has asked me why we don't just use a Wilcoxon test for differential abundance analysis. I don't have a heavy stats background, and whilst all the points in the article make sense, I can't really formulate a good answer, which is why I was wondering if I could ask here.

Best wishes, Emily

SamGG commented 7 months ago

Hi, I am just an advanced user, and I don't know your level of experience with data analysis, so sorry if my reply does not fit your question.

Do you run diffcyt in R or in commercial software? How many groups (i.e. conditions) are you comparing? How many individuals are in each group? How many clusters are in the experiment?

The edgeR flavor of diffcyt lets you set up designs more complex than a two-group comparison. edgeR comes from RNA-seq methods, and it shares some information between clusters (for example, when estimating dispersions). diffcyt returns p-values adjusted for multiple testing (i.e. FDR, as from p.adjust), with one test per cluster.

If we consider Wilcoxon, we have to apply it to frequencies, not to counts. Each Wilcoxon test ignores the tests applied to the other clusters, so we have to adjust the p-values for multiple testing. If there are not enough individuals in each group, the raw p-values may not be strongly significant, and they will increase substantially (become less significant) after adjustment.
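To make the Wilcoxon route concrete, here is a minimal Python sketch, assuming a hypothetical table of cell counts keyed by sample and cluster. It converts counts to per-sample frequencies, runs one two-sided rank-sum test per cluster (an exact permutation version written out by hand, only practical for small groups, and not a copy of any particular library's implementation), and then applies a Benjamini-Hochberg adjustment across clusters. All function names and data are illustrative.

```python
from itertools import combinations


def to_proportions(counts):
    """{sample: {cluster: cell count}} -> {sample: {cluster: frequency}}."""
    out = {}
    for sample, by_cluster in counts.items():
        total = sum(by_cluster.values())  # total cells in this sample
        out[sample] = {c: n / total for c, n in by_cluster.items()}
    return out


def average_ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_pvalue(x, y):
    """Two-sided rank-sum p-value by enumerating every split of the pooled
    ranks (exact permutation version; only practical for small groups)."""
    ranks = average_ranks(list(x) + list(y))
    n1 = len(x)
    expected = n1 * (len(ranks) + 1) / 2
    observed_dev = abs(sum(ranks[:n1]) - expected)
    splits = list(combinations(range(len(ranks)), n1))
    extreme = sum(
        1 for idx in splits
        if abs(sum(ranks[i] for i in idx) - expected) >= observed_dev - 1e-9
    )
    return extreme / len(splits)


def bh_adjust(pvals):
    """Benjamini-Hochberg adjustment (the "BH"/"fdr" method of R's p.adjust)."""
    m = len(pvals)
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping a running minimum of p * m / rank.
    for pos, i in enumerate(sorted(range(m), key=lambda i: pvals[i], reverse=True)):
        rank = m - pos
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def wilcoxon_per_cluster(counts, group_a, group_b):
    """One rank-sum test per cluster on frequencies, then BH across clusters."""
    props = to_proportions(counts)
    clusters = sorted(next(iter(counts.values())))
    raw = [
        rank_sum_pvalue([props[s][c] for s in group_a],
                        [props[s][c] for s in group_b])
        for c in clusters
    ]
    return dict(zip(clusters, zip(raw, bh_adjust(raw))))


# Hypothetical counts: 3 samples per group, 3 clusters.
counts = {
    "a1": {"c1": 120, "c2": 400, "c3": 480},
    "a2": {"c1": 90, "c2": 350, "c3": 560},
    "a3": {"c1": 150, "c2": 300, "c3": 550},
    "b1": {"c1": 30, "c2": 380, "c3": 590},
    "b2": {"c1": 20, "c2": 410, "c3": 570},
    "b3": {"c1": 45, "c2": 320, "c3": 635},
}
result = wilcoxon_per_cluster(counts, ["a1", "a2", "a3"], ["b1", "b2", "b3"])
for cluster, (p_raw, p_adj) in result.items():
    print(cluster, round(p_raw, 3), round(p_adj, 3))
```

In this toy example, clusters c1 and c3 separate the two groups perfectly, yet with 3 samples per group their raw p-value cannot go below 0.1, and the BH adjustment across three clusters raises it to 0.15.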

In my experience, diffcyt gives stronger (smaller) adjusted p-values. Adjusted Wilcoxon tests might be worth considering with more than 15 individuals per group.
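The point about group size can be checked with a little arithmetic. Assuming no ties, an exact two-sided rank-sum test can never return a p-value below 2 / C(n1 + n2, n1), because only the two most extreme splits of the ranks are at least as extreme as perfect separation. This sketch prints that floor for a few balanced group sizes, next to what it becomes in the worst case for 12 clusters, where it is the only small p-value and BH multiplies it by 12 (the helper name is hypothetical).

```python
from math import comb


def min_two_sided_p(n1, n2):
    """Smallest two-sided p-value an exact rank-sum test can reach without ties:
    only the 2 most extreme of the C(n1 + n2, n1) rank splits qualify."""
    return min(1.0, 2 / comb(n1 + n2, n1))


n_clusters = 12
for n in (3, 4, 5, 9):
    floor = min_two_sided_p(n, n)
    # Worst case for BH: this is the only small p-value, so it is multiplied by 12.
    print(f"n = {n} per group: floor = {floor:.2g}, x12 = {min(1.0, floor * n_clusters):.2g}")
```

With 5 samples per group, even a perfectly separated cluster ends up around 0.095 in that worst case, which is why small groups make adjusted Wilcoxon results hard to interpret.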

Best.

EST09 commented 7 months ago

Hi,

Thanks for your help - really appreciate it!

I've been using R with my pre-determined groups and the diffcyt GLMM method (DA_GLMM). For my main analysis, I have 3 groups (which I compare pairwise), with 9, 34 and 28 samples respectively. The largest number of clusters is 12.

Please could I just check I'm understanding what you said? I've added a random intercept for samples; is this what you mean by the "considering other tests" part, or is there something more to it ("Each Wilcoxon test is ignoring the other tests applied on the other clusters.")? And please could I check why applying the test to counts rather than frequencies is advantageous: is it because of the uncertainty in frequencies when the population is small, which using counts gets around?

Thank you so much - I really appreciate your help here!

Best wishes, Emily

markrobinsonuzh commented 7 months ago

I agree with most of what @SamGG has said.

The two main things for me are: 1) if you need random effects, then there is not really an option for that in the Wilcoxon test (it depends on exactly what model you are using); 2) counts versus proportions: indeed, the count modeling takes into account the variability associated with the direct measurements -- the counts! -- whereas an analysis of the proportions ignores this.

Random effects aside, as Samuel says, in large samples (i.e. by the law of large numbers) the counts-versus-proportions part becomes less of an issue.
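The counts-versus-proportions point can be put in numbers. As a minimal sketch, assume each cell in a sample independently falls into a given cluster with probability p (a plain binomial model that ignores biological variation between samples). The standard error of the observed frequency then shrinks with the number of cells measured, so the same 5% frequency is much less certain from 100 cells than from 100,000, while a rank test on frequencies treats every value as equally reliable. The function name is illustrative.

```python
from math import sqrt


def proportion_se(p, n_cells):
    """Binomial standard error of an observed cluster frequency p
    when n_cells cells were measured in the sample."""
    return sqrt(p * (1 - p) / n_cells)


for n_cells in (100, 1_000, 10_000, 100_000):
    print(n_cells, round(proportion_se(0.05, n_cells), 5))
```

Each 100-fold increase in cell number shrinks the standard error 10-fold, which is the large-sample regime where the distinction matters less.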

I think in either case, with your 12 clusters, you'd be applying some kind of multiple testing correction, but that is applied independently of the test anyway.

Cheers, Mark

SamGG commented 7 months ago

For DA_GLMM, we add such a random effect (for either the FCS file or the sample). This is the design part of the analysis. The other aspect provided by diffcyt is the adjustment for multiple testing.
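To connect the per-sample random intercept with the count modeling discussed above, here is a sketch of the kind of model a binomial GLMM fits for one cluster $k$, assuming a logit link, a two-group indicator $x_j$, $y_{kj}$ cells of cluster $k$ out of $N_j$ cells in sample $j$, and a random intercept per sample. The symbols are illustrative; the exact formulation diffcyt uses should be checked in its documentation.

```latex
\begin{aligned}
y_{kj} &\sim \mathrm{Binomial}(N_j,\ \pi_{kj}) \\
\operatorname{logit}(\pi_{kj}) &= \beta_{k0} + \beta_{k1}\, x_j + u_{kj} \\
u_{kj} &\sim \mathcal{N}(0,\ \sigma_k^2)
\end{aligned}
```

Because each sample contributes a single observation per cluster, $u_{kj}$ acts as an observation-level random effect: it absorbs variation between samples beyond what binomial sampling of $N_j$ cells explains. Fitting this model once per cluster and testing $\beta_{k1} = 0$ gives one p-value per cluster, and those p-values are then adjusted across clusters, which is the separate multiple-testing step.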

> Each Wilcoxon test is ignoring the other tests applied on the other clusters.

Let's switch to ANOVA to explain what I mean. When performing an ANOVA comparing 5 groups, once the ANOVA p-value is < 5%, we have to carry out multiple tests to identify which groups differ from the control group (this design is my arbitrary choice). We can either do 4 t tests comparing each group against the control or apply a Dunnett procedure. When using t tests, each t test and its p-value ignores that it is part of 4 tests: there is no multiple testing correction. When using Dunnett, the reported p-values are adjusted for the fact that 4 tests are performed. A similar correction is applied when all groups are compared pairwise. Look at these articles for a better explanation: https://www.nature.com/articles/nmeth.3005 and https://www.nature.com/articles/nmeth.2900. When we are dealing with multiple clusters, we cannot act as if there were only one cluster under consideration. We must adjust the p-values for multiple testing.
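The ANOVA analogy rests on how quickly false positives accumulate. A back-of-the-envelope sketch, assuming the m tests are independent and every null hypothesis is true (cluster frequencies within a sample are not truly independent, since they sum to 1), gives the chance of at least one false positive at a 5% threshold, alongside the simple Bonferroni per-test threshold. Bonferroni is used here only because it is the simplest correction; it is not the Dunnett procedure, which accounts for the correlation between comparisons that share a control group. Function names are illustrative.

```python
def familywise_error(alpha, m):
    """Probability of at least one false positive among m independent tests
    when every null hypothesis is true and no correction is applied."""
    return 1 - (1 - alpha) ** m


def bonferroni_level(alpha, m):
    """Per-test threshold that keeps the familywise error rate <= alpha."""
    return alpha / m


for m in (1, 4, 12):
    print(m, round(familywise_error(0.05, m), 3), round(bonferroni_level(0.05, m), 4))
```

With 12 uncorrected cluster-wise tests, the chance of at least one false positive is about 46%.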

> And please could I just check why applying on counts rather than frequencies is advantageous - is because of the uncertainty in frequencies if the population is small which using counts gets around?

Sorry if I was misleading. All diffcyt methods use counts because their models are based on counts, which is more precise: these models take into account the number of cells per sample. If we have to use Wilcoxon tests, we must compute percentages before applying the test, unless the number of cells is the same in every sample.
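Why raw counts are only comparable when every sample has the same number of cells can be illustrated with three hypothetical samples (the numbers are made up). Ranking by raw cluster counts and by cluster frequencies gives opposite orders, so a rank-based test on counts would be testing something different from a test on frequencies.

```python
# Hypothetical samples: (cells in one cluster, total cells acquired).
samples = {
    "s1": (50, 1_000),  # 5% of cells
    "s2": (80, 4_000),  # 2% of cells
    "s3": (30, 500),    # 6% of cells
}

# Order samples from lowest to highest by raw count and by frequency.
by_count = sorted(samples, key=lambda s: samples[s][0])
by_frequency = sorted(samples, key=lambda s: samples[s][0] / samples[s][1])

print("by count:    ", by_count)
print("by frequency:", by_frequency)
```

Sample s2 has the most cells in the cluster but the lowest frequency; dividing by the total number of cells is what makes samples of different sizes comparable before a rank test.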

Hope this helps.

EST09 commented 7 months ago

Thank you both for this, it was really helpful!

lmweber / diffcyt

Comparison to Wilcox #57