theislab / scCODA

A Bayesian model for compositional single-cell data analysis
BSD 3-Clause "New" or "Revised" License
Different results for same comparison, based on which group is selected as reference #86

Closed by mareymessi 8 months ago

mareymessi commented 11 months ago

Dear scCODA team,

I noticed that when I run the same model with a multi-level category (comparing 'acute', 'FU3' and 'FU12' to 'neg', always using the same reference cell type) and change the reference group from 'neg' to 'acute', I get different significant effects between the 'neg' and 'acute' groups, depending on which of them is used as the reference.

When subsetting the object to the two groups of interest, I see no difference in effects between using different reference levels. So I guess it is somehow related to the presence of the other groups? Am I doing something wrong or is this expected behaviour? Could you explain why this makes a difference?

(see example below)

Many thanks! Marey

Example result: 1) reference = neg

Example result: 2) reference = acute

-> fewer significant effects when using 'acute' as reference

The boxplot shows some of the cell types for which differing effects are observed. I noticed that some have very low abundances in one of the groups, but not all of them (Goblet cells, for example).

I'm using scanpy v.1.9.3 and scCODA v.0.1.9
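To make the two setups concrete, here is a minimal sketch of which contrasts each run estimates under treatment (reference-level) coding. It is plain Python rather than scCODA code; the helper `treatment_contrasts` is a hypothetical name used only for illustration, and the group labels are the ones from this issue.

```python
def treatment_contrasts(levels, reference):
    """List the contrasts a treatment-coded model estimates for one reference level."""
    if reference not in levels:
        raise ValueError(f"unknown reference level: {reference}")
    # One effect per non-reference level, each measured against the reference.
    return [(level, reference) for level in levels if level != reference]


levels = ["neg", "acute", "FU3", "FU12"]

contrasts_neg = treatment_contrasts(levels, "neg")
contrasts_acute = treatment_contrasts(levels, "acute")

# Compare the two runs as unordered pairs of groups.
pairs_neg = {frozenset(pair) for pair in contrasts_neg}
pairs_acute = {frozenset(pair) for pair in contrasts_acute}
shared_pairs = pairs_neg & pairs_acute

print(contrasts_neg)    # contrasts estimated with 'neg' as reference
print(contrasts_acute)  # contrasts estimated with 'acute' as reference
print(shared_pairs)     # group pairs compared in both runs
```

Only the 'neg'/'acute' pair appears in both runs. The two follow-up groups are compared against a different baseline in each run, so the two models are not the same model with relabelled output.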

johannesostner commented 11 months ago

Hello Marey,

Looking at the results, I see that the significant effects with "acute" as the reference are a proper subset of the ones that were selected with "neg" as the reference. The effects that differ in significance between the runs likely have an inclusion probability that is right at the edge of being selected. If you slightly increase the FDR in the second run, those effects will show up (you can also check this in the extended summary). The low abundance of some cell types should not affect that.

When scCODA is run on multiple groups (or with multiple covariates), the number of parameters to infer increases. The problem is therefore much harder to solve, and the results will be a lot more volatile. That's why your result is stable for the subset of two groups but differs for four groups: the model has many more options to experiment with including or removing effects and will be less certain, especially when the sample size is low compared to the number of cell types and conditions.

Judging from the boxplots, the "acute" condition seems to be very different from the other three, which are quite similar to each other. I therefore guess that there are a lot more significant effects for "FU_3" and "FU_12" when using "acute" as the baseline than when using "neg". Since each included effect makes it harder to include another one, the number of selected effects for "neg" (with "acute" as the baseline) will be smaller than when comparing the other way around (where mostly effects for "acute" are selected).
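The point about effects sitting right at the edge of selection can be illustrated with a small sketch. It assumes a simple rule in which effects are ranked by inclusion probability and added as long as the average of (1 - probability) over the selected effects stays at or below the target FDR. This is an illustrative assumption, not scCODA's actual implementation; the function `select_effects`, the effect names, and the probabilities are all made up.

```python
def select_effects(inclusion_probs, target_fdr):
    """Pick effects by inclusion probability under an estimated-FDR budget.

    Illustrative rule (an assumption, not scCODA's code): rank effects by
    inclusion probability and keep adding them while the mean of
    (1 - probability) over the selected set does not exceed target_fdr.
    """
    ranked = sorted(inclusion_probs.items(), key=lambda item: item[1], reverse=True)
    selected = []
    error_sum = 0.0
    for name, prob in ranked:
        candidate_error = error_sum + (1.0 - prob)
        if candidate_error / (len(selected) + 1) > target_fdr:
            break  # adding this effect would exceed the FDR budget
        error_sum = candidate_error
        selected.append(name)
    return selected


# Hypothetical inclusion probabilities for four effects.
probs = {"effect_a": 0.98, "effect_b": 0.95, "effect_c": 0.80, "effect_d": 0.60}

strict = select_effects(probs, target_fdr=0.05)
relaxed = select_effects(probs, target_fdr=0.10)

print(strict)   # effects selected at the stricter FDR
print(relaxed)  # effects selected at the looser FDR
```

Under this rule the stricter selection is always a subset of the looser one, and a borderline effect such as `effect_c` appears or disappears with a small change in the threshold or in its estimated inclusion probability.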

I hope that this clears up some of your questions.

mareymessi commented 11 months ago

Yes that explains it a bit, thank you!