theislab / scCODA

A Bayesian model for compositional single-cell data analysis
BSD 3-Clause "New" or "Revised" License
147 stars 24 forks source link

est_fdr did not change the result #62

Closed grimwoo closed 2 years ago

grimwoo commented 2 years ago

thanks for the development of the great scCODA.

It may be my own error. I found that the result did not change when I changed the est_fdr. Could anyone help me? # # # >>> sim_results.set_fdr(est_fdr=0.05) >>> sim_results.summary() Compositional Analysis summary:

Data: 23 samples, 2 cell types Reference index: 0 Formula: Group

Intercepts: Final Parameter Expected Sample Cell Type GABA 2.843 63.188077 GLU 3.257 95.594532

Effects: Final Parameter Expected Sample log2-fold change Covariate Cell Type Group[T.Mouse] GABA 0.00000 126.398962 1.000260 GLU -1.77579 32.383646 -1.561663 # # # >>> sim_results.set_fdr(est_fdr=0.00001) >>> sim_results.summary() Compositional Analysis summary:

Data: 23 samples, 2 cell types Reference index: 0 Formula: Group

Intercepts: Final Parameter Expected Sample Cell Type GABA 2.843 63.188077 GLU 3.257 95.594532

Effects: Final Parameter Expected Sample log2-fold change Covariate Cell Type Group[T.Mouse] GABA 0.00000 126.398962 1.000260 GLU -1.77579 32.383646 -1.561663 # # # >>> sim_results.set_fdr(est_fdr=0.000000000000000000000000000000000000000000000000001) >>> sim_results.summary() Compositional Analysis summary:

Data: 23 samples, 2 cell types Reference index: 0 Formula: Group

Intercepts: Final Parameter Expected Sample Cell Type GABA 2.843 63.188077 GLU 3.257 95.594532

Effects: Final Parameter Expected Sample log2-fold change Covariate Cell Type Group[T.Mouse] GABA 0.00000 126.398962 1.000260 GLU -1.77579 32.383646 -1.561663 # # # >>> sim_results.set_fdr(est_fdr=0) >>> sim_results.summary() Compositional Analysis summary:

Data: 23 samples, 2 cell types Reference index: 0 Formula: Group

Intercepts: Final Parameter Expected Sample Cell Type GABA 2.843 63.188077 GLU 3.257 95.594532

Effects: Final Parameter Expected Sample log2-fold change Covariate Cell Type Group[T.Mouse] GABA 0.00000 126.398962 1.000260 GLU -1.77579 32.383646 -1.561663 # # # >>> sim_results.set_fdr(est_fdr=0.5) >>> sim_results.summary() Compositional Analysis summary:

Data: 23 samples, 2 cell types Reference index: 0 Formula: Group

Intercepts: Final Parameter Expected Sample Cell Type GABA 2.843 63.188077 GLU 3.257 95.594532

Effects: Final Parameter Expected Sample log2-fold change Covariate Cell Type Group[T.Mouse] GABA 0.00000 126.398962 1.000260 GLU -1.77579 32.383646 -1.561663

grimwoo commented 2 years ago

and in addition, i wonder whether one cell-type have "significant difference" but the other does not.

I only classify cells into two types: GABA and GLU . If GLU have significant decrease, the GABA should have significant increase. however, the results only show one:

       Final Parameter     Expected Sample       log2-fold change

GABA 0.00000 126.398962 1.000260 GLU -1.77579 32.383646 -1.561663

johannesostner commented 2 years ago

Hi @grimwoo,

scCODA's results are based on the fact that single-cell counts are compositional, meaning that the total count per sample is not important and only the proportions are of interest. Therefore, only relative results with respect to a reference (like "cell type x changes in abundance with respect to the abundance of cell type y") are possible. Thus, we always need one cell type that we consider to be the reference - this cell type can not have a significant change in abundance. The effects of all other cell type will be with respect to this reference.

In your example, GABA is the reference (reference index 0 means the 0-th cell type was selected), and cannot decrease or increase with respect to itself. The decrease in GLU cells is with respect to the count of GABA cells. If you look at the "expected sample" and "log2-fold change" columns in the result, which assume that the total count per sample stays the same between the conditions, you can see that the decrease of GLU automatically leads to an increase in GABA (in terms of proportions), since the total count per sample must be constant.

johannesostner commented 2 years ago

Regarding the FDR:

Changing the expected FDR only changes which cell types are selected to be credibly changing, not the effect sizes. I'm surprised that you still get a credible effect with FDR=0. Could you please tell me which version of scCODA you use? Also, could you please send me the extended summary of your run? You can find it by calling sim_results.summary_extended()

grimwoo commented 2 years ago

Hi, @johannesostner

Thanks for your quick reply and your explanation on reference cell type. And I wonder if I have no reference cell type, how could I still use scCODA. # Regarding the FDR: My scCODA version is 0.1.8.

And the result of sim_results.summary_extended() is as following: Compositional Analysis summary (extended):

Data: 23 samples, 2 cell types Reference index: 0 Formula: Group Spike-and-slab threshold: 1.000

MCMC Sampling: Sampled 20000 chain states (5000 burnin samples) in 263.393 sec. Acceptance rate: 62.8%

Intercepts: Final Parameter HDI 3% HDI 97% SD Expected Sample Cell Type GABA 2.838 1.924 3.721 0.487 63.188077 GLU 3.252 2.358 4.166 0.487 95.594532

Effects: Final Parameter HDI 3% HDI 97% SD Inclusion probability Expected Sample log2-fold change Covariate Cell Type Group[T.Mouse] GABA 0.000000 0.000 0.00 0.000 0.0 126.695139 1.003637 GLU -1.787318 -2.201 -1.33 0.228 1.0 32.087469 -1.574918

# The following is my original data, if you are interested in reproducing this analysis: <html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns="http://www.w3.org/TR/REC-html40">

  | Time | Group | GABA | GLU -- | -- | -- | -- | -- 1 | SB.E10 | FMouse | 38 | 52 2 | SB.E11 | FMouse | 52 | 73 3 | SB.E12 | FMouse | 202 | 405 4 | SB.E13 | FMouse | 13 | 11 5 | SB.E14 | FMouse | 54 | 130 6 | SB.E15 | FMouse | 2 | 21 7 | SB.E16 | FMouse | 387 | 307 8 | SB.E18 | FMouse | 255 | 379 9 | SB.P14 | FMouse | 214 | 274 10 | SB.P4 | FMouse | 23 | 54 11 | SB.E10 | Mouse | 3 | 1 12 | SB.E11 | Mouse | 120 | 24 13 | SB.E12 | Mouse | 47 | 10 14 | SB.E13 | Mouse | 112 | 21 15 | SB.E14 | Mouse | 93 | 31 16 | SB.E15 | Mouse | 20 | 9 17 | SB.E16 | Mouse | 53 | 18 18 | SB.E18 | Mouse | 37 | 8 19 | SB.P14 | Mouse | 46 | 7 20 | SB.P4 | Mouse | 6 | 1 21 | SB.P45 | Mouse | 11 | 3 22 | SB.P45X | Mouse | 13 | 0 23 | SB.P8 | Mouse | 10 | 2

johannesostner commented 2 years ago

Thank you! Now I see what happens.

The inclusion probability for the GLU cells is 1, meaning that the model is absolutely certain that this cell type is differentially abundant. Looking at the data, this makes sense: In every "Mouse" sample, the abundance of GLU cells is lower than in the lowest "FMouse" sample. With such a perfect separation, there's no doubt that the cells are differentially abundant (see boxplot with added dots for every data point):

issue_62

This is why even at FDR=0, GLU cells will be selected. My initial surprise came from the fact that we never encountered a perfectly separated dataset (and thus inclusion probability of 1) before. This is all working as intended.

grimwoo commented 2 years ago

Thank you for your detailed reply!