theislab / scCODA

A Bayesian model for compositional single-cell data analysis
BSD 3-Clause "New" or "Revised" License

Influence of number of cells in parameterization #91

Closed (auesro closed this issue 4 months ago)

auesro commented 4 months ago

Hi,

This is more of a question than a bug. If I should post it somewhere else, please let me know.

How does the total number of cells influence the recommended parameters for running scCODA? Is there any parameter that should be changed? Is there a lower or upper limit on the number of cells in a dataset beyond which scCODA should not be used?

Thanks,

A

johannesostner commented 4 months ago

Hi @auesro,

That's a great question! In general, fewer cells lead to smaller counts in the aggregated (sample x cell type) count data. With smaller counts, the uncertainty introduced by "discretizing" the true proportions into integer counts grows and may significantly affect the result. The Dirichlet-Multinomial distribution used in scCODA accounts for this and yields higher uncertainty for lower cell numbers, which may lead to a loss of power. We briefly explored this during scCODA's development, but only noticed a significant impact on detection power when there were fewer than ~300 cells per sample (assuming 10 cell types). So a few hundred cells per sample (more if you have a large number of cell types) should suffice.
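To get a feel for the "discretization" effect described above, here is a minimal simulation sketch. It is not scCODA code: it uses a plain multinomial draw (not the Dirichlet-Multinomial model), and the function name `proportion_noise`, the 10 equally sized cell types, and the chosen cell numbers are all assumptions made for illustration.

```python
import numpy as np


def proportion_noise(n_cells, true_props, n_draws=2000, seed=0):
    """Standard deviation of observed proportions per cell type.

    Hypothetical helper: draws `n_draws` samples of `n_cells` cells each
    from a plain multinomial with the given true proportions.
    """
    rng = np.random.default_rng(seed)
    counts = rng.multinomial(n_cells, true_props, size=n_draws)
    observed = counts / n_cells  # observed proportions per simulated sample
    return observed.std(axis=0)


# Assumption: 10 cell types with equal true proportions.
true_props = np.full(10, 0.1)

for n_cells in (100, 300, 1000, 5000):
    sd = proportion_noise(n_cells, true_props)
    # Mean sampling noise across cell types; it shrinks roughly as 1/sqrt(n_cells).
    print(n_cells, round(float(sd.mean()), 4))
```

The noise in the observed proportions should shrink roughly with the square root of the number of cells per sample, which is one way to see why very small samples make real compositional shifts harder to distinguish from sampling variation.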

auesro commented 4 months ago

Hi @johannesostner

Thanks a lot for your quick reply.

I fear I might be in a risky situation: 50 cell types, 4 samples, and around 600 cells per sample. That sounds to me like too few data points, right?

A

johannesostner commented 4 months ago

There might be some impact on the model's power, yes. You could aggregate some cell types into coarser categories, though (if it makes sense from a biological viewpoint).
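As a rough illustration of such an aggregation, the sketch below merges columns of a (sample x cell type) count table with pandas before any modeling. The cell type names, the counts, the `coarse_map` dictionary, and the helper `aggregate_cell_types` are all hypothetical; which types to merge is a biological decision, and this is not part of scCODA's API.

```python
import pandas as pd

# Hypothetical (sample x cell type) count table.
counts = pd.DataFrame(
    {
        "CD4 T": [120, 90],
        "CD8 T": [80, 110],
        "Naive B": [60, 40],
        "Memory B": [30, 50],
    },
    index=["sample_1", "sample_2"],
)

# Hypothetical mapping from fine-grained to coarse cell types.
coarse_map = {
    "CD4 T": "T cells",
    "CD8 T": "T cells",
    "Naive B": "B cells",
    "Memory B": "B cells",
}


def aggregate_cell_types(counts, mapping):
    """Sum the count columns that map to the same coarse category."""
    missing = set(counts.columns) - set(mapping)
    if missing:
        raise ValueError(f"No coarse category given for: {sorted(missing)}")
    # Group the cell types (rows after transposing) by coarse label and sum.
    return counts.T.groupby(mapping).sum().T


coarse_counts = aggregate_cell_types(counts, coarse_map)
print(coarse_counts)
```

Each sample keeps the same total number of cells, but every remaining category holds larger counts, which is the point of the suggestion.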

auesro commented 4 months ago

Yeah, I will try merging some categories. Thanks, @johannesostner!

More basic question: where can I look to estimate the power of the model?

johannesostner commented 4 months ago

Unfortunately, we don't have any explicit power statistics on the number of cells per sample. There are some analyses of the relationship between the number of samples and detected effect sizes in the supplement of the paper, though.

auesro commented 4 months ago

I will check them out.

Thanks!