Closed by auesro 4 months ago
Hi @auesro,
that's a great question! In general, fewer cells lead to smaller counts in the aggregated (sample x cell type) count data. With smaller counts, the uncertainty caused by "discretizing" the true proportions is larger and may significantly affect the result. The Dirichlet-Multinomial distribution used in scCODA accounts for this and assigns higher uncertainty to lower cell numbers, which may reduce power. We briefly explored this during scCODA's development, but only noticed a significant impact on detection power when there were fewer than ~300 cells per sample (assuming 10 cell types). So a few hundred cells per sample (more if you have a large number of cell types) should suffice.
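To make the "discretizing" point concrete, here is a minimal sketch that draws cell counts from a plain multinomial and compares the spread of the observed proportion for different numbers of cells per sample. It is an illustration only, not scCODA's model: it uses a plain multinomial rather than the Dirichlet-Multinomial, and the helper names, the 10 equally abundant cell types, and the cell numbers are all assumptions chosen for the example.

```python
import numpy as np


def proportion_std_error(p, n_cells):
    """Analytic standard error of an observed proportion under multinomial sampling."""
    return np.sqrt(p * (1.0 - p) / n_cells)


def simulate_observed_proportions(true_props, n_cells, n_draws=2000, seed=0):
    """Draw n_draws samples of n_cells cells each and return observed proportions."""
    rng = np.random.default_rng(seed)
    counts = rng.multinomial(n_cells, true_props, size=n_draws)
    return counts / n_cells


# Assumption for illustration: 10 cell types with equal true proportions.
true_props = np.full(10, 0.1)

for n_cells in (100, 300, 1000, 5000):
    observed = simulate_observed_proportions(true_props, n_cells)
    empirical_sd = observed[:, 0].std()  # spread of the first cell type's proportion
    analytic_sd = proportion_std_error(true_props[0], n_cells)
    print(f"{n_cells:>5} cells: empirical SD {empirical_sd:.4f}, analytic SD {analytic_sd:.4f}")
```

The spread shrinks with the square root of the number of cells, so going from 300 to 5000 cells per sample reduces the sampling noise on each proportion by roughly a factor of four. With more cell types, each type receives fewer cells on average, which is why more cells are needed in that case.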
Hi @johannesostner
Thanks a lot for your quick reply.
I fear I might be in a risky situation: 50 cell types, 4 samples, around 600 cells/sample... it sounds to me like too few data points, right?
A
There might be some impact on the model's power, yes. You could aggregate some cell types into coarser categories though (if it makes sense from a biological viewpoint).
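As a rough sketch of what such an aggregation could look like on a (sample x cell type) count table, the pandas snippet below sums fine-grained cell type columns into coarser categories. The counts, cell type names, and the `fine_to_coarse` mapping are all made up for illustration; which types to merge is a biological decision, and loading the result into scCODA is not shown here.

```python
import pandas as pd

# Hypothetical (sample x cell type) counts, about 600 cells per sample.
counts = pd.DataFrame(
    {
        "CD4 T": [120, 90],
        "CD8 T": [80, 110],
        "Naive B": [60, 40],
        "Memory B": [30, 50],
        "Monocyte": [310, 310],
    },
    index=["sample_1", "sample_2"],
)

# Hypothetical mapping from fine-grained cell types to coarser categories.
fine_to_coarse = {
    "CD4 T": "T cell",
    "CD8 T": "T cell",
    "Naive B": "B cell",
    "Memory B": "B cell",
    "Monocyte": "Monocyte",
}

# Every cell type must be mapped, otherwise its counts would be silently dropped.
missing = set(counts.columns) - set(fine_to_coarse)
if missing:
    raise ValueError(f"Unmapped cell types: {sorted(missing)}")

# Transpose so cell types are rows, sum rows per coarse category, transpose back.
coarse_counts = counts.T.groupby(fine_to_coarse).sum().T

print(coarse_counts)
```

The total number of cells per sample is unchanged by the merge; only the number of categories drops, so each remaining category is backed by more cells.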
Yeah, I will try merging some categories. Thanks, @johannesostner!
A more basic question: where can I look to estimate the power of the model?
Unfortunately, we don't have any explicit power statistics on the number of cells per sample. There are some analyses regarding the relationship between number of samples and detected effect sizes in the supplement of the paper, though.
I will check them out
Thanks!
Hi,
This is more of a question than a bug. If I should post it somewhere else, please let me know.
What influence does the total number of cells have on the recommended parameters for running scCODA? Is there any parameter that should be changed? Is there a lower or upper threshold on the number of cells in the dataset beyond which scCODA should not be used?
Thanks,
A