theislab / scCODA

A Bayesian model for compositional single-cell data analysis
BSD 3-Clause "New" or "Revised" License

Questions about generalizability to other NGS datasets #77

Closed jolespin closed 4 months ago

jolespin commented 1 year ago

I'd like to use this for two separate datasets:

My questions:

In both of these scenarios, I'd like to measure differential abundance for components (i.e., features) within my compositions (e.g., samples or single-cells).

What would you recommend?

johannesostner commented 1 year ago

Hi @jolespin,

thank you for your questions! Regarding dataset 1, I wonder what you mean by "features". Are these gene expression counts? For those, I would recommend other methods like DESeq2 or edgeR: their normalizations have a similar effect to accounting for compositionality, but they are much more computationally efficient. If you want to use scCODA, you can either sum up all spike-ins and treat them as one reference component, or just use a single ERCC spike-in as the reference; both are feasible. I would opt for the procedure that gives you a better estimate of the total sum of features in the samples. Also, your reference should be present in every sample (every cell, in your case).
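To make the two reference options concrete, here is a minimal sketch in plain Python. It only prepares the count table and does not call scCODA. The feature names, counts, the `ERCC-` prefix convention, and the helper names (`pool_spikes`, `keep_single_spike`, `present_everywhere`) are assumptions made up for illustration.

```python
# Toy count table: one dict per sample (cell), mapping feature name -> count.
# All names and numbers are invented for this example.
samples = [
    {"geneA": 120, "geneB": 30, "ERCC-00002": 50, "ERCC-00003": 0},
    {"geneA": 80, "geneB": 45, "ERCC-00002": 40, "ERCC-00003": 12},
    {"geneA": 200, "geneB": 10, "ERCC-00002": 55, "ERCC-00003": 7},
]


def is_spike(name):
    # Assumption: spike-in features are identified by an "ERCC-" prefix.
    return name.startswith("ERCC-")


def pool_spikes(sample, pooled_name="ERCC_total"):
    """Option 1: replace all spike-ins by their sum, as one reference component."""
    pooled = {k: v for k, v in sample.items() if not is_spike(k)}
    pooled[pooled_name] = sum(v for k, v in sample.items() if is_spike(k))
    return pooled


def keep_single_spike(sample, spike):
    """Option 2: keep one spike-in as the reference and drop the others."""
    return {k: v for k, v in sample.items() if not is_spike(k) or k == spike}


def present_everywhere(table, name):
    """Check that a candidate reference has a nonzero count in every sample."""
    return all(s.get(name, 0) > 0 for s in table)


pooled_table = [pool_spikes(s) for s in samples]
single_table = [keep_single_spike(s, "ERCC-00002") for s in samples]

print(present_everywhere(pooled_table, "ERCC_total"))
print(present_everywhere(single_table, "ERCC-00002"))
print(present_everywhere(samples, "ERCC-00003"))
```

In this toy table, a spike-in that drops out in one cell (here `ERCC-00003`) fails the presence check on its own, while the pooled component does not. That is one practical argument for pooling when individual spike-ins are sparse.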

Regarding dataset 2, we have an automatic reference selection heuristic implemented (reference_cell_type="automatic"). With this, you will get a reference that has low dispersion over your samples, which is often a good candidate. Alternatively, you can iterate over all possible references and look at the cumulative selections of taxa (as at the end of our advanced tutorial).
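As a rough illustration of what "low dispersion over your samples" can mean, the sketch below picks a reference from a toy taxa table in plain Python. It is not scCODA's implementation. The dispersion measure (variance divided by mean of the relative abundances), the `max_zero_fraction` cutoff, the function name, and the data are all assumptions chosen for the example.

```python
from statistics import mean, pvariance

# Toy table: taxon -> counts across four samples (invented numbers).
counts = {
    "taxonA": [50, 52, 48, 51],
    "taxonB": [10, 40, 5, 30],
    "taxonC": [0, 3, 0, 2],
    "taxonD": [40, 5, 47, 17],
}


def pick_low_dispersion_reference(counts, max_zero_fraction=0.05):
    """Return the taxon with the lowest dispersion of relative abundance.

    Taxa absent (zero count) in too many samples are skipped, since a
    reference should be present in essentially every sample.
    Returns None if no taxon passes the presence filter.
    """
    names = list(counts)
    n_samples = len(counts[names[0]])
    # Per-sample totals, used to convert counts to relative abundances.
    totals = [sum(counts[name][i] for name in names) for i in range(n_samples)]

    best_name, best_disp = None, float("inf")
    for name in names:
        values = counts[name]
        zero_fraction = sum(v == 0 for v in values) / n_samples
        if zero_fraction >= max_zero_fraction:
            continue  # too often absent to serve as a reference
        rel = [v / t for v, t in zip(values, totals)]
        # Assumed dispersion measure: variance-to-mean ratio.
        disp = pvariance(rel) / mean(rel)
        if disp < best_disp:
            best_name, best_disp = name, disp
    return best_name


reference = pick_low_dispersion_reference(counts)
print(reference)
```

A candidate chosen this way is only a starting point. If the results change noticeably with a different reference, comparing several references as described above is the safer route.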