Questions about generalizability to other NGS datasets

theislab / scCODA

A Bayesian model for compositional single-cell data analysis

BSD 3-Clause "New" or "Revised" License

Hi @jolespin,

Thank you for your questions! Regarding dataset 1, I wonder what you mean by "features". Are these gene expression values? For those, I would recommend other methods such as DESeq2 or edgeR: their normalizations have a similar effect to accounting for compositionality, but they are much more computationally efficient. If you want to use scCODA, you can either sum up all spikes and treat them as one reference component, or use a single ERCC spike as the reference. Both are feasible. I would opt for the procedure that gives you the better estimate of the total sum of features in the samples. Also, your reference should be present in every sample (every cell, in your case).
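To make the first option concrete, here is a minimal sketch of summing all spikes into one reference component before building the count table. It uses only the Python standard library; the function name `collapse_spikes`, the `ERCC-` prefix convention, the `ERCC_total` column name, and the toy counts are all assumptions for illustration and are not part of scCODA.

```python
def collapse_spikes(counts, spike_prefix="ERCC-", reference_name="ERCC_total"):
    """Sum all spike-in features of one sample into a single reference component.

    counts: dict mapping feature name -> raw count for one sample (cell).
    The prefix and the reference name are hypothetical conventions.
    """
    collapsed = {}
    spike_total = 0
    for feature, count in counts.items():
        if feature.startswith(spike_prefix):
            spike_total += count          # pool every spike-in
        else:
            collapsed[feature] = count    # keep biological features as-is
    collapsed[reference_name] = spike_total
    return collapsed


# Toy data: two cells, two genes, two spike-ins (made-up numbers).
samples = {
    "cell_1": {"geneA": 120, "geneB": 30, "ERCC-00002": 50, "ERCC-00003": 10},
    "cell_2": {"geneA": 80, "geneB": 45, "ERCC-00002": 40, "ERCC-00003": 0},
}

collapsed = {name: collapse_spikes(c) for name, c in samples.items()}

# The reference should be present (non-zero) in every sample.
reference_present = all(c["ERCC_total"] > 0 for c in collapsed.values())
```

In this toy example the single spike `ERCC-00003` is absent from `cell_2`, while the pooled reference is present in both cells, which is one practical reason to prefer the summed variant when individual spikes drop out.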

Regarding dataset 2, we have implemented an automatic reference selection heuristic (reference_cell_type="automatic"). With this, you will get a reference that has low dispersion across your samples, which is often a good candidate. Alternatively, you can iterate over all possible references and look at the cumulative selections of taxa (as at the end of our advanced tutorial).
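To illustrate the idea behind a low-dispersion reference, here is a simplified standard-library sketch. It is not scCODA's implementation: the function name `pick_low_dispersion_reference`, the use of the variance of relative abundances as the dispersion measure, the `min_presence` threshold, and the toy counts are all assumptions made for this example.

```python
from statistics import pvariance


def pick_low_dispersion_reference(samples, min_presence=0.95):
    """Return the component whose relative abundance varies least across samples.

    samples: list of dicts (one per sample) mapping component -> count.
    Components present (count > 0) in fewer than `min_presence` of the
    samples are skipped. Returns None if no component qualifies.
    """
    best_name, best_dispersion = None, None
    for name in samples[0]:
        presence = sum(1 for s in samples if s[name] > 0) / len(samples)
        if presence < min_presence:
            continue
        # Relative abundance of this component within each sample.
        rel = [s[name] / sum(s.values()) for s in samples]
        dispersion = pvariance(rel)
        if best_dispersion is None or dispersion < best_dispersion:
            best_name, best_dispersion = name, dispersion
    return best_name


# Toy data: three samples, three taxa (made-up counts, 100 per sample).
toy = [
    {"taxonA": 50, "taxonB": 30, "taxonC": 20},
    {"taxonA": 20, "taxonB": 31, "taxonC": 49},
    {"taxonA": 35, "taxonB": 29, "taxonC": 36},
]

reference = pick_low_dispersion_reference(toy)
```

Here `taxonB` makes up about 30% of every sample, so it is the one picked. In scCODA itself you would not need such a helper; passing reference_cell_type="automatic" lets the package choose for you.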

Questions about generalizability to other NGS datasets #77