Closed hoeflerb closed 11 months ago
Hi @hoeflerb,
thanks for your interest in scCODA! Your intuition about scCODA having a GLM-like covariate structure is correct. The effects (called "Final Parameter" in the output table) can be combined additively, if you set them like Treatment + Donor
in your formula
parameter. For interaction parameters, the same thing holds in theory. We never explicitly tested this in practice, though.
Calculating the expected sample and log-fold change is a little bit more involved though due to the compositionality of the data. We assume the data to be Dirichlet-Multinomially distributed (y ~ DirMult(a, N)
) with proportions log(a) = α + βtreat1 + βdonor2
, which causes non-additivity of the mean wrt a
.
Therefore, I'd suggest to calculate the expected sample (and subsequently the log-fold change) individually for each scenario. The expected sample is the mean of a Dirichlet-Multinomial distribution with N
(the mean number of cells per sample in your data) draws and distribution a
(exp of thee sum of your effect vectors of all covariates of interest, as above), i.e. E(sample) = N*(a/sum(a))
. The log-fold change is then the change of this expected sample compared to the control group with only the intercept.
As for model comparisons, we never looked into this specifically, but you could compare different models through sth like WAIC. Be careful with the metric for determining the goodness of fit though, as the data is non-normal!
Thanks Johannes for the detailed response! I ran a quick mock calculation using your instructions and was able to reproduce the log2-fold change values reported in the results table, so this makes sense now.
For the WAIC calculation, the built-in arviz function requires log likelihoods to be stored in the InferenceData object, so I'm not sure how to go about obtaining those. I will open a new issue for that question.
Hi,
Thanks for scCODA, great approach to performing compositional analysis. A question about the output tables. If we fit a model with two categorical covariates, with a GLM the additive effect would be determined by summing the coefficients, ex:
For scCODA, is it correct to sum the log2-fold change terms to determine the additive effects (ie: Treatment1,Donor2 effect = LFCtreat1 + LFCdonor2)? I'm assuming that because the model uses a log-link function that this is valid, but I want to make sure I am thinking about this correctly.
Similarly, if we fit a model with an interaction term, can the effect of the interaction be determined in the same way (ie: Treatment1,Donor2 effect = LFCtreat1 + LFCdonor2 + LFCtreat1:donor2)? In other words, can we treat the log2-fold change values in the same way as (log2 (α + β)/α) of a GLM?
Also, is there a way to do model comparisons with scCODA (maybe using WAIC)? If we want to know whether the model fits better with or without an interaction term, how would we determine this?