theislab / scCODA

A Bayesian model for compositional single-cell data analysis
BSD 3-Clause "New" or "Revised" License

Which conditions can scCODA work with? #29

Closed FADHLyemen closed 3 years ago

FADHLyemen commented 3 years ago

Hi, should only an external condition be considered as a condition in the scCODA model, or can case-control status also be used as a condition? Can I also use a continuous variable, such as treatment exposure time, as a condition, or does scCODA only accept categorical variables?

Thank you

johannesostner commented 3 years ago

Hi! I am not 100% certain what you mean by an "external condition", but scCODA can handle any binary covariate/condition. From the statistical perspective, it does not matter whether this is case vs. control, healthy vs. infected, alive vs. dead, ...

In theory, you can also use continuous covariates with scCODA. The formula works the same way as in an lm linear model in R, so you can also use multiple conditions. See the beginning of the advanced tutorial for this. We do not advertise using multiple conditions or continuous covariates with scCODA though, since we have not yet compared its performance with other methods in these settings.

Also, scCODA cannot model the autocorrelation structure of time series data, where each time point depends on the previous ones. Therefore, I would only use time as a variable if the measurements were taken from independent sources (not the same patient/sample at multiple time points).

FADHLyemen commented 3 years ago

By external condition, I meant the treatment. If your condition is a case-control status, I don't know which question it will answer: do cell types increase/decrease in cases compared with controls? "We do not advertise using multiple condition": so if my samples are under multiple treatments, can I set up the formula as cond1+cond2, or is this still under development?

"since we have not done a performance comparison with other methods for it yet." Do you recommend binarizing it for the time being?

"I would only use time as a variable," but this is still not recommended by you, is that right?

Thank you

johannesostner commented 3 years ago

> By external condition, I meant the treatment. If your condition is a case-control status, I don't know which question it will answer: do cell types increase/decrease in cases compared with controls?

Exactly. If you have a control and a case (disease, or similar) group, you can ask "Which cell types change with the disease?"

> "We do not advertise using multiple condition": so if my samples are under multiple treatments, can I set up the formula as cond1+cond2, or is this still under development?

You can do this, and you will receive effects for each condition, compared to the samples where this condition is absent, just as in the tutorial I linked. From a mathematical standpoint, nothing speaks against this type of analysis with scCODA, and we have already tried it on some example data, with promising results. However, we have not gathered performance metrics for this, so we can't guarantee that a multi-treatment analysis will always produce usable results. Alternatively, you can look at each condition/treatment in a separate model.

> "since we have not done a performance comparison with other methods for it yet." Do you recommend binarizing it for the time being?

Binary variables are what we showed all our results for. You could set up a binarized version of your variable and compare the difference that binarization makes in scCODA, compared to using the continuous variable. That's what I would do (just as a sanity check).
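As a rough illustration of the binarization step suggested above, here is a minimal sketch in plain Python. The sample names, exposure times, and the median split are all assumptions made for the example; the thread does not prescribe a threshold, and none of this is scCODA code.

```python
from statistics import median

# Hypothetical exposure times (hours) per sample; not data from this thread.
exposure_hours = {
    "s1": 2.0, "s2": 4.0, "s3": 8.0,
    "s4": 12.0, "s5": 24.0, "s6": 48.0,
}

def binarize(values, threshold=None):
    """Label each sample 'high' if its value exceeds the threshold, else 'low'.

    Defaults to a median split, which is an arbitrary choice for this sketch.
    """
    if threshold is None:
        threshold = median(values.values())
    return {s: ("high" if v > threshold else "low") for s, v in values.items()}

exposure_binary = binarize(exposure_hours)
print(exposure_binary)
```

The binary labels and the original continuous values could then each be used as the covariate in a separate model run, and the resulting effects compared as a sanity check.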

> "I would only use time as a variable," but this is still not recommended by you, is that right?

Exactly. If you have discretized time points, you could compare them pair-wise (time0 vs. time1; time1 vs. time2; ...). This is something that scCODA is able to do.
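The pair-wise scheme could be prepared roughly as follows. This is only a sketch in plain Python: the sample table and time labels are hypothetical, and each resulting subset would still have to be turned into its own scCODA data set and model.

```python
# Hypothetical sample table: sample id -> discretized time point.
sample_time = {
    "s1": "time0", "s2": "time0",
    "s3": "time1", "s4": "time1",
    "s5": "time2", "s6": "time2",
}
time_order = ["time0", "time1", "time2"]

def consecutive_pairs(levels):
    """Pair each time point with the next one: (t0, t1), (t1, t2), ..."""
    return list(zip(levels, levels[1:]))

def subset_for_pair(samples, pair):
    """Keep only the samples that belong to one of the two time points."""
    return {s: t for s, t in samples.items() if t in pair}

pairs = consecutive_pairs(time_order)
subsets = {pair: subset_for_pair(sample_time, pair) for pair in pairs}

for pair, subset in subsets.items():
    print(pair, sorted(subset))
```

Each subset contains exactly two time levels, so every comparison reduces to the binary-condition case.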

FADHLyemen commented 3 years ago

Thank you, I went through your advanced example, "Patsy allows us to automatically handle categorical covariates, even with multiple levels. For example, we can model the effect of all three diseases at once:"

What I want is different: my covariate_df has two columns, which represent different combinations of drugs for the same patients. Each column has two levels ['YES','NO']. In your example, there are three levels for only one condition, but I have two variables with two levels each. Do you think scCODA can only handle one variable and cannot handle combinations of variables?

johannesostner commented 3 years ago

I see, your setup is slightly different.

In that case, I'd subset the data into four groups (control; treatment1; treatment2; treatment1+2) and look at the pairwise comparisons (control-treatment1, control-treatment2, control-treatment1+2, ...) that you are interested in.
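A minimal sketch of that subsetting, assuming a covariate table with two YES/NO columns named cond1 and cond2 (the patient ids and values below are made up for illustration):

```python
from itertools import combinations

# Hypothetical covariate table: one row per sample, two YES/NO treatment columns.
covariates = {
    "p1": {"cond1": "NO", "cond2": "NO"},
    "p2": {"cond1": "YES", "cond2": "NO"},
    "p3": {"cond1": "NO", "cond2": "YES"},
    "p4": {"cond1": "YES", "cond2": "YES"},
    "p5": {"cond1": "NO", "cond2": "NO"},
    "p6": {"cond1": "YES", "cond2": "YES"},
}

def group_label(row):
    """Map the two YES/NO columns onto one of the four groups."""
    on1 = row["cond1"] == "YES"
    on2 = row["cond2"] == "YES"
    if on1 and on2:
        return "treatment1+2"
    if on1:
        return "treatment1"
    if on2:
        return "treatment2"
    return "control"

# Collect the sample ids that fall into each group.
groups = {}
for sample, row in covariates.items():
    groups.setdefault(group_label(row), []).append(sample)

# All possible pairwise comparisons, and the subset that involves the control group.
all_pairs = list(combinations(sorted(groups), 2))
vs_control = [pair for pair in all_pairs if "control" in pair]

print(groups)
print(vs_control)
```

Only the comparisons of interest need to be run; each one is again a single binary condition.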

Alternatively, you can use formula = "cond1*cond2" to model the effect of both treatments and their interaction in one run of scCODA, just like in a linear regression model. Once again, I am pretty confident that scCODA is capable of performing such an analysis, but I can't guarantee that it will work, due to our lack of experience with multiple variables in one model.
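To make the meaning of "cond1*cond2" concrete: in R-style formulas it is shorthand for cond1 + cond2 + cond1:cond2, i.e. two main effects plus their interaction. The sketch below builds the corresponding design matrix by hand in plain Python. It does not call scCODA or Patsy; the sample table is hypothetical, "NO" is assumed to be the reference level, and the column names only imitate the style Patsy uses for treatment-coded categorical variables.

```python
# Hypothetical covariate table with two YES/NO treatment columns.
samples = {
    "p1": {"cond1": "NO", "cond2": "NO"},
    "p2": {"cond1": "YES", "cond2": "NO"},
    "p3": {"cond1": "NO", "cond2": "YES"},
    "p4": {"cond1": "YES", "cond2": "YES"},
}

# Column names written in a Patsy-like style (an assumption for illustration).
COLUMNS = [
    "Intercept",
    "cond1[T.YES]",
    "cond2[T.YES]",
    "cond1[T.YES]:cond2[T.YES]",
]

def design_row(row):
    """Dummy-code both treatments with 'NO' as reference and add their product."""
    c1 = int(row["cond1"] == "YES")
    c2 = int(row["cond2"] == "YES")
    return [1, c1, c2, c1 * c2]

design = {sample: design_row(row) for sample, row in samples.items()}

for sample, values in design.items():
    print(sample, dict(zip(COLUMNS, values)))
```

The interaction column is 1 only for samples that received both treatments, so its effect captures whatever the combination does beyond the sum of the two individual treatment effects.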

johannesostner commented 3 years ago

Yes, they can actually be accessed from within scCODA, through the sccoda.model.other_models module. We also have a small overview with links to the respective publications here.

Since the manuscript focuses on differential abundance testing (one binary condition), we only compared scCODA to methods that are commonly used for this task. So you might need to resort to pairwise testing of conditions for some methods, too.

FADHLyemen commented 3 years ago

This is very helpful. For the option formula = "cond1*cond2", it would help if you could create a Jupyter notebook.