theislab / scCODA

A Bayesian model for compositional single-cell data analysis
BSD 3-Clause "New" or "Revised" License

Using continuous variables yields weird results. #96

Open Marwansha opened 1 month ago

Marwansha commented 1 month ago

Hi,

I'm experiencing issues with scCODA when analyzing the effect of a continuous covariate (CMV serostatus) as opposed to a categorical one. When I categorize CMV serostatus as positive/negative based on a threshold, the results align with known effects on cell types. However, using continuous values yields nonsensical results, showing a credible effect on all cell types with odd final parameters and LFC that don't reflect the actual effect.

Below are examples of the CMV covariate analysis. All cell types except the reference are reported as credible (True), with strange final parameters and LFC. [image]

When I categorize the same data based on the threshold, the results make sense and match a simple linear regression analysis. I also checked the effect of another continuous variable, age, and the results were similarly illogical. But when I divided the ages into four categorical groups, the output was normal and consistent with known age effects on cell proportions.

johannesostner commented 1 month ago

Hi @Marwansha! What range is your covariate in? You should make sure that your continuous covariates are normalized before applying scCODA.
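As a rough illustration of what normalizing a continuous covariate could look like, here is a minimal sketch that z-scores a covariate (subtract the mean, divide by the standard deviation). This assumes z-scoring is the kind of normalization meant, which the comment above does not specify. The `standardize` helper and the age values are hypothetical and are not part of scCODA.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant covariate carries no information and cannot be scaled.
        raise ValueError("covariate is constant; cannot standardize")
    return [(x - mu) / sigma for x in values]


# Hypothetical ages in years, one per sample (illustrative values only).
age_years = [23, 35, 41, 58, 62, 70]

# Rescaled covariate: mean 0 and standard deviation 1.
age_z = standardize(age_years)
```

The rescaled values would then replace the raw covariate column before fitting the model, so that a one-unit change corresponds to one standard deviation rather than one year.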

Marwansha commented 1 month ago

Thanks a lot for your response, @johannesostner

Actually, it's not a matter of normalisation. For example, when checking the effect of age (in years) as a continuous variable, I observe the same thing: a positive credible effect on all cell types, with nonsensical LFC and final parameters. The results from using age as a categorical covariate in scCODA, and from regression with either the categorical or the continuous age covariate, all three match perfectly. The problem only appears when using a continuous covariate in scCODA, so I was wondering whether there is a specific way to add continuous variables to the formula?

I don't mind sharing the data by email too, if you would like.

Thanks a lot in advance

johannesostner commented 1 month ago

Hmmm, that's strange. Are you using only the continuous covariate, or are you including it in a formula together with other covariates? Could you also share the extended summary output when using a continuous covariate? Either as a screenshot or by email.