statdivlab / corncob

Count Regression for Correlated Observations with the Beta-binomial
102 stars 22 forks source link

Requesting help for modeling a design with longitudinal data #149

Closed Dario-Rocha closed 11 months ago

Dario-Rocha commented 11 months ago

Thank you for your research and publication on this kind of data, I don't work with microbiome, but with single cell RNAseq data instead, and as the study designs grow in complexity and number of samples, we are convering into classic microbiome analysis difficulties. I've been reading and exploring different packages for a while now and I've bumped into this one, which looks very promising! So, I have this study with cell abundances (counts) by cell type by timepoint (categorical, 2 points) by patient, with patients divided into responders and non responders. If I were to represent this situation, I'd go for something like (n_cells_in_cell_type | total_cells_in_sample) ~ timepoint + response + timepoint:response + (1|patient) In other words, I'd try to model the counts, considering the size of the sample the counts come from, explained by the timepoint and the response, considering that the changes in cell abundance between timepoints can be different between response levels, and considering that for each patient there is one sample at timepoint0 and one sample at timepoint1 by adding a random intercept by patient. From such a model, I'd be interested in testing: If there is an association between cell abundance at timepoint0 and response group. To do so I would test whether the coefficient of response (at timepoint0) is different from 0. If the change between timepoint0 and timepoint1 is different in responders and non responders. I'd normally check the tiempoint:response coefficient for this

As explained in #63 , this package hasn't yet implemented random effects modeling, and you've suggested to introduce this information as a fixed effect by patient (patient in this example), but I am not completely sure of how to specify this in this package. Also, as I understood from #63 , there isn't a direct test for timepoint:response, but rather a likelihood ratio test between two nested models: an aditive one and one with the interaction term. Would that be correct for this case too?

Thank you for your time and attention, single-cell RNAseq studies with this design and sufficient samples to start thinking about proper statistical methods are very recent, and the methods used are still being developed.

adw96 commented 11 months ago

Hi @cafferychen777 - please refrain from using corncob's GitHub site for advertising your software in future.

Dear @Dario-Rocha,

To include patient as a fixed effect, you would fit the model

(n_cells_in_cell_type | total_cells_in_sample) ~ timepoint + response + timepoint:response + patient

instead of

(n_cells_in_cell_type | total_cells_in_sample) ~ timepoint + response + timepoint:response + (1|patient)

When you say that there is no "direct test" for timepoint:response, could you please clarify what you mean? You can certainly test the null hypothesis that the coefficient on the interaction term is equal to zero.

I hope that helps and please don't hesitate to follow up with further questions.

Dario-Rocha commented 11 months ago

Thank you for your reply, When I say there is no "direct test" I was assuming that for some reason, to test the interaction it was more appropriate to compare nested models (additive vs additive + interaction) because that was mentioned in #63, but I clearly missunderstood the post. Testing the p-value of the interaction is better suited to my needs.

I've installed and tried to run a small trial with bbdml, but I am failing to run it properly

head(temp_data) status patient time response_bin 1 SD 263 t1 1 2 SD 263 t1 1 3 SD 263 t1 1 4 SD 263 t1 1 5 SD 263 t1 1 6 SD 263 t1 1

table(temp_data$response_bin, temp_data$status) R SD PD 0 130551 103634 147348 1 6396 2276 6428

temp_bbdml <- bbdml(formula = response_bin ~ status + time + status:time + patient, phi.formula = ~ status + time + status:time + patient, data = temp_data)

Error in if (df.residual <= 0) { : argument is of length zero

I've tried with status as the only covariable, but the error is the same. Also, if I understood correctly, the function needs dissagregated data (i.e. the response variable has to be a binary one; in contrast with other functions that accept aggregated data where the response variable is a matrix with number of successes and number of trials, or a proportion and a number of trials).

Now, another question, while looking for a beta regression with mixed effects, I found the glmmTMB package. As far as I understand, corncob allows testing for differential variance, which gives a whole new dimention to the analysis of abundances. But, when it comes down to testing for differential abundance, what would be the main differences between the beta regression implementation of glmmTMB and corncob? the upside of glmmTMB is that it is perfectly compatible with glht of multcomp, which makes testing custom hypothesis easy (sorry if this sounds as advertisement, I tought you may be interested in a package that implements mixed models betareg, since I read that you were interested in eventually implementing mixed models in corncob)

adw96 commented 11 months ago

Hi @Dario-Rocha -- it seems that we've sorted out your issue so I'm going to close.

In general, I can't provide a lot of help with modeling (I can't do this at scale), but a couple of notes

Best wishes and good luck as you grow your understanding of the methods and models for your research.

Amy