statdivlab / corncob

Count Regression for Correlated Observations with the Beta-binomial
102 stars 22 forks source link

v 4.1.0 implements estimating equations interpretation #155

Closed adw96 closed 10 months ago

adw96 commented 10 months ago

This PR addresses #153 to allow estimation and testing when counts W and totals M may not be integer-valued. Instead of assuming a parametric beta-binomial model, parameter estimation is now motivated by estimating equations given by the beta-binomial score equations.

This behavior must be forced (allow_noninteger defaults to FALSE) to ensure that code runs as previously.

Specific warnings/errors are given if the testing approach doesn't match up with a reasonable interpretation. At present, we encourage robust ("sandwich") standard errors when using non-integer counts. Specifically, bootstrap testing is not allowed because we currently only have implemented a parametric bootstrap, which requires sampling from a beta-binomial distribution, which doesn't make sense here without integer coercion.

I also (somewhat controversially) am deferring thinking carefully about LRTs in this setting -- thanks to @svteichman for flagging this potential issue. If you choose LRT for testing with non integer values, the code throws an error and tells you to open a GitHub issue if you really need this. (not because it's difficult to implement, but because I'm not sure whether this approach is justifiable). My intuition is that a nonparametric bootstrap approach would be, but a model based approach would not (but I would need to confirm). Open to hearing from others on this (tag @ailurophilia, of course others welcome).

Can someone confirm that we do not have a typical nonparametric/multinomial bootstrap option? Nonparametric robust Wald testing would be fine here, but I think it would be an entirely new feature. @bryandmartin may be able to answer quickly.

@svteichman would love your review. I haven't looked carefully at T1ER control for the robust tests (and you don't have to), but tests do confirm that sandwich SEs and model based SEs coincide under the correct data generating procedure, as we would expect.

Happy to add additional comments or answer questions. This is a big PR so top priority is to confirm that the output behavior doesn't change unless the new arguments are used.

Thanks all! I think this is a great extension!

bryandmartin commented 10 months ago

I might be able to answer quickly, but not totally sure - what information do you need from the model for the type of Wald test you want to do?

adw96 commented 10 months ago

Thanks so much, @bryandmartin !

My question is "Is functionality for a nonparametric bootstrap already possible with existing tests (LRT & Wald)?"

(There is functionality for a parametric bootstrap, but I'm essentially wondering about making multinomial draws from the rows of X.)

adw96 commented 10 months ago

(for @svteichman -- it looks like test-coverage is no longer running correctly... would you mind investigating? 🙏 )

bryandmartin commented 10 months ago

I see. Unfortunately, no there is not. I think the easiest option would be to clone doBoot.R into something like, e.g., doBootNP.R to resample the rows in a single iteration. With that you could take pbWald.R and pbLRT.R and either add new functions, or add a function parameter. You would just need to change the single line that uses doBoot (I think)

adw96 commented 10 months ago

Totally, yes, agree -- I wanted to confirm I hadn't overlooked its existence. THANK YOU!

adw96 commented 10 months ago

Ok we now have robust and model-based score tests, and confirmation they control T1ER 🥳

@svteichman this is ready for your review. Top priority is to confirm that the output behavior doesn't change unless the new arguments are used. Again, happy to add additional comments or answer questions.

Thanks so much!