s3alfisc / fwildclusterboot

Fast Wild Cluster Bootstrap Inference for Regression Models / OLS in R. Additionally, R port to WildBootTests.jl via the JuliaConnectoR.
https://s3alfisc.github.io/fwildclusterboot/
GNU General Public License v3.0

Consider python port? #61

Closed amichuda closed 1 year ago

amichuda commented 1 year ago

Hi!

I'm currently doing some analysis and I need the wild cluster bootstrap, but it's not really available in Python. Do you know of a Python port somewhere? If not, which R code should I look at that has the "meat", so to speak? What would be my quickest path to a quick and dirty implementation?

s3alfisc commented 1 year ago

Hi - there is a Python implementation of the wild cluster bootstrap called clusterbootstraps, but I have never used it, so I cannot attest to its quality. WildBootTests.jl has an example of how to call it from Python (via PyJulia) - that might be another convenient solution.

I was thinking of working towards a PR for statsmodels at some point in the future, most likely implementing the algorithm following MacKinnon, which is implemented in its most basic form in the boot_algo3 function (in case you wanted to get started with it =) ). boot_algo2 instead follows the algorithm as laid out in the "fast and wild" paper. The "traditional brute force" WCB algorithm is implemented in wildboottest.cpp, in what is probably fairly poor C++ :)
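For readers who want a starting point, here is a minimal NumPy sketch of the "traditional brute force" idea mentioned above. It is not a port of boot_algo3, boot_algo2 or wildboottest.cpp. The function name wild_cluster_bootstrap_pvalue and its arguments are hypothetical, and the sketch assumes Rademacher weights drawn once per cluster, the null hypothesis imposed on the bootstrap data, a test of a single coefficient against zero, a design matrix with at least two columns, and no small-sample correction on the cluster-robust variance.

```python
import numpy as np


def wild_cluster_bootstrap_pvalue(y, X, clusters, param, n_boot=999, seed=0):
    """Brute-force wild cluster bootstrap p-value for H0: beta[param] = 0.

    Hypothetical helper for illustration only. X must have at least two
    columns (for example an intercept plus the tested regressor).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)

    # Map arbitrary cluster labels to integer codes 0..G-1.
    _, cluster_idx = np.unique(np.asarray(clusters), return_inverse=True)
    n_clusters = int(cluster_idx.max()) + 1
    XtX_inv = np.linalg.inv(X.T @ X)

    def t_stat(y_vec):
        # OLS fit plus a cluster-robust (sandwich) variance estimate.
        beta = XtX_inv @ (X.T @ y_vec)
        resid = y_vec - X @ beta
        # Sum the scores x_i * u_i within each cluster.
        scores = np.zeros((n_clusters, X.shape[1]))
        np.add.at(scores, cluster_idx, X * resid[:, None])
        vcov = XtX_inv @ (scores.T @ scores) @ XtX_inv
        # No small-sample correction: the same constant would scale the
        # observed and bootstrap statistics alike.
        return beta[param] / np.sqrt(vcov[param, param])

    t_obs = t_stat(y)

    # Restricted fit that imposes the null by dropping the tested regressor.
    X_r = np.delete(X, param, axis=1)
    beta_r = np.linalg.lstsq(X_r, y, rcond=None)[0]
    fitted_r = X_r @ beta_r
    resid_r = y - fitted_r

    t_boot = np.empty(n_boot)
    for b in range(n_boot):
        # One Rademacher draw per cluster, shared by all of its observations.
        weights = rng.choice([-1.0, 1.0], size=n_clusters)
        y_star = fitted_r + resid_r * weights[cluster_idx]
        t_boot[b] = t_stat(y_star)

    # Symmetric two-sided p-value.
    p_value = float(np.mean(np.abs(t_boot) >= np.abs(t_obs)))
    return float(t_obs), p_value
```

The loop refits the regression once per bootstrap draw, which is what makes this approach slow. As far as I understand it, the faster algorithms discussed in this thread avoid that refit, so treat this only as a reference to check results against.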

amichuda commented 1 year ago

I might just take you up on that! Let's see if I can port it over to Python.

My current problem involves panel data and multi-way fixed effects. Will boot_algo3 work in that context?

I use linearmodels for fixed effects regression, so it would be nice to write up a package that's agnostic to the input regression model and then make PRs in both?

s3alfisc commented 1 year ago

Hi @amichuda - I will have time to work on an initial Python implementation of the MacKinnon algorithm at the beginning of October, when I'll have two weeks off and plan to push my open source projects a little. Would that timeline fit you, or is there more urgency in your project? And yes, fixed effects should eventually be supported, and I agree that we should aim to make the algorithm work with both linearmodels and statsmodels. =)

amichuda commented 1 year ago

Yes, that timeline might work, but I'm also game to help in the coding. Would this be essentially a port of boot_algo3?

If you're game to work together, do you want to make a new package repo or just add it into this one?

Let me know!

s3alfisc commented 1 year ago

Very cool! I would suggest starting a new repo and developing pybootest/pywildclusterboot over there. Please go ahead with creating it!

I will send you my "debugging scripts" for fwildclusterboot later today - they should help you to compute all the input variables of the 'boot_algo' functions. Via rpy2 or reticulate, it should be easy to compute all of these in R and pass them to your Python session, which (I believe) should help you to get started with the implementation - this way, you can in principle run the Python and R code side by side.

You could either try to develop the first part of a method for linearmodels that would compute all these input objects, or you could use them to start working on the actual bootstrap algorithms (or even both!😀). I believe the first would take me quite some time, while writing the algo should be fairly quick for me.

I suggest to start with boot_algo3 for the following reasons:

I have a few other design considerations, but I might share them with you via email or the new repo. Looking forward to collaborating on this!

Best, Alex

s3alfisc commented 1 year ago

Python version in development here.