ropensci / software-review

rOpenSci Software Peer Review.

fwildclusterboot presubmission #542

Closed s3alfisc closed 2 years ago

s3alfisc commented 2 years ago

Submitting Author Name: Alexander Fischer
Submitting Author Github Handle: @s3alfisc
Other Package Authors Github handles: @droodman
Repository: https://github.com/s3alfisc/fwildclusterboot
Submission type: Pre-submission
Language: en


Description: Implementation of the fast algorithm for wild cluster bootstrap 
             inference developed in Roodman et al (2019, STATA Journal) for 
             linear regression models <doi:10.1177/1536867X19830877>, 
             which makes it feasible to quickly calculate bootstrap test 
             statistics based on a large number of bootstrap draws even for 
             large samples. Multiway clustering, regression weights, 
             bootstrap weights, fixed effects and subcluster bootstrapping
             are supported. Further, both restricted (WCR) and unrestricted
             (WCU) bootstrap are supported. Methods are provided for a variety 
             of fitted models, including 'lm()', 'feols()' 
             (from package 'fixest') and 'felm()' (from package 'lfe'). 
             Additionally implements a heteroskedasticity-robust (HC1) wild 
             bootstrap.
             Further, the package provides an R binding to 'WildBootTests.jl',
             which provides additional speed gains and functionality, 
             including the 'WRE' bootstrap for instrumental variable models 
             (based on models of type 'ivreg()' from package 'ivreg')
             and hypotheses with q > 1.

Scope

fwildclusterboot conducts inference for (linear) regression models via a wild (cluster) bootstrap. It further serves as an R binding of the WildBootTests.jl library.

Yes, I have worked with the srr package and have a draft available (but it is currently not in the main branch).

The target audience is academic social scientists (economics, political science, sociology). fwildclusterboot should be used whenever regression errors are "clustered" into few groups, in which case inference based on asymptotic approximations might fail.
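To make the idea concrete, here is a minimal, language-agnostic sketch in plain Python of a restricted wild cluster bootstrap (WCR) test of a zero slope in a one-regressor model. It is an illustration under stated assumptions, not the package's code: it uses a naive loop rather than the fast algorithm of Roodman et al., Rademacher weights, and a cluster-robust variance without small-sample correction. The function names `cluster_t_stat` and `wild_cluster_boot_pvalue` and the simulated data are hypothetical.

```python
import random


def cluster_t_stat(x, y, cluster):
    """Slope t-statistic for y = b0 + b1*x with a cluster-robust variance
    (no small-sample correction; hypothetical helper for illustration)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    xc = [xi - x_bar for xi in x]               # centred regressor
    ssx = sum(v * v for v in xc)
    b1 = sum(v * yi for v, yi in zip(xc, y)) / ssx
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # Sum the slope scores (centred x times residual) within each cluster.
    scores = {}
    for v, e, g in zip(xc, resid, cluster):
        scores[g] = scores.get(g, 0.0) + v * e
    var_b1 = sum(s * s for s in scores.values()) / ssx ** 2
    return b1 / var_b1 ** 0.5


def wild_cluster_boot_pvalue(x, y, cluster, n_boot=999, seed=1):
    """Two-sided bootstrap p-value for H0: b1 = 0 (naive WCR loop)."""
    rng = random.Random(seed)
    t_obs = cluster_t_stat(x, y, cluster)
    # Restricted fit under H0: the model reduces to an intercept only.
    y_bar = sum(y) / len(y)
    resid_r = [yi - y_bar for yi in y]
    groups = sorted(set(cluster))
    exceed = 0
    for _ in range(n_boot):
        # One Rademacher weight per cluster, shared by all its observations.
        w = {g: rng.choice((-1.0, 1.0)) for g in groups}
        y_star = [y_bar + w[g] * e for g, e in zip(cluster, resid_r)]
        if abs(cluster_t_stat(x, y_star, cluster)) >= abs(t_obs):
            exceed += 1
    return exceed / n_boot


# Simulated data (assumption): 8 clusters of 10 observations, true slope 2,
# with a cluster-level error component.
data_rng = random.Random(42)
cluster = [g for g in range(8) for _ in range(10)]
effect = {g: data_rng.gauss(0, 1) for g in range(8)}
x = [data_rng.uniform(0, 10) for _ in cluster]
y = [1.0 + 2.0 * xi + effect[g] + data_rng.gauss(0, 0.5)
     for xi, g in zip(x, cluster)]

p_value = wild_cluster_boot_pvalue(x, y, cluster)
print(p_value)
```

The key design point is that each bootstrap draw flips the sign of the restricted residuals for whole clusters at once, which preserves the within-cluster dependence that asymptotic approximations handle poorly when there are few clusters.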

Other R packages that implement the wild cluster bootstrap are sandwich, via its vcovBS function, and clusterSEs. fwildclusterboot implements a significantly faster algorithm. Further, fwildclusterboot offers additional functionality, e.g. the subcluster bootstrap. Through WildBootTests.jl, it also allows users to run a highly optimized version of the WRE bootstrap for IV regressions (Davidson & MacKinnon, 2010), which is not available in any other R package.

fwildclusterboot implements the "fast" wild cluster bootstrap in R, but also allows calling WildBootTests.jl via the JuliaConnectoR package. It is therefore (also) a wrapper package, so you might consider it to be out of scope?

emilyriederer commented 2 years ago

Hi @s3alfisc - thanks so much for submitting your package. I especially appreciate all of the details (and impressive benchmarking results!) in the best-in-class answer.

As a general matter, this seems in-scope for a regression package. Based on your work with the statistical standards, could you please comment on whether you believe that the package is on track to meet at least half of the general + category-specific standards?

Thanks also for the call-out on the optional functionality to call the Julia implementation. We are also planning standards for statistical wrapper packages, but do not yet have them. A member of the statistics peer review team may comment further on whether or not this package could also fit that category.

s3alfisc commented 2 years ago

Hi @emilyriederer, thanks for your feedback! I have uploaded my comments based on the statistical software roclets in a separate branch here.

mpadge commented 2 years ago

@s3alfisc @emilyriederer I've had a look through the code, and do not think this package should really be considered a statistical "wrapper" package, as it only constructs a single external call to one Julia package. The Julia connection is entirely optional for package functionality, and represents only a very small portion of the code and algorithms. I suggest the review process can proceed under the single category nominated above.

@s3alfisc I note that your current version documents compliance with 59 / 115 standards, which is > 50%, so okay to proceed. The srr_report() nevertheless notes that standard G2.15 appears to be missing. Could you please check and rectify if possible? Note also that our automated check system currently works on the GitHub default branch only, but we're happy to use your submission to develop an appropriate workflow for non-default branches. Until then, the checks might have to be generated for the default branch, after which I'll manually remove that comment and re-generate them for your "ropensci" branch. Thanks!

s3alfisc commented 2 years ago

Hi @mpadge, thanks for your feedback! I will spend some time cleaning up the package over the next few days (documenting all srr roclets, adding G2.15, and merging everything into the main branch) and then I will submit fwildclusterboot! =)

mpadge commented 2 years ago

@s3alfisc No need to merge if you'd rather not. We do want our system to one day work on non-default branches, so, as I said, we are happy to use your submission to test that, if that's easier for you. That said, we do generally advise against this, because then you'll be stuck implementing changes in response to reviews in your non-default branch, which may make your own workflow less robust. Up to you.

emilyriederer commented 2 years ago

Thanks @s3alfisc and @mpadge for the conversation. It sounds like we reached a great resolution on where the package fits, and I look forward to the full submission. I'll close this presubmission inquiry in the meantime.