Re-implement unit tests with Stata?

etiennebacher commented 2 years ago

As you mentioned in #24 and in your blog post, you dropped tests comparing your package's outputs to Stata's outputs, the reason being that you don't have a Stata license anymore and that it can't run on CI.

I think that testing against Stata's output is quite essential for many people (and I would also be reassured by the equivalence between your package and Stata's implementation). Therefore, I'm wondering whether you could create a CSV file that contains a list of outputs created in Stata, and then compare this list to the outputs of your package. This list would have to be updated every time Stata's boottest is updated, but that shouldn't happen too frequently. For example, fixest does something like that for benchmarking feols against reghdfe, so maybe it's something that could be done for testing.

I don't think I'm fluent enough with Stata to create such a list of outputs, but if you have some code and need someone to run it for you, just let me know.
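A minimal base-R sketch of the CSV idea above. Everything in it is an assumption made for illustration: the file layout, the column names (`test_id`, `p_val`, `t_stat`), the tolerances, and above all the numbers, which are placeholders and not real Stata or fwildclusterboot output.

```r
# Hypothetical reference file. In practice these rows would be exported from Stata;
# here a temporary CSV is written with placeholder numbers (NOT real boottest output).
ref_path <- tempfile(fileext = ".csv")
writeLines(c(
  "test_id,p_val,t_stat",
  "treatment_rademacher,0.0123,2.5100"
), ref_path)

stata_ref <- read.csv(ref_path, stringsAsFactors = FALSE)

# Stand-in for the values the R package would compute for the same test case
# (again placeholders, chosen only so that the comparison below passes).
r_out <- data.frame(
  test_id = "treatment_rademacher",
  p_val   = 0.0125,
  t_stat  = 2.5100,
  stringsAsFactors = FALSE
)

# Align the two sources by test case; shared columns get the suffixes below.
cmp <- merge(stata_ref, r_out, by = "test_id", suffixes = c("_stata", "_r"))

# Loose tolerance for the bootstrapped p-value, tight one for the t-statistic.
p_ok <- abs(cmp$p_val_stata - cmp$p_val_r) < 0.01
t_ok <- abs(cmp$t_stat_stata - cmp$t_stat_r) < 1e-6
stopifnot(all(p_ok), all(t_ok))
```

Keeping the reference values in a file rather than in the test code means only the CSV has to be regenerated when Stata's boottest changes.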

s3alfisc commented 2 years ago

Hi @etiennebacher, I am super sorry that I did not respond earlier - I somehow missed the email GitHub usually sends out when an issue is opened! It's particularly strange because I actually check all my repos basically every day...

I have dropped the tests against Stata, but have replaced them with extensive tests against WildBootTests.jl, which is written by @droodman, who is also the author of boottest. Overall, I think that WildBootTests.jl and boottest are of comparable quality, even though WildBootTests.jl is a much younger implementation.

In one devel-branch, I have added some notes on how fwildclusterboot::boottest() is tested. The most important tests for validity are

test-r-vs-julia.R, which tests fwildclusterboot against WildBootTests.jl (stochastically).
test-tstat-equivalence.R, which checks that the non-bootstrapped t-stats computed via boottest() match those computed by fixest::feols() / fixest::tstat().

Overall, I think you are right that adding a simple test suite against Stata-boottest might help convince some users that it is safe to use fwildclusterboot. Though I am somewhat reluctant to pay 800 Euros for a non-academic Stata license to simply run a few tests 😃

So it would be super awesome if you could run some Stata code for me! If you are indeed up for doing so, I could sketch some Stata code and tests against boottest for you to run - those would potentially look like the tests I have implemented for wildrwolf.

All the best, Alex

s3alfisc commented 2 years ago

I have opened a new branch, in which I have included the 'old' tests of fwildclusterboot against boottest. Note that a range of more recent functionality might not be tested, e.g. hypotheses involving more than one parameter and all functionality introduced via WildBootTests.jl (the WRE bootstrap for IV and tests of multiple joint hypotheses). You can find all tests here. Also, the tests used to be powered by the tinytest package, but I have moved over to testthat (though I think I updated the code so that it runs smoothly with testthat).

etiennebacher commented 2 years ago

Hello Alex, thanks for your answer, no worries for the delay, I'm not at all in a rush :smile: Of course I don't expect you to pay for a Stata license to run a test suite. I'm still ok to run the tests for Stata (while I still have a license), I will take a look at the branch with the "old" tests to see how it goes.

Concerning the workflow, since running these tests in a CI environment is not possible, I suppose the best way to operate is that you tag me before each CRAN release so that I can run the tests and see if everything works. Of course the problem with this approach is that you can't test directly if your code changes break the tests with Stata, but I don't see another option for now.

s3alfisc commented 2 years ago

Hi @etiennebacher, I thought about this a little more, and I think it would in fact be best if you or I (in case I find access to a Stata licence) ran the wild cluster bootstrap via both fwildclusterboot and Stata and simply copied the values produced by Stata into the R test file, e.g.

library(fwildclusterboot)
data(voters)
lm_fit <- lm(proposition_vote ~ treatment + ideology1 + log_income + Q1_immigration,
  data = voters
)
boot <- boottest(lm_fit,
  B = 9999,
  param = "treatment",
  clustid = "group_id1"
)

# The quoted strings are placeholders: replace them with the numeric values reported by Stata.
expect_equal(boot$p_val, 'pvalue taken from Stata', tolerance = 1e-05)
expect_equal(boot$t_stat, 'tstat taken from Stata')

boottest.stata should be stable enough that this should be fine (incidentally, the same should apply to WildBootTests.jl, and I will implement the same testing scheme for WildBootTests.jl eventually, as I cannot run Julia tests on CRAN).
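A rough sketch of how large a tolerance for the p-value might need to be. It assumes that Stata and R draw their bootstrap weights from different random number streams, so the two p-values are independent Monte Carlo estimates of the same quantity. The p-value of 0.05 is purely illustrative. The non-bootstrapped t-statistic involves no simulation, so a tight tolerance is reasonable there.

```r
B <- 9999   # number of bootstrap replications, as in the example above
p <- 0.05   # illustrative "true" bootstrap p-value (an assumption, not a result)

# Monte Carlo standard error of a p-value estimated from B independent draws
mc_se <- sqrt(p * (1 - p) / B)

# Standard error of the difference between two independent estimates
se_diff <- sqrt(2) * mc_se

print(c(mc_se = mc_se, se_diff = se_diff))
```

Under these assumptions the simulation noise is on the order of 1e-03, so a tolerance of 1e-05 on the p-value would only be expected to pass if both implementations used identical bootstrap weights.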

I don't think I have a contributing.md file (I should really add it), but in case you would implement these Stata tests, I'd be super grateful and would be more than happy to add you as a package contributor :)

etiennebacher commented 2 years ago

Hi @s3alfisc, good idea, it's just gonna take some time to clean the test file, comment out the Stata code and copy the results from Stata, but I'll do it eventually. FYI, I ran the test file (with all the Stata code run through RStata) and so far all the tests pass, so well done!

s3alfisc / fwildclusterboot

Re-implement unit tests with Stata? #47