py-econometrics / wildboottest

python module for wild cluster bootstrapping
https://py-econometrics.github.io/wildboottest/
MIT License
7 stars 0 forks source link

Error in bootstrap algo for '11' types? #33

Closed s3alfisc closed 2 years ago

s3alfisc commented 2 years ago

In a first test of Python vs R, it looks like the '31' bootstrap t-statistics match exactly under full enumeration (as they should), while the '11' do not:

See here

I suppose that the R version is correct, as it's tested against WildBootTests.jl and produces matching statistics.

>>> df (Python)
          WCR11         WCR31     WCU11     WCU31
0 -5.569437e-07  7.119511e-16 -0.547569  0.079819
1 -3.286834e+00 -3.286832e+00 -4.243206 -3.085644
2  1.715372e-02  1.715392e-02 -0.166832  0.053646
3 -2.881170e+00 -2.881171e+00 -1.780821 -3.045997
4  2.881170e+00  2.881171e+00  1.780821  3.045997
5 -1.715372e-02 -1.715392e-02  0.166832 -0.053646
6  3.286834e+00  3.286832e+00  4.243206  3.085644
7  5.569437e-07 -7.119511e-16  0.547569 -0.079819
>>> r_df (R)
   Unnamed: 0     WCR11         WCR31     WCU11     WCU31
0           1 -0.620824 -1.017073e-16 -0.547213  0.079819
1           2 -3.852912 -3.286832e+00 -4.198954 -3.085644
2           3 -0.196266  1.715392e-02 -0.171808  0.053646
3           4 -1.630894 -2.881171e+00 -1.782077 -3.045997
4           5  1.630894  2.881171e+00  1.782077  3.045997
5           6  0.196266 -1.715392e-02  0.171808 -0.053646
6           7  3.852912  3.286832e+00  4.198954  3.085644
7           8  0.620824  1.017073e-16  0.547213 -0.079819
s3alfisc commented 2 years ago

Note that the bug must stem from somewhere in get_scores(), as that is the only part where the 1x and 3x bootstraps differ.

amichuda commented 2 years ago

Perhaps we should use rpy2 to have both run within one script?

amichuda commented 2 years ago

if you have a chance commit everything to a new branch and I can try to test it out

amichuda commented 2 years ago

Okay I opened a new branch and modified your test so it would run with rpy2 and get results from one script. #34

What's weird is that I get different results for all bootstrap types. Are you using a newer version of fwildclusterboot for your tests? Can confirm that this is the same output from R if I run utils.R:

Python
          WCR11         WCR31      WCU11     WCU31
0 -6.037710e-07  5.000298e-16  -0.585775  0.055639
1  7.388861e-01  7.388864e-01   0.444430  0.748944
2  1.200195e+00  1.200195e+00   0.831350  1.256170
3  5.907652e+00  5.907644e+00  17.606098  5.235413
4 -5.907652e+00 -5.907644e+00 -17.606098 -5.235413
5 -1.200195e+00 -1.200195e+00  -0.831350 -1.256170
6 -7.388861e-01 -7.388864e-01  -0.444430 -0.748944
7  6.037710e-07 -5.000298e-16   0.585775 -0.055639

R
       WCR11     WCR31      WCU11     WCU31
0  -0.581297  0.000000  -0.620100  0.017975
1   0.776840  1.170635   0.733529  1.117294
2   0.468139  0.711277   0.464507  0.775609
3  11.001060  4.564926  11.050285  4.597024
4 -11.001060 -4.564926 -11.050285 -4.597024
5  -0.468139 -0.711277  -0.464507 -0.775609
6  -0.776840 -1.170635  -0.733529 -1.117294
7   0.581297  0.000000   0.620100 -0.017975
s3alfisc commented 2 years ago

Thanks for doing this - I will take a closer look today but unfortunately not with full focus, as I will also have to fix a CRAN error for fwildclusterboot, which does not seem to pass a rather esoteric CRAN check ... I am using the most recent version of fwildclusterboot, 0.12.

s3alfisc commented 2 years ago

Ok - I took another look: the WCR11 is completely off, the WCU11 is only slightly off, and the WCR31 and WCR33 both work as they should. So it likely is more than just one error :D

s3alfisc commented 2 years ago

I found the bug for the WCR11 - a simply algebraic error:

I calculate the restricted least squares estimate $\tilde{\beta}$ as

beta_tilde = self.beta_hat - self.tXXinv @ self.R / A * (self.R @ self.beta_hat - 0)

while it should have been

beta_tilde = self.beta_hat - self.tXXinv @ self.R * A * (self.R @ self.beta_hat - 0).

The differences in the WRU11 between R and Python are very small - and I still have not found the root source. PR with a fix of the above will follow later today!

amichuda commented 2 years ago

This is great news! Are you able run the test with rpy2?

On Sat, Oct 22, 2022, 10:28 AM Alexander Fischer @.***> wrote:

I found the bug for the WCR11 - a simply algebraic error:

I calculate the restricted least squares estimate $\tilde{\beta}$ as

beta_tilde = self.beta_hat - self.tXXinv @ self.R / A * (self.R @ self.beta_hat - 0)

while it should have been

beta_tilde = self.beta_hat - self.tXXinv @ self.R A (self.R @ self.beta_hat - 0).

The differences in the WRU11 between R and Python are very small - and I still have not found the root source. PR with a fix of the above will follow later today!

— Reply to this email directly, view it on GitHub https://github.com/s3alfisc/wildboottest/issues/33#issuecomment-1287810448, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABQVPBHJCVJR3GIJ5HNE4CDWEP2ZDANCNFSM6AAAAAARJPGTYM . You are receiving this because you commented.Message ID: @.***>

s3alfisc commented 2 years ago

Rpy2 - did not try it yet, unfortunately - my Python seems to be older than 3.7, which seems to be the minimum requirement for rpy2 now. I haven't yet upgraded because I was somewhat afraid to completely break my setup 😅

s3alfisc commented 2 years ago

Wow I need to upgrade urgently - 3.7 is from 2018!

s3alfisc commented 2 years ago

These are the results I receive now:

>>>   print(df.sort_values(by=list(df.columns),axis=0).head())
       WCR11     WCR31     WCU11     WCU31
25 -5.949815 -5.982923 -5.910237 -5.788861
24 -2.415818 -3.148949 -2.417741 -3.175197
29 -2.275351 -1.627375 -2.275375 -1.601628
17 -1.401527 -1.754945 -1.409451 -1.768863
28 -1.313436 -1.169452 -1.315881 -1.170040
>>>   print("\n")

>>>   print("R")
R
>>>   print(r_df.sort_values(by=list(r_df.columns),axis=0).head())  
       WCR11     WCR31     WCU11     WCU31
25 -5.949815 -5.982923 -5.913542 -5.788861
24 -2.415818 -3.148949 -2.418203 -3.175197
29 -2.275351 -1.627375 -2.275787 -1.601628
17 -1.401527 -1.754945 -1.408123 -1.768863
28 -1.313436 -1.169452 -1.316142 -1.170040

Perfect matches (as it should be) except for the WCU11. Here, differences are minor.

The only logic in the code that is distinct for the WCU11 is the following:

      elif self.bootstrap_type in ["WCU1x"]: 

        beta_hat = self.tXXinv @ self.tXy
        self.beta_hat = beta_hat
        beta = beta_hat 
        M = self.tXgXg_list

      # compute scores based on tXgyg, M, beta
      scores_list = []

      if(self.bootstrap_type in ["WCR1x", "WCU1x"]):

        for ix, g in enumerate(self.bootclustid):

          scores_list.append(self.tXgyg_list[ix] - M[ix] @ beta)

In fwildclusterboot, this section is computed in the get_scores function:

  scores <- 
    switch(
      bootstrap_type, 
      WCR1x = function(g) tXgyg[[g]] - tXgXg[[g]] %*% beta_tilde, 
      WCU1x = function(g) tXgyg[[g]] - tXgXg[[g]] %*% beta_hat, #Xg'x u_g
      WCR3x = function(g) tXgyg[[g]] - tXgX1g[[g]] %*% beta_1g_tilde[[g]], 
      WCU3x = function(g) tXgyg[[g]] - tXgXg[[g]] %*% beta_g_hat[[g]],
    )

I can also rule out that the code never enters the relevant sections, as otherwise the bootstrapped t-stats for the WCU11 should match those of one of the other variants or throw an explicit error.

amichuda commented 2 years ago

Wow thank you for the work on this, this is looking great. I'll focus on getting tests done for statsmodels and the package then and we should be all set for PR. The only thing we might need is to update documentation later.

On Sun, Oct 23, 2022, 6:55 AM Alexander Fischer @.***> wrote:

These are the results I receive now:

print(df.sort_values(by=list(df.columns),axis=0).head()) WCR11 WCR31 WCU11 WCU31 25 -5.949815 -5.982923 -5.910237 -5.788861 24 -2.415818 -3.148949 -2.417741 -3.175197 29 -2.275351 -1.627375 -2.275375 -1.601628 17 -1.401527 -1.754945 -1.409451 -1.768863 28 -1.313436 -1.169452 -1.315881 -1.170040 print("\n")

print("R") R print(r_df.sort_values(by=list(r_df.columns),axis=0).head()) WCR11 WCR31 WCU11 WCU31 25 -5.949815 -5.982923 -5.913542 -5.788861 24 -2.415818 -3.148949 -2.418203 -3.175197 29 -2.275351 -1.627375 -2.275787 -1.601628 17 -1.401527 -1.754945 -1.408123 -1.768863 28 -1.313436 -1.169452 -1.316142 -1.170040

Perfect matches (as it should be) except for the WCU11. Here, differences are minor.

The only logic in the code https://github.com/s3alfisc/wildboottest/blob/c6cefe4e1523b7c6b22078784ec5114a4184e9f6/wildboottest/wildboottest.py#L184 that is distinct for the WCU11 is the following:

  elif self.bootstrap_type in ["WCU1x"]:

    beta_hat = self.tXXinv @ self.tXy
    self.beta_hat = beta_hat
    beta = beta_hat
    M = self.tXgXg_list

  # compute scores based on tXgyg, M, beta
  scores_list = []

  if(self.bootstrap_type in ["WCR1x", "WCU1x"]):

    for ix, g in enumerate(self.bootclustid):

      scores_list.append(self.tXgyg_list[ix] - M[ix] @ beta)

In fwildclusterboot, this section is computed in the get_scores https://github.com/s3alfisc/fwildclusterboot/blob/master/R/get_scores.R function:

scores <- switch( bootstrap_type, WCR1x = function(g) tXgyg[[g]] - tXgXg[[g]] %% beta_tilde, WCU1x = function(g) tXgyg[[g]] - tXgXg[[g]] %% beta_hat, #Xg'x u_g WCR3x = function(g) tXgyg[[g]] - tXgX1g[[g]] %% beta_1g_tilde[[g]], WCU3x = function(g) tXgyg[[g]] - tXgXg[[g]] %% beta_g_hat[[g]], )

I can also rule out that the code never enters the relevant sections, as otherwise the bootstrapped t-stats for the WCU11 should match those of one of the other variants or throw an explicit error.

— Reply to this email directly, view it on GitHub https://github.com/s3alfisc/wildboottest/issues/33#issuecomment-1288085662, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABQVPBE7UP34MRESSRUIXBLWEUKSRANCNFSM6AAAAAARJPGTYM . You are receiving this because you commented.Message ID: @.***>

s3alfisc commented 2 years ago

Sounds great! I hope I'll find the WCU11 bug soon, then will move over to work on documentation.

s3alfisc commented 2 years ago

Maybe we can aim for the statsmodels PR for the beginning of next week?

amichuda commented 2 years ago

That's a tall order, but I'll try to get my side done.

On Sun, Oct 23, 2022, 2:38 PM Alexander Fischer @.***> wrote:

Maybe we can aim for the statsmodels PR for the beginning of next week?

— Reply to this email directly, view it on GitHub https://github.com/s3alfisc/wildboottest/issues/33#issuecomment-1288172770, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABQVPBHZMQ57IU6VO5MB4GTWEWA3BANCNFSM6AAAAAARJPGTYM . You are receiving this because you commented.Message ID: @.***>

s3alfisc commented 2 years ago

Yup, I have now realized that as well. I'll have little time to work on this this week. Let's take it step by step and fix the documentation, WCU11 first, then upload to Pypy, then worry about the PR :)

s3alfisc commented 2 years ago

Looks like there is still some issue with the WCR3x:

Output exceeds the [size limit](command:workbench.action.openSettings?%5B%22notebook.output.textLineLimit%22%5D). Open the full output data[ in a text editor](command:workbench.action.openLargeOutput?4a241ad8-8feb-42ca-8b24-660f02412057)
Python
       WCR11         WCU11     WCR31     WCU31     WCR13         WCU13  \
12 -3.074674 -4.109023e+00 -3.068128 -4.062617 -3.046323 -4.083582e+00   
4  -2.186620 -1.045002e+00 -2.193078 -1.047134 -2.175083 -1.043310e+00   
8  -2.016250 -9.715024e-01 -2.014649 -0.979364 -2.008475 -9.727401e-01   
0  -1.536622  2.375314e-16 -1.541331 -0.011370 -1.531542  2.367461e-16   
13 -1.156497 -2.573257e+00 -1.153869 -2.560607 -1.146171 -2.558409e+00   

       WCR33     WCU33  
12 -3.039777 -4.036587  
4  -2.183138 -1.045208  
8  -2.007460 -0.980384  
0  -1.537230 -0.011330  
13 -1.143577 -2.545343  

R
       WCR11         WCU11     WCR31     WCU31     WCR13         WCU13  \
12 -3.074674 -4.109023e+00 -3.069685 -4.062617 -3.046323 -4.083582e+00   
4  -2.186620 -1.045002e+00 -2.192887 -1.047134 -2.175083 -1.043310e+00   
8  -2.016250 -9.715024e-01 -2.015517 -0.979364 -2.008475 -9.727401e-01   
0  -1.536622  1.715504e-16 -1.541329 -0.011370 -1.531542  2.126437e-16   
13 -1.156497 -2.573257e+00 -1.154120 -2.560607 -1.146171 -2.558409e+00   

       WCR33     WCU33  
...
4  -2.182915 -1.045208  
8  -2.008418 -0.980384  
0  -1.537228 -0.011330  
13 -1.143833 -2.545343
s3alfisc commented 2 years ago

Fixed with #b46634245f61091b67a1a4ac0c45535895319088. Quite the error ...

Now all looks great:

Output exceeds the [size limit](command:workbench.action.openSettings?%5B%22notebook.output.textLineLimit%22%5D). Open the full output data[ in a text editor](command:workbench.action.openLargeOutput?462ee3e5-f868-4272-bb3f-80b5f393428a)
Python
       WCR11     WCU11     WCR31     WCU31     WCR13     WCU13     WCR33  \
27 -2.375626 -2.732371 -2.385534 -2.741262 -2.363997 -2.723406 -2.373574   
25 -2.127324 -2.170970 -2.142531 -2.188740 -2.118369 -2.163555 -2.133678   
31 -1.833894 -2.266291 -1.829709 -2.264162 -1.828216 -2.259557 -1.823290   
29 -1.653609 -1.828740 -1.654730 -1.836620 -1.648412 -1.822022 -1.649087   
19 -1.442773 -1.463975 -1.441460 -1.458873 -1.429982 -1.451783 -1.427949   

       WCU33  
27 -2.731877  
25 -2.181104  
31 -2.257368  
29 -1.829889  
19 -1.446593  

R
       WCR11     WCU11     WCR31     WCU31     WCR13     WCU13     WCR33  \
27 -2.375626 -2.732371 -2.385534 -2.741262 -2.363997 -2.723406 -2.373574   
25 -2.127324 -2.170970 -2.142531 -2.188740 -2.118369 -2.163555 -2.133678   
31 -1.833894 -2.266291 -1.829709 -2.264162 -1.828216 -2.259557 -1.823290   
29 -1.653609 -1.828740 -1.654730 -1.836620 -1.648412 -1.822022 -1.649087   
19 -1.442773 -1.463975 -1.441460 -1.458873 -1.429982 -1.451783 -1.427949   

       WCU33  
...
25 -2.181104  
31 -2.257368  
29 -1.829889  
19 -1.446593

I think we can push the the PypI release now!