Closed s3alfisc closed 2 years ago
Note that the bug must stem from somewhere in get_scores()
, as that is the only part where the 1x and 3x bootstraps differ.
Perhaps we should use rpy2
to have both run within one script?
if you have a chance commit everything to a new branch and I can try to test it out
Okay I opened a new branch and modified your test so it would run with rpy2
and get results from one script. #34
What's weird is that I get different results for all bootstrap types. Are you using a newer version of fwildclusterboot
for your tests? Can confirm that this is the same output from R if I run utils.R
:
Python
WCR11 WCR31 WCU11 WCU31
0 -6.037710e-07 5.000298e-16 -0.585775 0.055639
1 7.388861e-01 7.388864e-01 0.444430 0.748944
2 1.200195e+00 1.200195e+00 0.831350 1.256170
3 5.907652e+00 5.907644e+00 17.606098 5.235413
4 -5.907652e+00 -5.907644e+00 -17.606098 -5.235413
5 -1.200195e+00 -1.200195e+00 -0.831350 -1.256170
6 -7.388861e-01 -7.388864e-01 -0.444430 -0.748944
7 6.037710e-07 -5.000298e-16 0.585775 -0.055639
R
WCR11 WCR31 WCU11 WCU31
0 -0.581297 0.000000 -0.620100 0.017975
1 0.776840 1.170635 0.733529 1.117294
2 0.468139 0.711277 0.464507 0.775609
3 11.001060 4.564926 11.050285 4.597024
4 -11.001060 -4.564926 -11.050285 -4.597024
5 -0.468139 -0.711277 -0.464507 -0.775609
6 -0.776840 -1.170635 -0.733529 -1.117294
7 0.581297 0.000000 0.620100 -0.017975
Thanks for doing this - I will take a closer look today but unfortunately not with full focus, as I will also have to fix a CRAN error for fwildclusterboot
, which does not seem to pass a rather esoteric CRAN check ... I am using the most recent version of fwildclusterboot, 0.12.
Ok - I took another look: the WCR11 is completely off, the WCU11 is only slightly off, and the WCR31 and WCR33 both work as they should. So it likely is more than just one error :D
I found the bug for the WCR11 - a simply algebraic error:
I calculate the restricted least squares estimate $\tilde{\beta}$ as
beta_tilde = self.beta_hat - self.tXXinv @ self.R / A * (self.R @ self.beta_hat - 0)
while it should have been
beta_tilde = self.beta_hat - self.tXXinv @ self.R * A * (self.R @ self.beta_hat - 0).
The differences in the WRU11 between R and Python are very small - and I still have not found the root source. PR with a fix of the above will follow later today!
This is great news! Are you able run the test with rpy2?
On Sat, Oct 22, 2022, 10:28 AM Alexander Fischer @.***> wrote:
I found the bug for the WCR11 - a simply algebraic error:
I calculate the restricted least squares estimate $\tilde{\beta}$ as
beta_tilde = self.beta_hat - self.tXXinv @ self.R / A * (self.R @ self.beta_hat - 0)
while it should have been
beta_tilde = self.beta_hat - self.tXXinv @ self.R A (self.R @ self.beta_hat - 0).
The differences in the WRU11 between R and Python are very small - and I still have not found the root source. PR with a fix of the above will follow later today!
— Reply to this email directly, view it on GitHub https://github.com/s3alfisc/wildboottest/issues/33#issuecomment-1287810448, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABQVPBHJCVJR3GIJ5HNE4CDWEP2ZDANCNFSM6AAAAAARJPGTYM . You are receiving this because you commented.Message ID: @.***>
Rpy2 - did not try it yet, unfortunately - my Python seems to be older than 3.7, which seems to be the minimum requirement for rpy2 now. I haven't yet upgraded because I was somewhat afraid to completely break my setup 😅
Wow I need to upgrade urgently - 3.7 is from 2018!
These are the results I receive now:
>>> print(df.sort_values(by=list(df.columns),axis=0).head())
WCR11 WCR31 WCU11 WCU31
25 -5.949815 -5.982923 -5.910237 -5.788861
24 -2.415818 -3.148949 -2.417741 -3.175197
29 -2.275351 -1.627375 -2.275375 -1.601628
17 -1.401527 -1.754945 -1.409451 -1.768863
28 -1.313436 -1.169452 -1.315881 -1.170040
>>> print("\n")
>>> print("R")
R
>>> print(r_df.sort_values(by=list(r_df.columns),axis=0).head())
WCR11 WCR31 WCU11 WCU31
25 -5.949815 -5.982923 -5.913542 -5.788861
24 -2.415818 -3.148949 -2.418203 -3.175197
29 -2.275351 -1.627375 -2.275787 -1.601628
17 -1.401527 -1.754945 -1.408123 -1.768863
28 -1.313436 -1.169452 -1.316142 -1.170040
Perfect matches (as it should be) except for the WCU11. Here, differences are minor.
The only logic in the code that is distinct for the WCU11 is the following:
elif self.bootstrap_type in ["WCU1x"]:
beta_hat = self.tXXinv @ self.tXy
self.beta_hat = beta_hat
beta = beta_hat
M = self.tXgXg_list
# compute scores based on tXgyg, M, beta
scores_list = []
if(self.bootstrap_type in ["WCR1x", "WCU1x"]):
for ix, g in enumerate(self.bootclustid):
scores_list.append(self.tXgyg_list[ix] - M[ix] @ beta)
In fwildclusterboot
, this section is computed in the get_scores function:
scores <-
switch(
bootstrap_type,
WCR1x = function(g) tXgyg[[g]] - tXgXg[[g]] %*% beta_tilde,
WCU1x = function(g) tXgyg[[g]] - tXgXg[[g]] %*% beta_hat, #Xg'x u_g
WCR3x = function(g) tXgyg[[g]] - tXgX1g[[g]] %*% beta_1g_tilde[[g]],
WCU3x = function(g) tXgyg[[g]] - tXgXg[[g]] %*% beta_g_hat[[g]],
)
I can also rule out that the code never enters the relevant sections, as otherwise the bootstrapped t-stats for the WCU11 should match those of one of the other variants or throw an explicit error.
Wow thank you for the work on this, this is looking great. I'll focus on getting tests done for statsmodels and the package then and we should be all set for PR. The only thing we might need is to update documentation later.
On Sun, Oct 23, 2022, 6:55 AM Alexander Fischer @.***> wrote:
These are the results I receive now:
print(df.sort_values(by=list(df.columns),axis=0).head()) WCR11 WCR31 WCU11 WCU31 25 -5.949815 -5.982923 -5.910237 -5.788861 24 -2.415818 -3.148949 -2.417741 -3.175197 29 -2.275351 -1.627375 -2.275375 -1.601628 17 -1.401527 -1.754945 -1.409451 -1.768863 28 -1.313436 -1.169452 -1.315881 -1.170040 print("\n")
print("R") R print(r_df.sort_values(by=list(r_df.columns),axis=0).head()) WCR11 WCR31 WCU11 WCU31 25 -5.949815 -5.982923 -5.913542 -5.788861 24 -2.415818 -3.148949 -2.418203 -3.175197 29 -2.275351 -1.627375 -2.275787 -1.601628 17 -1.401527 -1.754945 -1.408123 -1.768863 28 -1.313436 -1.169452 -1.316142 -1.170040
Perfect matches (as it should be) except for the WCU11. Here, differences are minor.
The only logic in the code https://github.com/s3alfisc/wildboottest/blob/c6cefe4e1523b7c6b22078784ec5114a4184e9f6/wildboottest/wildboottest.py#L184 that is distinct for the WCU11 is the following:
elif self.bootstrap_type in ["WCU1x"]: beta_hat = self.tXXinv @ self.tXy self.beta_hat = beta_hat beta = beta_hat M = self.tXgXg_list # compute scores based on tXgyg, M, beta scores_list = [] if(self.bootstrap_type in ["WCR1x", "WCU1x"]): for ix, g in enumerate(self.bootclustid): scores_list.append(self.tXgyg_list[ix] - M[ix] @ beta)
In fwildclusterboot, this section is computed in the get_scores https://github.com/s3alfisc/fwildclusterboot/blob/master/R/get_scores.R function:
scores <- switch( bootstrap_type, WCR1x = function(g) tXgyg[[g]] - tXgXg[[g]] %% beta_tilde, WCU1x = function(g) tXgyg[[g]] - tXgXg[[g]] %% beta_hat, #Xg'x u_g WCR3x = function(g) tXgyg[[g]] - tXgX1g[[g]] %% beta_1g_tilde[[g]], WCU3x = function(g) tXgyg[[g]] - tXgXg[[g]] %% beta_g_hat[[g]], )
I can also rule out that the code never enters the relevant sections, as otherwise the bootstrapped t-stats for the WCU11 should match those of one of the other variants or throw an explicit error.
— Reply to this email directly, view it on GitHub https://github.com/s3alfisc/wildboottest/issues/33#issuecomment-1288085662, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABQVPBE7UP34MRESSRUIXBLWEUKSRANCNFSM6AAAAAARJPGTYM . You are receiving this because you commented.Message ID: @.***>
Sounds great! I hope I'll find the WCU11 bug soon, then will move over to work on documentation.
Maybe we can aim for the statsmodels PR for the beginning of next week?
That's a tall order, but I'll try to get my side done.
On Sun, Oct 23, 2022, 2:38 PM Alexander Fischer @.***> wrote:
Maybe we can aim for the statsmodels PR for the beginning of next week?
— Reply to this email directly, view it on GitHub https://github.com/s3alfisc/wildboottest/issues/33#issuecomment-1288172770, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABQVPBHZMQ57IU6VO5MB4GTWEWA3BANCNFSM6AAAAAARJPGTYM . You are receiving this because you commented.Message ID: @.***>
Yup, I have now realized that as well. I'll have little time to work on this this week. Let's take it step by step and fix the documentation, WCU11 first, then upload to Pypy, then worry about the PR :)
Looks like there is still some issue with the WCR3x:
Output exceeds the [size limit](command:workbench.action.openSettings?%5B%22notebook.output.textLineLimit%22%5D). Open the full output data[ in a text editor](command:workbench.action.openLargeOutput?4a241ad8-8feb-42ca-8b24-660f02412057)
Python
WCR11 WCU11 WCR31 WCU31 WCR13 WCU13 \
12 -3.074674 -4.109023e+00 -3.068128 -4.062617 -3.046323 -4.083582e+00
4 -2.186620 -1.045002e+00 -2.193078 -1.047134 -2.175083 -1.043310e+00
8 -2.016250 -9.715024e-01 -2.014649 -0.979364 -2.008475 -9.727401e-01
0 -1.536622 2.375314e-16 -1.541331 -0.011370 -1.531542 2.367461e-16
13 -1.156497 -2.573257e+00 -1.153869 -2.560607 -1.146171 -2.558409e+00
WCR33 WCU33
12 -3.039777 -4.036587
4 -2.183138 -1.045208
8 -2.007460 -0.980384
0 -1.537230 -0.011330
13 -1.143577 -2.545343
R
WCR11 WCU11 WCR31 WCU31 WCR13 WCU13 \
12 -3.074674 -4.109023e+00 -3.069685 -4.062617 -3.046323 -4.083582e+00
4 -2.186620 -1.045002e+00 -2.192887 -1.047134 -2.175083 -1.043310e+00
8 -2.016250 -9.715024e-01 -2.015517 -0.979364 -2.008475 -9.727401e-01
0 -1.536622 1.715504e-16 -1.541329 -0.011370 -1.531542 2.126437e-16
13 -1.156497 -2.573257e+00 -1.154120 -2.560607 -1.146171 -2.558409e+00
WCR33 WCU33
...
4 -2.182915 -1.045208
8 -2.008418 -0.980384
0 -1.537228 -0.011330
13 -1.143833 -2.545343
Fixed with #b46634245f61091b67a1a4ac0c45535895319088. Quite the error ...
Now all looks great:
Output exceeds the [size limit](command:workbench.action.openSettings?%5B%22notebook.output.textLineLimit%22%5D). Open the full output data[ in a text editor](command:workbench.action.openLargeOutput?462ee3e5-f868-4272-bb3f-80b5f393428a)
Python
WCR11 WCU11 WCR31 WCU31 WCR13 WCU13 WCR33 \
27 -2.375626 -2.732371 -2.385534 -2.741262 -2.363997 -2.723406 -2.373574
25 -2.127324 -2.170970 -2.142531 -2.188740 -2.118369 -2.163555 -2.133678
31 -1.833894 -2.266291 -1.829709 -2.264162 -1.828216 -2.259557 -1.823290
29 -1.653609 -1.828740 -1.654730 -1.836620 -1.648412 -1.822022 -1.649087
19 -1.442773 -1.463975 -1.441460 -1.458873 -1.429982 -1.451783 -1.427949
WCU33
27 -2.731877
25 -2.181104
31 -2.257368
29 -1.829889
19 -1.446593
R
WCR11 WCU11 WCR31 WCU31 WCR13 WCU13 WCR33 \
27 -2.375626 -2.732371 -2.385534 -2.741262 -2.363997 -2.723406 -2.373574
25 -2.127324 -2.170970 -2.142531 -2.188740 -2.118369 -2.163555 -2.133678
31 -1.833894 -2.266291 -1.829709 -2.264162 -1.828216 -2.259557 -1.823290
29 -1.653609 -1.828740 -1.654730 -1.836620 -1.648412 -1.822022 -1.649087
19 -1.442773 -1.463975 -1.441460 -1.458873 -1.429982 -1.451783 -1.427949
WCU33
...
25 -2.181104
31 -2.257368
29 -1.829889
19 -1.446593
I think we can push the the PypI release now!
In a first test of Python vs R, it looks like the '31' bootstrap t-statistics match exactly under full enumeration (as they should), while the '11' do not:
See here
I suppose that the R version is correct, as it's tested against WildBootTests.jl and produces matching statistics.