rsaggio87 / LeaveOutTwoWay

Bias-corrected estimates of variance components in two-way fixed effects models as described in Kline, Saggio and Sølvsten (2020)

Version without CMG routine? #29

Closed. LukasBFreund closed this 2 years ago

LukasBFreund commented 2 years ago

First of all, thanks so much for making this code available; it is a fantastic resource for the community!

I am working on an implementation in a computing environment that restricts the installation of software beyond the 'default' setup, an issue that is probably quite common when working with administrative datasets. In particular, for the Matlab implementation, I cannot install MinGW-w64 or another compiler needed to install the CMG routines that the package invokes. (Other constraints apply to the Julia implementation and the executable.)

Would it perhaps be possible to have a version of the Matlab package that does not rely on the CMG routine or other non-native functions?

Once again, thanks a lot for this resource!

Best, Lukas

rsaggio87 commented 2 years ago

Thanks @LukasBFreund. I created a branch that runs KSS without the need to invoke CMG. Take a look here: https://github.com/rsaggio87/LeaveOutTwoWay/tree/noCMG

LukasBFreund commented 2 years ago

Fantastic, thanks a lot, @rsaggio87!

LukasBFreund commented 2 years ago

Hi @rsaggio87,

When running the new code on the servers of the statistical authority that provides the administrative data (but, confusingly to me, not on my private device), I run into the following issue:

Error using sparse
Sparse matrix sizes must be nonnegative integer scalars.
Error in leave_out_KSS (line 494)
    X_pe=[X(:,1:N) sparse(NT,J)];
Error in testing_CMG (line 20)
[sigma2_psi,sigma_psi_alpha,sigma2_alpha] = leave_out_KSS(y,id,firmid,[],[],type_of_algorithm);

Digging in a little, this appears to be caused by J=N-size(X,2) evaluating to a negative integer (-1683). Presumably this stems from the changed construction of the design matrices, where the 'no-CMG' version has:

    NT=size(y,1);
    D=sparse(1:NT,id',1);
    F=sparse(1:NT,firmid',1);
    S=speye(J-1);
    S=[S;sparse(-zeros(1,J-1))];  %N+JxN+J-1 restriction matrix 
    X=[D,F*S];
    N=size(D,2);
    J=N-size(X,2);

By contrast, the 'standard' version has:

    NT=size(y,1);
    D=sparse(1:NT,id',1);
    F=sparse(1:NT,firmid',1);
    X=[D,-F]; %shaped in a pure Laplacian format.
    N=size(D,2);
    J=size(F,2);

As the error message shows, the issue also arises when using the test data and program provided.
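
The two snippets above suggest why the sign flips. The following is a minimal MATLAB sketch on made-up toy data, not code from the package: the `id` and `firmid` vectors and the names `J_bad` and `J_good` are illustrative assumptions. It shows that once one firm column is dropped, X has N+J-1 columns, so N-size(X,2) equals -(J-1). Under that reading, the reported -1683 would correspond to 1684 firms, although the thread does not confirm this.

```matlab
% Toy data (hypothetical): 6 observations, 3 workers, 4 firms.
id     = [1 1 2 2 3 3]';
firmid = [1 2 2 3 3 4]';
NT = numel(id);

D = sparse(1:NT, id', 1);       % NT x N matrix of worker dummies
F = sparse(1:NT, firmid', 1);   % NT x J matrix of firm dummies
N = size(D, 2);                 % number of workers
J = size(F, 2);                 % number of firms, taken before any column is dropped

S = [speye(J-1); sparse(1, J-1)];   % J x (J-1): drops the last firm as the normalization
X = [D, F*S];                       % NT x (N + J - 1)

J_bad  = N - size(X, 2);    % equals -(J-1), so sparse(NT, J_bad) would raise the error above
J_good = size(X, 2) - N;    % equals J-1, the number of firm columns actually in X

fprintf('N = %d, J = %d, size(X,2) = %d\n', N, J, size(X, 2));
fprintf('N - size(X,2) = %d, size(X,2) - N = %d\n', J_bad, J_good);
```

On this toy data the script reports N - size(X,2) = -3 and size(X,2) - N = 3. Whether the downstream code expects J or J-1 firm columns is not stated in the thread, so this is not necessarily the fix that was later pushed to the branch.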

Apologies for re-opening the issue; I wanted to leave the error description here for completeness at least.

Best, Lukas

rsaggio87 commented 2 years ago

Ah, that's really strange. Just to make sure I understand: the code runs on your computer (as well as on mine, FWIW) but not on their server?

I made an update to the branch, can you see if it runs now?

LukasBFreund commented 2 years ago

You're right, it's strange indeed! (Their system is likewise a 64-bit Windows computer, and I tried alternative versions of Matlab, notably R2019b and R2021b, on both my personal computer and their servers.)

As an update, the revised branch now produces the following errors, depending on the algorithm:

(a) Exact:

Error using leverages (line 52)
Incorrect dimensions for matrix multiplication. Check that the number of columns in the first matrix matches the number of rows in the second matrix. To perform elementwise multiplication, use '.*'.

Error in leave_out_KSS (line 513)
    [Pii, Mii, correction_JLA, Bii_fe, Bii_cov, Bii_pe]=leverages(X_fe,X_pe,X,xx,Lchol,type_algorithm,simulations_JLA);

(b) JLA:

Error using pcg (line 72)
Right hand side must be a column vector of length 29705 to match the coefficient matrix.

Error in leverages (line 91)
        parfor i=1:scale

Error in leave_out_KSS (line 513)
    [Pii, Mii, correction_JLA, Bii_fe, Bii_cov, Bii_pe]=leverages(X_fe,X_pe,X,xx,Lchol,type_algorithm,simulations_JLA);

(The parallel coding makes it a bit harder to dig deeper into the messages.)

I'm reporting these issues for completeness in case others come across similar problems. But if the original version works for everyone else, it's probably not worth the additional hassle for you (though I am, of course, happy to try updates).

Best, Lukas

rsaggio87 commented 2 years ago

Ah, I think I see the issue. I should be able to give you a fix soon!

rsaggio87 commented 2 years ago

Hi @LukasBFreund!

OK, I think I found the issue. I just pushed a new version of the branch. Make sure to download it (the function leverages also needs to be updated). I've also posted a master file "testing_CMG" that tests the code using both JLA and exact.

My previous checks were off because of a conflict problem (I was accidentally running the previous version of the code!). Once I realized that, I was able to reproduce the errors you got and fix them in the current version. Hopefully everything makes sense this time :)

Keep me posted! Raffa

LukasBFreund commented 2 years ago

Hi @rsaggio87,

Fantastic, thanks a lot: happy to confirm that "testing_CMG" runs on both my personal computer and the servers (even without the Parallel Computing Toolbox installed, though of course correspondingly slower), where it also works with data from a random sample of workers.

Really appreciate all your help with this, Raffa!

Best, Lukas

rsaggio87 commented 2 years ago

Good to hear! Have fun with it!