tlamadon / pytwoway

Two way models in python
MIT License

CRE variance decomposition #44

Open klintmane1 opened 1 year ago

klintmane1 commented 1 year ago

How can I get var(eps), var(alpha), and var(psi) before and after the correction using CRE, as with the FE estimator? Is it possible?

adamoppenheimer commented 1 year ago

var_tot gives the corrected var(psi) and cov_tot gives the corrected cov(psi, alpha) (the notebook shows how to access these).
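
For context, these quantities are the terms of the usual variance decomposition for a two-way model of the form y = alpha + psi + eps. The notation below is an assumption (the thread only names var(eps), var(alpha), var(psi), and cov(psi, alpha)), and the identity holds only if eps is uncorrelated with both alpha and psi:

```latex
\operatorname{Var}(y)
  = \operatorname{Var}(\alpha)
  + \operatorname{Var}(\psi)
  + 2\,\operatorname{Cov}(\alpha, \psi)
  + \operatorname{Var}(\varepsilon)
```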

Unfortunately I don't know much about the CRE estimator, as Professor Lamadon wrote the code, so I can't answer anything beyond that.

I would recommend taking a look at the paper to see how it's derived and understand how it's implemented.

I would also recommend reading through the code to see what it's doing (and how it compares to the derivations).

Sorry I couldn't be more helpful about this. I eventually plan to look through the code so that I can rewrite it to be more aligned with the style of the rest of the package, to add weights, and to potentially add the option for control variables, but that likely won't be until next summer.

santiagohermo commented 1 year ago

Hi @adamoppenheimer. I wanted to let you know that I was trying out the CRE estimation following the CRE example on non-simulated data and I got this error:

[image: screenshot of the error]

The error seems to originate around here. The note there says that if the code crashes on the next line, it's probably because the data are not collapsed. This seems right, since the code stopped crashing once I added collapsing! However, the example doesn't collapse the data, yet everything works with the simulated data. In summary, do you know why the example works despite this problem? Should the example collapse the data?

adamoppenheimer commented 1 year ago

Hi @santiagohermo, in the example notebook the data is collapsed.

Setting the option 'collapse_at_connectedness_measure': True in the cleaning parameters collapses the data (so long as the connectedness measure is set to 'leave_out_something'). My guess is that your code doesn't specify the connectedness measure, which is why you need to collapse the data manually (please let me know if that is not the case).
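
As a rough sketch, the two settings could be written as a plain dictionary. The variable name cleaning_options is hypothetical and the key name 'connectedness' is an assumption; only the option 'collapse_at_connectedness_measure' and the value 'leave_out_something' are quoted from this comment. How the dictionary is passed to the cleaning routine is not shown in this thread, so the notebook is the place to check the actual call.

```python
# Hypothetical variable name; this is a plain dict, not a call into the package.
cleaning_options = {
    # Assumed key name for the connectedness measure; the value is the
    # string quoted in this thread.
    "connectedness": "leave_out_something",
    # Option quoted in this thread: collapse the data as part of computing
    # the connectedness measure.
    "collapse_at_connectedness_measure": True,
}
```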

I made this an option because if you are computing the leave-out connected set on collapsed data, you need to collapse the data before you compute the leave-out set. I think the option is helpful so that people won't accidentally do it in the wrong order. Before I figured out the right order, there were many times when I was confused about why the HE correction wasn't working even though the data was supposedly leave-one-out connected (obviously it wasn't, since I had cleaned it incorrectly).

So this option makes sure to first clean the data (without computing the leave-one-out set), then collapses it and computes the leave-one-out set on the collapsed data. If I remember correctly, it also makes sure to be efficient about how it cleans the data before and after collapsing, so it should be faster than cleaning, collapsing, then computing the leave-one-out set manually.
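
To make "collapsing" concrete, here is a minimal pure-Python sketch. It assumes that collapsing means merging consecutive observations of a worker at the same firm into a single spell. It does not use the package, and the function name collapse_spells and the tuple layout (worker, firm, period, wage) are hypothetical.

```python
from itertools import groupby
from statistics import mean


def collapse_spells(rows):
    """Collapse (worker, firm, period, wage) rows into worker-firm spells."""
    # Sort by worker, then period, so each spell is a consecutive run.
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    spells = []
    # groupby only merges adjacent rows, so a worker who later returns to
    # an earlier firm starts a new spell.
    for (worker, firm), run in groupby(ordered, key=lambda r: (r[0], r[1])):
        run = list(run)
        spells.append({
            "worker": worker,
            "firm": firm,
            "start": run[0][2],
            "end": run[-1][2],
            "mean_wage": mean(r[3] for r in run),
            "n_obs": len(run),
        })
    return spells


# Toy data: worker 1 moves from firm A to firm B; worker 2 stays at firm B.
rows = [
    (1, "A", 1, 10.0), (1, "A", 2, 12.0), (1, "B", 3, 20.0),
    (2, "B", 1, 5.0), (2, "B", 2, 7.0),
]
spells = collapse_spells(rows)
```

In the order described above, the leave-one-out set would then be computed on the spells rather than on the original rows.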

Please let me know if this answers your question.

Also, if you think it would be helpful for me to update the documentation about this, please open a new issue about it. I can't get to it now since I have school, but I'll get to it once I have time.

santiagohermo commented 1 year ago

Thanks for the quick reply @adamoppenheimer! No need to do anything here, I was just curious to understand this better :)

In my code I tried setting the connectedness measure to 'leave_out_something', but for some reason my Python kernel restarted automatically right after cleaning. Then I set connectedness to 'connected' and the cleaning step worked, which, after collapsing, allowed me to run the estimator. I'll try a few things to see if I can make it work with a leave-out connectedness measure.

Good luck in school!

adamoppenheimer commented 1 year ago

@santiagohermo sorry this has taken so long, but I realized the issue you were having isn't even relevant (see #40). This is actually a mistake in the documentation: for the CRE estimator you shouldn't need to compute the leave-one-out set, and you don't even need the connected set. This is because you are using the firm classes rather than firm ids to estimate the model.
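
One way to picture the difference between firm ids and firm classes is the toy sketch below. It is not the package's code: the worker moves and the firm-to-class mapping are made up, and count_components is a hypothetical helper. It only shows that firms which are disconnected at the firm-id level can become linked once they are grouped into classes.

```python
def count_components(edges):
    """Count connected components of an undirected graph given as edge pairs."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(x) for x in list(parent)})


# Made-up worker moves between firm ids: two separate groups of firms.
firm_moves = [("f1", "f2"), ("f3", "f4")]

# Made-up clustering of firms into two classes.
firm_class = {"f1": 0, "f2": 1, "f3": 0, "f4": 1}

# The same moves, expressed between classes instead of firm ids.
class_moves = [(firm_class[a], firm_class[b]) for a, b in firm_moves]

n_firm_components = count_components(firm_moves)
n_class_components = count_components(class_moves)
```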