py-econometrics / pyfixest

Fast High-Dimensional Fixed Effects Regression in Python following fixest-syntax
https://py-econometrics.github.io/pyfixest/pyfixest.html
MIT License
118 stars 27 forks source link

Add more intro material quick start #436

Closed juanitorduz closed 1 month ago

juanitorduz commented 1 month ago

Related to https://github.com/s3alfisc/pyfixest/issues/435

juanitorduz commented 1 month ago

@s3alfisc I recommend you enable the https://www.reviewnb.com/ app (it is free for public repos!) to review notebooks :)

juanitorduz commented 1 month ago

@s3alfisc I started working on https://github.com/s3alfisc/pyfixest/issues/435. What do you think about the intro section? I want to do something similar for the rest of the sections. Hence, I want some initial feedback :)

For such purpose, I need to re-run all the notebook (also important for all users). However, I can not access C:/Users/alexa/Downloads/census2000_5pc.dta ;) Is there a way to fetch this from other source or maybe even add it into the repository (If the corresponding license allows it). I think is important for users to be able to run the quickstart end-to-end.

codecov[bot] commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 83.00%. Comparing base (35b47e9) to head (6ecc8de). Report is 81 commits behind head on master.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #436 +/- ## ========================================== - Coverage 85.79% 83.00% -2.79% ========================================== Files 38 51 +13 Lines 3421 4243 +822 ========================================== + Hits 2935 3522 +587 - Misses 486 721 +235 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

s3alfisc commented 1 month ago

I've activated the reviewnb tool. Very nice! Trivago is even mentioned in the list of customers, I should ask procurement if we actually have a professional license =)

Beyond that I think the intro is excellent. Clearly describes the problem and I also like the sketch of the solution by demeaning! Thanks so much! =)

The reason why you're results don't match exactly is that you forgot to demean the dependent variable:

fit_demeaned = pf.feols(
    fml="Y_demeaned ~ X1_demeaned",
    data=data.assign(
        Y_demeaned=lambda df: df["Y"] - df.groupby("f1")["Y"].transform("mean")
        X1_demeaned=lambda df: df["X1"] - df.groupby("f1")["X1"].transform("mean")
    ),
    vcov="HC1",
)

fit_demeaned.summary()

The non-existing file can be downloaded from http://www.damianclarke.net/stata/census2000_5pc.dta but

import pandas as pd 
df = pd.read_stata("www.damianclarke.net/stata/census2000_5pc.dta")

fails for some reasons. But the file is super big, so maybe we should add a warning about that?

juanitorduz commented 1 month ago

@s3alfisc pd.read_stata("http://www.damianclarke.net/stata/census2000_5pc.dta") seems to work :)

review-notebook-app[bot] commented 1 month ago

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

juanitorduz commented 1 month ago

@s3alfisc I think this one is ready for a first review round :)

s3alfisc commented 1 month ago

Thanks - I think it looks really good but I still left a lot of comments πŸ˜„

juanitorduz commented 1 month ago

Thanks - I think it looks really good but I still left a lot of comments πŸ˜„

Hey @s3alfisc I can not see your comments πŸ˜… . Can you please send the again? πŸ™

s3alfisc commented 1 month ago

Oh - I left the comments in the review notebook. Can you see them there?

Basically, I wonder about how to simply and intuitively define what a fixed effect is but think you have done a pretty good job, actually; and then I also wonder if it would help to rename the columns in pd.get_data() or to create a new synthetic data set so that the column names are aligned with the story in quickstart? E.g. we could think about creating a standard DiD data set with unit and time fixed effects, or a dataset for a standard AKM-regression with worker and firm fixed effects. Or maybe you have a better example from the tech world (e.g. maybe a cluster randomized trial?)? Column names of "f1" and "f2" are probably good enough for my internal testing but likely not too pedagogical πŸ˜„

juanitorduz commented 1 month ago

@s3alfisc ok! Got it! It would be nice to see notebook comments. Could you try again? Sometimes one need a to hit the β€œfinish review” button πŸ˜„

s3alfisc commented 1 month ago

Looks like all my comments are gone =/ Will start a second round then!

juanitorduz commented 1 month ago

Ok I started with the feedback in https://github.com/s3alfisc/pyfixest/commit/eadcbf892160111010b44a6e8642ca220d690322 . I will continue in the next days so that we don't delay this PR for long :D

juanitorduz commented 1 month ago

ok @s3alfisc I think we are ready for another review round :)

s3alfisc commented 1 month ago

This looks excellent @juanitorduz! I found two more small typos. Beyond that, I think this PR is ready to be merged! I really appreciate the added section on multiple estimations & the section on "brave and true". I think the intro has improved a lot. Vielen Dank =)

juanitorduz commented 1 month ago

This looks excellent @juanitorduz! I found two more small typos. Beyond that, I think this PR is ready to be merged! I really appreciate the added section on multiple estimations & the section on "brave and true". I think the intro has improved a lot. Vielen Dank =)

Fixed!

s3alfisc commented 1 month ago

Ah wow you have been faster then me =D I will merge it after the CI run. Thank you!

juanitorduz commented 1 month ago

@s3alfisc thank you for the comments and feedback!

BTW: You can ad reviews directly in the ReviewNB notebook view as well ;

juanitorduz commented 1 month ago

Click on the ReviewNB and you will see the comment on the cell ;)

s3alfisc commented 1 month ago

I now realize what I did wrong the other day - I added a lot of comments but the review was only "pending", I never clicked on "post to github". That's why all my initial comments got lost 🌞