Add benchmarks against `linearmodels` and `fastreg`

py-econometrics / pyfixest

Fast High-Dimensional Fixed Effects Regression in Python following fixest-syntax

https://py-econometrics.github.io/pyfixest/

MIT License


Add benchmarks against `linearmodels` and `fastreg` #558

Closed s3alfisc closed 3 weeks ago

s3alfisc commented 3 months ago

Context

It would be great to add benchmarks against the following two Python packages: linearmodels and fastreg.

@apoorvalal has benchmarks against fastreg here, showing equal performance to pyfixest.

To Do

Adjust the pyfixest benchmarks to include linearmodels and fastreg. For linearmodels, add benchmarks for OLS; for fastreg, add benchmarks for OLS and Poisson (linearmodels does not support Poisson, afaik).
Update the benchmark figures by adjusting and re-running the visualize_benchmarks.ipynb notebook.
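
A minimal sketch of what such a timing harness could look like, using only the standard library. The function names (`time_call`, `run_benchmarks`) and the placeholder fit function are hypothetical and not taken from the pyfixest benchmark notebooks; in a real run each callable would load the data set and fit the model with the named package.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_call(fn: Callable[[], object], n_reps: int = 3) -> float:
    """Return the median wall-clock seconds over n_reps calls to fn."""
    timings = []
    for _ in range(n_reps):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def run_benchmarks(
    estimators: Dict[str, Callable[[int], object]],
    sizes: List[int],
    n_reps: int = 3,
) -> List[dict]:
    """Time every estimator at every sample size; one result row per pair."""
    rows = []
    for n_obs in sizes:
        for name, fit in estimators.items():
            seconds = time_call(lambda: fit(n_obs), n_reps)
            rows.append({"estimator": name, "n_obs": n_obs, "seconds": seconds})
    return rows


# Hypothetical stand-in for a model fit: it only burns CPU time proportional
# to n_obs so that the harness can be run without any of the packages.
def _placeholder_fit(n_obs: int) -> int:
    return sum(i * i for i in range(n_obs))


estimators = {
    "pyfixest (OLS)": _placeholder_fit,
    "linearmodels (OLS)": _placeholder_fit,
    "fastreg (OLS)": _placeholder_fit,
}

results = run_benchmarks(estimators, sizes=[1_000, 10_000])
for row in results:
    print(row)
```

Each row holds one (estimator, sample size) pair, which maps directly onto one point of one line in a benchmark line plot.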

s3alfisc commented 3 months ago

@rafimikail would you be interested in picking this up?

rafimikail commented 3 months ago

Hi @s3alfisc, so this one is basically adding two more lines (linearmodels and fastreg) to our performance benchmarking line plots, right?

s3alfisc commented 3 months ago

Yes, exactly! Maybe best to start with one of the two packages and divide this into two PRs? Is it ok if I assign you @rafimikail?

rafimikail commented 3 months ago

Certainly @s3alfisc, you can assign this to me 👍

rafimikail commented 3 months ago

Hi @s3alfisc, I wanted to confirm: to run run_benchmarks.ipynb, I think I first need some data that the notebook uses. Do I need to run data_generation.r before running the notebook, or can I just get the data from https://github.com/lrberge/fixest/tree/master/_BENCHMARK?

Thanks!

s3alfisc commented 3 months ago

Oh, I completely overlooked this - you would have to run the data generation R script first. I can also do so quickly and send you the data as a CSV?

rafimikail commented 3 months ago

Hey @s3alfisc, I tried to run the data generation R file but I'm getting an error; I need to find out why.

But if you already have the data as a CSV, that would be helpful.

Thanks

s3alfisc commented 3 months ago

Will send it in a moment :)

marcandre259 commented 4 weeks ago

I have been looking into running the benchmark with linearmodels. Its PanelOLS function, which handles fixed effects efficiently, only fits the benchmark scenario with 2 fixed effects (dum1 + dum2).

It turns out its PanelOLS function supports at most 2 fixed effects (reference).
Another issue is that the provided indices must be unique, so you cannot have a single fixed effect with values 1, 1, 2, 2 for example.
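
To make the uniqueness problem concrete, here is a small sketch of one possible workaround: pair each fixed-effect value with a running within-group counter so that every row gets a distinct two-part index. The helper name `make_unique_index` is hypothetical, and whether PanelOLS accepts such an index for the benchmark data is an assumption that has not been checked in this thread.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def make_unique_index(groups: Sequence[Hashable]) -> List[Tuple[Hashable, int]]:
    """Pair each group label with a running within-group counter."""
    seen = defaultdict(int)  # how many rows of each group we have passed
    index = []
    for g in groups:
        index.append((g, seen[g]))
        seen[g] += 1
    return index


dum1 = [1, 1, 2, 2]  # a single fixed effect with repeated values
index = make_unique_index(dum1)
print(index)  # [(1, 0), (1, 1), (2, 0), (2, 1)]
```

With pandas, the same counter is what `df.groupby("dum1").cumcount()` returns, so the pair (dum1, counter) could serve as a two-level index with dum1 as the entity level.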

s3alfisc commented 4 weeks ago

Hi @marcandre259, super cool that you're looking at this! As far as I understand it, linearmodels has an AbsorbingLS class that can run pyhdfe under the hood, which should allow for multiple fixed effects and non-panel data.
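
For readers unfamiliar with what "absorbing" several fixed effects means, below is a rough pure-Python sketch of alternating demeaning, one standard way to do it. This is an illustration only, not the code that pyhdfe or linearmodels actually runs; the function names, tolerance, and toy data are all made up for the example.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence


def demean_by(values: List[float], groups: Sequence[Hashable]) -> List[float]:
    """Subtract each group's mean from the values of its members."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def absorb(
    values: Sequence[float],
    fixed_effects: Sequence[Sequence[Hashable]],
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> List[float]:
    """Demean by each fixed effect in turn until the values stop changing."""
    current = list(values)
    for _ in range(max_iter):
        previous = current
        for groups in fixed_effects:
            current = demean_by(current, groups)
        if max(abs(a - b) for a, b in zip(current, previous)) < tol:
            break
    return current


# Toy data: two fixed effects on non-panel data (labels repeat freely).
dum1 = [1, 1, 2, 2, 3, 3]
dum2 = ["a", "b", "a", "b", "b", "a"]
y = [1.0, 2.5, 0.3, 4.1, 2.2, 3.7]

y_tilde = absorb(y, [dum1, dum2])
print(y_tilde)
```

After absorption, the mean of `y_tilde` within every level of `dum1` and `dum2` is (numerically) zero. Applying the same transformation to the regressors and then running plain OLS on the transformed data is the idea behind fitting models with many fixed effects without creating dummy columns.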