jacob-long / panelr

Regression models and utilities for repeated measures and panel data

Explore `plm` interoperability #9

Open jacob-long opened 5 years ago

jacob-long commented 5 years ago

I'm not looking to remake plm or anything like that, but I'd like to see if it's possible to provide a method for converting panel_data objects into plm's data format. There might also be some tools in plm that I don't need to remake but would be useful to import into this package.
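A minimal sketch of what such a conversion could look like, going through a plain data frame. It assumes panelr's `unpanel()` and plm's `pdata.frame()` with its `index` argument, and uses panelr's bundled `WageData` (id column `id`, wave column `t`). The helper name `to_pdata_frame()` is hypothetical and not part of either package.

```r
library(panelr)
library(plm)

# Hypothetical helper (not part of panelr or plm): strip the panel_data
# class, then let plm rebuild its own structure from the id/wave columns.
to_pdata_frame <- function(x, id, wave) {
  df <- as.data.frame(unpanel(x))        # plain data.frame, no grouping
  pdata.frame(df, index = c(id, wave))   # plm expects c(individual, time)
}

data("WageData", package = "panelr")
wages <- panel_data(WageData, id = id, wave = t)

wages_pdf <- to_pdata_frame(wages, id = "id", wave = "t")
class(wages_pdf)   # should include "pdata.frame"
pdim(wages_pdf)    # plm's summary of the panel dimensions
```

Passing the column names explicitly keeps the sketch independent of how `panel_data` stores its id and wave metadata internally.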

briatte commented 5 years ago

I was going to suggest the same thing.

briatte commented 5 years ago

I'm happy to see a package that tries to bring sanity to working with panel data with R.

I come from Stata, where panel data is pretty straightforward, and where the user can quickly replicate papers by using just a few commands like xtreg, xtpcse or prais, followed by margins to get marginal effects or predicted probabilities. R, in comparison, is quite a mess (but a rich one!), with concurrent implementations of

  1. fixed-effects-with-clustered SEs in plm, estimatr (lm_robust function), panelAR, prais, geepack, pcse and more,
  2. plus quite a few packages to deal with marginal effects, such as effects, ggeffects or margins (which mimics Stata as much as possible).

Most of the packages cited above are cited in Achim Zeileis' CRAN Task View for Econometrics, in the "Panel data models" (which mentions another panel data implementation, Paneldata, but plm seems a more important one to support) and "Microeconometrics" sections.

I'd really love to see some package that would simplify the package landscape for the user, while at the same time making it more obvious how to replicate Stata code/results with R (which seemed to have been part of the rationale behind the pcse package, for instance).

Please let me know how I might be able to help.

For background, I worked on an S4 class for panel data some years ago, but the package was more of an experiment than anything else. I also contributed a few things to broom, which is part of the tidymodels package suite, where panelr might well find a place one day, don't you think?

Inviting @strengejacke and @alexpghayes to the conversation, just in case.

P.S. I'm leaving out the correspondence that exists between panel-data-models-as-understood-by-econometricians (with its horrendous FE/RE terminology) and mixed-effects-models, as understood by everyone else, but it exists and helps in understanding pooling and shrinkage, i.e. what is really going on under the hood: see this brilliant recent blog post by @m-clark.

strengejacke commented 5 years ago

My impression is that panelr is a kind of "wolf in sheep's clothing": it comes across as a "fixed-effects" regression modelling package, while it actually fits mixed effects models that can incorporate random slopes, time-invariant covariates, or the grouping factor as a random effect. ;-) This was my impression at least for the wbm() function, which I have quickly tested (and compared to complex REWB models as suggested by Bell et al. 2018).

Not sure if these kinds of models are available out-of-the-box in Stata, though.

Anyway, the package syntax and API look clean and straightforward (and maybe they help econometricians fit less flawed models ;-)), and I'll check how to add "interoperability" (support) with sjPlot and ggeffects, and probably also some easystats packages.

briatte commented 5 years ago

Thanks @strengejacke, and thanks for mentioning easystats – I did not know you had your own tidymodels-like initiative running!

jacob-long commented 5 years ago

> I'm happy to see a package that tries to bring sanity to working with panel data with R.
>
> I come from Stata, where panel data is pretty straightforward, and where the user can quickly replicate papers by using just a few commands like xtreg, xtpcse or prais, followed by margins to get marginal effects or predicted probabilities.

I should mention, then, that part of my motivation/inspiration for panelr was the way the xt suite in Stata makes panel data a first-class citizen, so to speak. I haven't taken xt too literally as a model, but in spirit I would like for there to be a relatively good and consistent interface for panel data in R.

> R, in comparison, is quite a mess (but a rich one!), with concurrent implementations of
>
> 1. fixed-effects-with-clustered SEs in `plm`, `estimatr` (`lm_robust` function), `panelAR`, `prais`, `geepack`, `pcse` and more,

Seeing pcse reminds me that I should do something to clarify terms. When I think "panel," I think large N, small T. Of course, the panel_data data structure doesn't care about that and it should be just as good for small N, large T. To be frank, though, I'm not sufficiently experienced with TSCS data to know how appropriate some of these models are for small N, large T. I know that people tend not to use fixed effects models with TSCS data but I'm not sure if that's more of a disciplinary norm versus a statistically principled practice. I'm sure the political scientists have hashed these things out but I haven't read deeply on it.

> 2. plus quite a few packages to deal with marginal effects, such as `effects`, `ggeffects` or `margins` (which mimics Stata as much as possible).

And just to add to the fun, I maintain two packages which partly duplicate this functionality. jtools includes effect_plot() that plots predicted values a la plot(effects::effect()). interactions estimates/plots simple slopes, which is (basically) psychologist-speak for marginal effects at representative values for linear models. I rely on the margins package to produce what I have been calling "simple margins" for GLMs that don't mean-center covariates that are not involved in the interactions.

> Most of the packages cited above are cited in Achim Zeileis' CRAN Task View for Econometrics, in the "Panel data models" (which mentions another panel data implementation, Paneldata, but plm seems a more important one to support) and "Microeconometrics" sections.

Yes, Paneldata looks to me like abandonware.

> I'd really love to see some package that would simplify the package landscape for the user, while at the same time making it more obvious how to replicate Stata code/results with R (which seemed to have been part of the rationale behind the pcse package, for instance).

Good idea. This was something I had been thinking about early on but got away from it over time as I completed my collaboration with a Stata user.

> Please let me know how I might be able to help.
>
> For background, I worked on an S4 class for panel data some years ago, but the package was more of an experiment than anything else. I also contributed a few things to broom, which is part of the tidymodels package suite, where panelr might well find a place one day, don't you think?
>
> Inviting @strengejacke and @alexpghayes to the conversation, just in case.
>
> P.S. I'm leaving out the correspondence that exists between panel-data-models-as-understood-by-econometricians (with its horrendous FE/RE terminology) and mixed-effects-models, as understood by everyone else, but it exists and helps in understanding pooling and shrinkage, i.e. what is really going on under the hood: see this brilliant recent blog post by @m-clark.

> My impression is that panelr is a kind of "wolf in sheep's clothing": it comes across as a "fixed-effects" regression modelling package, while it actually fits mixed effects models that can incorporate random slopes, time-invariant covariates, or the grouping factor as a random effect. ;-) This was my impression at least for the wbm() function, which I have quickly tested (and compared to complex REWB models as suggested by Bell et al. 2018).

You could say that 😃. Of course I think the lesson we learn from such models is that fixed effects models were just random effects models all along, which were themselves multilevel models.
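To make that equivalence concrete, here is a hedged sketch of a hand-rolled within-between specification fitted as an ordinary random-intercept model with lme4. It is not how `wbm()` is implemented; the variable names `wks_between` and `wks_within` are made up for illustration, and `WageData` is the example data bundled with panelr.

```r
library(lme4)

data("WageData", package = "panelr")
d <- WageData

# Split a time-varying predictor into its between- and within-person parts
d$wks_between <- ave(d$wks, d$id)        # each person's mean over waves
d$wks_within  <- d$wks - d$wks_between   # wave-specific deviation from it

# Random-intercept model with both parts plus a time-invariant covariate
rewb <- lmer(lwage ~ wks_within + wks_between + fem + (1 | id), data = d)

# Classic fixed effects (within) estimate via person dummies, for comparison
fe <- lm(lwage ~ wks + factor(id), data = d)

c(rewb = unname(fixef(rewb)["wks_within"]),
  fe   = unname(coef(fe)["wks"]))
```

The two coefficients should agree up to numerical tolerance, because the person-demeaned predictor is orthogonal to everything that is constant within a person, including the random intercept. Unlike the dummy-variable model, the mixed model also returns estimates for `wks_between` and the time-invariant `fem`.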

> Not sure if these kinds of models are available out-of-the-box in Stata, though.
>
> Anyway, the package syntax and API look clean and straightforward (and maybe they help econometricians fit less flawed models ;-)), and I'll check how to add "interoperability" (support) with sjPlot and ggeffects, and probably also some easystats packages.

ghost commented 4 years ago

Coming back to the initial topic:

A good panel data format is still an open issue in R. Python has pandas, which is (I believe) very sophisticated for panel data manipulation.

plm's pdata.frame is very rudimentary (and strange at times, e.g. the index variables become factors) and offers only a few data manipulation functions (it has gained a few more over the last ~2 years). It does the trick for package plm and for basic stuff, but not beyond. I would not build a package on top of pdata.frame, but a converter function from/to pdata.frame would be nice so that packages can interact via the data.
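The factor behaviour can be seen with plm's bundled `Grunfeld` data. This is only a small sketch: it reads the index from the `"index"` attribute that plm attaches to a pdata.frame, and the object names `pg`, `idx` and `g2` are arbitrary.

```r
library(plm)

data("Grunfeld", package = "plm")
class(Grunfeld$year)            # stored as a number in the raw data

pg  <- pdata.frame(Grunfeld, index = c("firm", "year"))
idx <- attr(pg, "index")        # the index plm keeps alongside the data
sapply(idx, is.factor)          # both index variables are factors here

# Converting back: drop the plm classes and restore a numeric year,
# going through character so the labels (not the factor codes) are used
g2 <- as.data.frame(pg)
g2$year <- as.integer(as.character(idx$year))
class(g2$year)
```

A converter in the other direction therefore has to decide whether to keep the factors or restore the original types, which is one reason a round trip is not entirely trivial.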

panelr has panel_data, which offers a bit. Not sure how it will evolve.

tsibble is relatively new but has more than one developer and could be promising. Interestingly, they do not even mention the word "panel" on their homepage. They are not coming from the econometrics side of things, I guess.

data.table offers quite a few functions suitable for panels, as does dplyr, if one knows how to apply them in the panel context. (Maybe tsibble uses dplyr in that vein under the hood, but I did not check.)
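As an example of what "applying these in the panel context" can involve, the sketch below uses plain dplyr on a made-up toy panel (the data and column names are invented for illustration). It contrasts a positional lag with a lag that respects gaps in the time dimension, and adds a within transformation.

```r
library(dplyr)

# Toy long-format panel: two units, unit 2 is missing wave 2
panel <- data.frame(
  id   = c(1, 1, 1, 2, 2),
  wave = c(1, 2, 3, 1, 3),
  y    = c(10, 11, 13, 20, 26)
)

lagged <- panel %>%
  arrange(id, wave) %>%
  group_by(id) %>%
  mutate(
    y_lag_pos  = dplyr::lag(y),              # previous row within the unit
    y_lag_time = y[match(wave - 1, wave)],   # value at wave - 1, NA if absent
    y_demeaned = y - mean(y)                 # within transformation
  ) %>%
  ungroup()

lagged
```

For unit 2 at wave 3 the positional lag picks up the wave 1 value, while the time-aware version returns `NA` because wave 2 was never observed. Handling that distinction automatically is the sort of thing a dedicated panel data class can take care of.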

Update: ... and there is pmdplyr, an extension to dplyr for panel data that aims to output a pibble (https://nickch-k.github.io/pmdplyr/index.html). I have not looked at it yet.

Update 2: Not sure what the (panel estimation) packages lfe, fixest, and alpaca use, but I believe they estimate more or less directly on the data, with the individual and time dimensions specified in the model call and no data manipulation functions (other than those expressed via the model formula). But I could be wrong. lfe can work with pdata.frames, though.
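For fixest at least, a short sketch of that formula-based workflow on a plain data frame, with a plm fit for comparison. It uses plm's bundled `Grunfeld` data; the object names `twfe` and `pl` are arbitrary.

```r
library(fixest)
library(plm)

data("Grunfeld", package = "plm")   # a plain data.frame, no panel class

# fixest: the unit and time dimensions go after the "|" in the formula
twfe <- feols(inv ~ value + capital | firm + year,
              data = Grunfeld, cluster = ~firm)

# plm: the dimensions are declared via `index` (or a pdata.frame) instead
pl <- plm(inv ~ value + capital, data = Grunfeld,
          index = c("firm", "year"),
          model = "within", effect = "twoways")

cbind(fixest = coef(twfe), plm = coef(pl))
```

The point estimates should match up to numerical tolerance. The reported standard errors differ by default, because the `feols()` call above clusters by firm while `plm()` does not.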