ycroissant / plm

Panel Data Econometrics with R
GNU General Public License v2.0

Adding columns to pdata.frame #22

Open ggrothendieck opened 2 years ago

ggrothendieck commented 2 years ago

Suggest adding transform, with and cbind methods. Currently none of these work:

library(plm)
E <- pdata.frame(EmplUK)

transform(E, lag_emp = lag(emp)) # does not use lag.pseries; returns data.frame, not pdata.frame
transform(E, lag_emp = lag(E$emp))  # returns data.frame, not pdata.frame.  Seemingly redundant E$ must be used

with(E, cbind(E, lag_emp = lag(emp)) ) # does not use lag.pseries; returns data.frame, not pdata.frame
with(E, cbind(E, lag_emp = lag(E$emp))) # returns data.frame, not pdata.frame.  Was not able to make use of  with.

This one does work, but it is not idiomatic R and does not fit in with pipes: E2 has to be written three times to avoid overwriting E.

E2 <- E
E2$lag_emp <- lag(E2$emp)
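
Until such methods exist, one possible stopgap is to wrap the working assignment in a small function so that it can be used as a single pipe step. This is only a sketch: add_lag_emp is a hypothetical helper name, not part of plm, and it relies solely on the `$` extraction and `$<-` assignment shown above. The data() call is harmless if the dataset is already available.

```r
library(plm)
data("EmplUK", package = "plm")

# Hypothetical helper (not part of plm): wraps the assignment that works.
# pd$emp is extracted as a pseries, so lag() dispatches to plm's panel-aware method.
add_lag_emp <- function(pd) {
  pd$lag_emp <- lag(pd$emp)
  pd
}

E <- pdata.frame(EmplUK)
E2 <- add_lag_emp(E)   # E itself is left unchanged

class(E2)
head(E2[, c("emp", "lag_emp")])
```

Because the function takes the pdata.frame as its first argument, it can also be called as a pipe step, but it has to be rewritten for every new column, which is why proper transform, with and cbind methods would still be preferable.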

SebKrantz commented 2 years ago

FYI @ggrothendieck: https://sebkrantz.github.io/collapse/reference/indexing.html. The indexed frames described there work inside all data-masking environments and can be converted efficiently to a regular pdata.frame for estimation using the to_plm() function.

tappek commented 2 years ago

Some background on why things like with do not work as expected with a pdata.frame: a pdata.frame does not hold pseries in its columns, only objects of the plain base class. A pseries is created only when a column is extracted (cf. the first vignette: "While extracting a series from a pdata.frame, a pseries is created, which is the original series with the index attribute.").

Yes, it is somewhat unintuitive.

Here is an example:

library(plm)
df <- data.frame(id = c(1,1,2), time = c(1,2,1), f = factor(c("a", "a", "b")), n = c(1:3))
pdf <- pdata.frame(df)

class(pdf$f)
#> [1] "pseries" "factor"
lapply(pdf, class)
#> $id
#> [1] "factor"
#> 
#> $time
#> [1] "factor"
#> 
#> $f
#> [1] "factor"
#> 
#> $n
#> [1] "integer"
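
The same difference can be made visible for with directly. The following is a small sketch that reuses the toy data from above (the variable names via_dollar and via_with are just for illustration): extraction via $ produces a pseries, whereas with evaluates the expression against the stored columns and therefore sees the plain vector without the index attribute. That is why lag(emp) inside with or transform does not reach lag.pseries.

```r
library(plm)

df <- data.frame(id = c(1, 1, 2), time = c(1, 2, 1),
                 f = factor(c("a", "a", "b")), n = 1:3)
pdf <- pdata.frame(df)

# Extraction with $ goes through plm's extraction method and yields a pseries
via_dollar <- pdf$n

# with() looks up n among the stored columns, bypassing the $ method,
# so the result is the plain column without the pseries class
via_with <- with(pdf, n)

inherits(via_dollar, "pseries")
inherits(via_with, "pseries")
```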