ycroissant / plm

Panel Data Econometrics with R
GNU General Public License v2.0

Non-standard names in formula evaluation #37

Open sondalex opened 1 year ago

sondalex commented 1 year ago

Hi,

I have noticed that backticks do not work in plm formulas. I think supporting them would be useful for people who want to build two-stage models without transforming the data twice.

Example:

library(plm)
data("Grunfeld", package="plm")
model1 = plm(inv ~ value + capital + factor(year),
              data = Grunfeld, model = "pooling")
model1
# model results
# ...

plm(inv ~ value + capital + `factor(year)`, data=model.frame(model1), model="within", effect="individual")
# Error in eval(predvars, data, env) : object 'factor(year)' not found

tappek commented 1 year ago

I will need to look at that more closely. A quick check of ?formula gives this: "Variable names can be quoted by backticks `like this` in formulae, although there is no guarantee that all code using formulae will accept such non-syntactic names."

-> I read this as saying that backticks in formulae can already cause issues in setups that use only base R.

However, in your example, why would you want to estimate a two-way FE model as a one-way model plus dummies for the other dimension (rather than simply setting effect = "twoways")?
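For comparison, here is a minimal sketch of the two specifications mentioned above, assuming plm is installed and using the Grunfeld data from the first post. The object names fe_dummies, fe_twoways and shared are made up for illustration. The slope estimates for value and capital should agree between the two fits:

```r
library(plm)
data("Grunfeld", package = "plm")

# One-way (individual) within model with explicit year dummies
fe_dummies <- plm(inv ~ value + capital + factor(year),
                  data = Grunfeld, model = "within", effect = "individual")

# Two-way within model: individual and time effects are both handled by plm
fe_twoways <- plm(inv ~ value + capital,
                  data = Grunfeld, model = "within", effect = "twoways")

# Compare the slope coefficients that both models have in common
shared <- c("value", "capital")
all.equal(coef(fe_dummies)[shared], coef(fe_twoways)[shared])
```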

sondalex commented 1 year ago

The example is just to illustrate the idea. My use case would have been too long to explain.

ycroissant commented 1 year ago

Hi Kevin,

The code works for me with my installed (2.6-1) version of plm.

Best

Yves

tappek commented 1 year ago

It does not run with 2.6-1 on my end, nor with 1.7-0 (i.e., from the days of the now-superseded pFormula).

By the time plm gets to access the data, the syntactically invalid name factor(year) has been converted to the syntactically valid name factor.year., so it is no longer found. For the term "syntactically valid name", see ?make.names and run make.names("factor(year)").
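The conversion itself can be seen in base R alone. The sketch below does not show where plm performs it internally. It only illustrates what make.names does, and that data.frame() with its default check.names = TRUE applies the same conversion. The built-in mtcars data and the object names nm and mf are used purely for illustration:

```r
# Syntactically valid names pass through unchanged, others are rewritten
nm <- c("a", "z_a", "20_29", "factor(year)")
make.names(nm)
# "a" "z_a" "X20_29" "factor.year."

# A model frame keeps the non-syntactic name of a transformed variable ...
mf <- model.frame(mpg ~ wt + factor(cyl), data = mtcars)
colnames(mf)
# "mpg" "wt" "factor(cyl)"

# ... but passing it through data.frame() (check.names = TRUE by default)
# runs the column names through make.names()
colnames(data.frame(mf))
# "mpg" "wt" "factor.cyl."
```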

The issue is not with backticks per se but with the parentheses in factor(year), which make it a syntactically invalid name. This is illustrated by the following backtick example:

library(plm)
data("Grunfeld", package="plm")
model1 <- plm(inv ~ value + capital + factor(year),
             data = Grunfeld, model = "pooling")
data2 <- model.frame(model1)
data2[ , "a"] <- rnorm(200)
form <- inv ~ value + capital + `a`
plm(form, data=data2, model="within", effect="individual") # works
form2 <- inv ~ value + capital + `factor(year)`
plm(form2, data=data2, model="within", effect="individual") # errors

It works with lm, though. I am not sure whether it is worth the effort to make this work in plm, also given the general warning in ?formula.
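The lm behaviour can be reproduced without plm. The following is a minimal sketch using the built-in mtcars data instead of Grunfeld. The object names m1, mf and m2 are made up for illustration:

```r
# Fit a model whose formula contains a function call
m1 <- lm(mpg ~ wt + factor(cyl), data = mtcars)

# The model frame stores the transformed variable under a non-syntactic name
mf <- model.frame(m1)
colnames(mf)

# Refer to that column with backticks in a second lm() call
m2 <- lm(mpg ~ wt + `factor(cyl)`, data = mf)

# Both fits should give the same estimates (the coefficient labels may differ)
all.equal(unname(coef(m1)), unname(coef(m2)))
```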

A workaround would be to ensure syntactically valid names and use these in the formula, so something along these lines:

colnames(data2) <- make.names(colnames(data2), unique = TRUE)
plm(inv ~ value + capital + factor.year., data=data2, model="within", effect="individual")
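
If there are many columns, the formula can also be built from the cleaned names instead of being typed by hand. This is only a sketch of that idea in base R, again with the built-in mtcars data. The object names mf, rhs, form and fit are made up for illustration, and the resulting formula could be passed to plm() in the same way as above:

```r
# Model frame with a non-syntactic column name ("factor(cyl)")
mf <- model.frame(lm(mpg ~ wt + factor(cyl), data = mtcars))

# Make all column names syntactically valid
colnames(mf) <- make.names(colnames(mf), unique = TRUE)
colnames(mf)
# "mpg" "wt" "factor.cyl."

# Build the formula from the cleaned names: response on the left, rest on the right
rhs <- setdiff(colnames(mf), "mpg")
form <- reformulate(rhs, response = "mpg")
form

# The formula now works without backticks
fit <- lm(form, data = mf)
```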

sondalex commented 1 year ago

Thank you for digging into this issue.

m0byn commented 1 year ago

Although a workaround exists, I came across this issue and have to say it is rather surprising! I am using age groups as variables, so it is rather intuitive to include numbers in column names. Since it works with the lm function, I would argue that digging deeper into this issue is worth the effort!

santoshbs commented 1 year ago

I am having the same issue. My dependent variable starts with "z_". plm() keeps saying object not found.

tappek commented 1 year ago

Do you have a reproducible example for your z_ case? The following z_ case works:

library(plm)
data(Grunfeld)
Grunfeld$`z_a` <- Grunfeld$inv
plm(z_a ~ value + capital, data = Grunfeld)

Model Formula: z_a ~ value + capital

Coefficients:
  value capital 
0.11012 0.31007

santoshbs commented 1 year ago

Thank you, @tappek.

I am not sure how to create a reproducible example. I will try.

Just FYI - while the same dataset and variable names worked with lm() and lmer(), plm() kept showing object not found. For some reason, colnames(df_pdataframe) and head(df_pdataframe) kept showing different column names. Anyways, I had to go back to lmer().

tappek commented 1 year ago

Offhand, I cannot think of a reason why colnames and head would show different column names for a pdata.frame, as we do not provide specialised methods for pdata.frames in the package and there is nothing special about column names in a pdata.frame. Here, too, a reproducible example would help to identify a possible cause.

https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example gives some hints on how to create one.
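
As a starting point for such an example, here is a minimal sketch, assuming plm is installed. The data are synthetic, and the names df, pdat, id, time, z_score, x and same_names are made up for illustration. It is not a reconstruction of the data set in question:

```r
library(plm)

# Small synthetic balanced panel: 3 individuals observed over 4 periods
set.seed(1)
df <- data.frame(id = rep(1:3, each = 4),
                 time = rep(1:4, times = 3),
                 z_score = rnorm(12),
                 x = rnorm(12))

pdat <- pdata.frame(df, index = c("id", "time"))

# Compare the column names as reported in different ways
colnames(df)
colnames(pdat)
colnames(head(pdat))
same_names <- identical(colnames(pdat), colnames(head(pdat)))
same_names

# Fit a model with the z_-prefixed dependent variable
plm(z_score ~ x, data = pdat, model = "within")
```

Replacing the synthetic columns step by step with the real ones (or with the output of dput() on a small subset) should narrow down where the names start to differ.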