Clarifying varying slopes

marcdelabarrera commented 4 weeks ago

I am trying to understand the varying slopes option in feols. My understanding is that these two approaches should yield the same results:

feols(Sales~Region + MktExpenses + Region:MktExpenses, data=data)

feols(Sales~Region+MktExpenses+Region[MktExpenses], data=data)

However, this is not the case: the first regression recovers the true coefficients, the second does not. What is Region[MktExpenses] doing, specifically? The fitted values do not coincide either, so it does not seem to be a matter of interpreting the coefficients.

Here is the dummy data I'm using:

library(fixest)
set.seed(123)

# Number of observations
n <- 10000
n_regions<-3
# Generate fake data
data <- data.frame(
  Sales = NA,
  MktExpenses = rnorm(n, mean = 50, sd = 10),
  Region = sample(LETTERS[1:n_regions], n, replace = TRUE)
)
fe<-setNames(rnorm(n_regions, mean = 50, sd = 10), LETTERS[1:n_regions])
# Calculate sales
data$Sales <- data$MktExpenses+fe[data$Region]+(data$Region=='A')*data$MktExpenses+rnorm(n, mean = 0, sd = 2)

So the slope for MktExpenses should be 2 in region A and 1 in the other regions.
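As a quick sanity check (a base-R sketch added here, not part of the original report, and not using fixest), one can regress Sales on MktExpenses separately within each region. Each per-region slope should be close to 2 for A and 1 for B and C. A model with region-specific intercepts and region-specific slopes gives the same slope point estimates as these separate fits.

```r
# Base-R sketch: regenerate the dummy data exactly as above
set.seed(123)
n <- 10000
n_regions <- 3
data <- data.frame(
  Sales = NA,
  MktExpenses = rnorm(n, mean = 50, sd = 10),
  Region = sample(LETTERS[1:n_regions], n, replace = TRUE)
)
fe <- setNames(rnorm(n_regions, mean = 50, sd = 10), LETTERS[1:n_regions])
# True slope: 1 everywhere, plus an extra 1 in region A
data$Sales <- data$MktExpenses + fe[data$Region] +
  (data$Region == "A") * data$MktExpenses + rnorm(n, mean = 0, sd = 2)

# Fit Sales ~ MktExpenses separately within each region and keep the slope
slopes <- sapply(split(data, data$Region), function(d) {
  coef(lm(Sales ~ MktExpenses, data = d))[["MktExpenses"]]
})
print(slopes)
```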

grantmcdermott commented 2 weeks ago

Several things:

1. Your model is misspecified. The varying-slopes syntax should only appear in the fixed-effects slot, i.e. after the |.
2. You are effectively saying that the varying-slopes syntax (| Region[MktExpenses]) is the same as a full interaction with both parent terms (Region * MktExpenses <=> Region + MktExpenses + Region:MktExpenses). This is not correct. Varying slopes is equivalent to a nested interaction with only one parent term (Region / MktExpenses <=> Region + Region:MktExpenses): Region[MktExpenses] adds a Region fixed effect plus a Region-specific slope on MktExpenses, with no common MktExpenses slope.
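The two formula expansions can be checked with base R alone (a sketch using `terms()` and `lm()` rather than fixest). The full and nested interactions span the same column space, so they should give identical fitted values; they differ only in how the slopes are parameterized. In the full interaction, `MktExpenses` is the slope of the baseline region A and the `RegionB:MktExpenses` / `RegionC:MktExpenses` terms are differences from it, whereas in the nested form each `RegionX:MktExpenses` coefficient is that region's own slope, which is the quantity `fixef()` reports for `| Region[MktExpenses]`.

```r
# Base-R sketch (no fixest): compare formula expansions and fits
set.seed(123)
n <- 10000
n_regions <- 3
data <- data.frame(
  Sales = NA,
  MktExpenses = rnorm(n, mean = 50, sd = 10),
  Region = sample(LETTERS[1:n_regions], n, replace = TRUE)
)
fe <- setNames(rnorm(n_regions, mean = 50, sd = 10), LETTERS[1:n_regions])
data$Sales <- data$MktExpenses + fe[data$Region] +
  (data$Region == "A") * data$MktExpenses + rnorm(n, mean = 0, sd = 2)

# How R expands each formula into model terms
full_terms <- attr(terms(Sales ~ Region * MktExpenses), "term.labels")
nested_terms <- attr(terms(Sales ~ Region / MktExpenses), "term.labels")
print(full_terms)    # "Region" "MktExpenses" "Region:MktExpenses"
print(nested_terms)  # "Region" "Region:MktExpenses"

# Fit both parameterizations
m_full <- lm(Sales ~ Region * MktExpenses, data = data)
m_nested <- lm(Sales ~ Region / MktExpenses, data = data)

# Same fitted values: only the parameterization of the slopes differs
print(all.equal(fitted(m_full), fitted(m_nested)))

# Full interaction: baseline slope (region A) plus differences for B and C
b <- coef(m_full)
slopes_from_full <- c(
  A = b[["MktExpenses"]],
  B = b[["MktExpenses"]] + b[["RegionB:MktExpenses"]],
  C = b[["MktExpenses"]] + b[["RegionC:MktExpenses"]]
)

# Nested interaction: each coefficient is already a region's own slope
slopes_nested <- coef(m_nested)[c("RegionA:MktExpenses",
                                  "RegionB:MktExpenses",
                                  "RegionC:MktExpenses")]
print(slopes_from_full)
print(slopes_nested)
```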

Here's proof using your demo dataset:

library(fixest)
set.seed(123)

# Number of observations
n <- 10000
n_regions<-3
# Generate fake data
data <- data.frame(
  Sales = NA,
  MktExpenses = rnorm(n, mean = 50, sd = 10),
  Region = sample(LETTERS[1:n_regions], n, replace = TRUE)
)
fe<-setNames(rnorm(n_regions, mean = 50, sd = 10), LETTERS[1:n_regions])
# Calculate sales
data$Sales <- data$MktExpenses+fe[data$Region]+(data$Region=='A')*data$MktExpenses+rnorm(n, mean = 0, sd = 2)

## these three models are all equivalent
m1 = feols(Sales ~ Region + Region:MktExpenses, data = data)
m2 = feols(Sales ~ Region/MktExpenses, data = data)
m3 = feols(Sales ~ 1 | Region[MktExpenses], data = data)

coef(m2)[4:6]
#> RegionA:MktExpenses RegionB:MktExpenses RegionC:MktExpenses 
#>           2.0005741           0.9978691           1.0029734
fixef(m3)[[2]]
#>         A         B         C 
#> 2.0005741 0.9978691 1.0029734

Created on 2024-06-14 with reprex v2.1.0

The "interaction terms" section of the introductory vignette explains this all in some detail, so I would recommend taking a look at it next: https://lrberge.github.io/fixest/articles/fixest_walkthrough.html#interaction-terms

lrberge commented 1 week ago

Thanks a lot Grant for your very detailed answer!

lrberge / fixest

Clarifying varying slopes #506