lrberge / fixest

Fixed-effects estimations
https://lrberge.github.io/fixest/

Add errors/CI to predict.fixest? #22

Open pbaylis opened 4 years ago

pbaylis commented 4 years ago

Hi, this package is great. I'm wondering if it would be possible to add standard errors/confidence intervals to the output of predict.fixest? I realize that might be quite a bit of work, though.

Related issue in the lfe repo. This would also help with fixest's incorporation into the margins package, per this issue.

lrberge commented 4 years ago

Hi thanks for the kind words :-) and the suggestion!

The problem with adding confidence intervals for each observation is that they depend on the structure of the variance-covariance matrix. In feols the fixed-effects are projected out, so the actual estimation happens only on the reduced model. Valid confidence intervals would require the complete variance-covariance matrix, therefore including the fixed-effects, and that may prove just too costly to obtain.
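
To make it concrete: for a single observation i, the prediction variance is x_i' V x_i, where x_i stacks all the regressors including every fixed-effect dummy and V is the full variance-covariance matrix of the coefficients. With many fixed-effects, V quickly becomes too large to compute and store.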

But in the next release (0.5.0, forthcoming) I have improved the speed of the algorithm substantially (I mean by orders of magnitude) when the fixed-effects are included in the model as regular variables. So confidence intervals in that case are valid.

What I will do is include se.fit (and also confidence intervals) in predict when models don't contain explicit fixed-effects, and return NA for these measures when FEs are present.

Below is the code to obtain the SEs, with an illustration of the problem.

base = iris
names(base) = c("y", "x1", "x2", "x3", "species")
se.fit.fixest = function(x){
    # SE of each fitted value: sqrt(diag(X (X'X)^-1 X') * sigma^2),
    # computed via the Cholesky factor of X'X
    mat_est = model.matrix(x)
    XRinv = mat_est %*% solve(chol(crossprod(mat_est)))
    sigma2 = sum(x$residuals^2) / (x$nobs - x$nparams)
    ip = drop(XRinv^2 %*% rep(sigma2, ncol(mat_est)))
    sqrt(ip)
}

a = lm(y ~ x1 + species, base)
head(predict(a, se.fit = TRUE)$se.fit)
# => 0.06240784 0.07686065 0.06651280 0.07108218 0.06458060 0.07972048

# Without fixed-effects: Fine
b = feols(y ~ x1 + species, base)
head(se.fit.fixest(b))
# => 0.06240784 0.07686065 0.06651280 0.07108218 0.06458060 0.07972048

# With fixed-effects: Not fine
b = feols(y ~ x1 | species, base)
head(se.fit.fixest(b))
# => 0.04052941 0.03473950 0.03705546 0.03589748 0.04168740 0.04516134
pbaylis commented 4 years ago

Thanks, that (the order of magnitude improvement in models w/ FE explicitly included) is great news! You are right of course regarding needing the full VCOV to perform a proper prediction.

The reason I asked is that work in my subfield (climate impacts) often estimates a "response function," which is basically a plot of the difference in predicted outcomes for a given value of the covariates of interest (e.g., temperature and squared temperature) versus some baseline (e.g., 20 C), holding other covariates constant. In that case, one wouldn't need the full VCOV, just the block for the covariates of interest. Here's an example of a plot (with made-up data) that shows multiple response functions with different functional forms.

[Image: multiple response functions with different functional forms]

Having written all that out, that's probably fairly niche relative to most user requirements and fairly easy to accomplish using survey::svycontrast anyway (which is how I made the plot above). So I guess this is all a long way of saying "nevermind" :).

Thanks again for the great package.

lrberge commented 4 years ago

C'mon, I'm sure it's not so niche!

I'm not really familiar with response functions and I don't directly see how they're constructed. Could you send a link to a reference so I can see more clearly? Maybe I'll implement something, but it'll be in the mid/long run (I'm trying to take a coding break at the moment).

Btw for a "nevermind" message, at least it was one with a nice graph! ;-)

pbaylis commented 4 years ago

I'm probably over-complicating them. Suppose you're estimating a quadratic model with unit fixed effects, like this:

y_it = b1 * x_it + b2 * x_it^2 + a_i + e_it

A response function f(X) is just the predicted value of y across a range of Xs (say, 0 to 40) relative to the predicted value of y at a chosen baseline x (say, x = 20). That is, for each X we have:

f(X) = (b1 * X + b2 * X^2 + a_i) - (b1 * 20 + b2 * 20^2 + a_i) = b1 * (X - 20) + b2 * (X^2 - 400)

As you can see, the fixed effects drop out, so for this linear-in-parameters model at least we don't need the entire VCOV to generate standard errors.
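
In code, a minimal sketch of that computation might look like this (the coefficient names "temp" and "I(temp^2)", the 20 C baseline, and the normal critical value are all just for illustration):

resp_fun <- function(fit, x_seq, baseline = 20) {
    nm <- c("temp", "I(temp^2)")  # assumed coefficient names
    b <- coef(fit)[nm]
    V <- vcov(fit)[nm, nm]  # VCOV of the slope coefficients only
    R <- cbind(x_seq - baseline, x_seq^2 - baseline^2)  # contrast rows: x minus baseline
    est <- drop(R %*% b)
    se <- sqrt(rowSums((R %*% V) * R))  # sqrt of diag(R V R')
    data.frame(x = x_seq, fit = est, lwr = est - 1.96 * se, upr = est + 1.96 * se)
}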

This is super common in the climate-impacts literature. For many examples, see Figure 3 in Carleton and Hsiang (2016).

[Image: Carleton and Hsiang (2016), Figure 3]

Hope this was a little useful at least. I'm very happy to pitch in if you do end up implementing this at some point - just let me know.

lrberge commented 4 years ago

Thanks for your very clear explanation.

Looks interesting, and it might indeed end up in the package, but, to give you a timeline, not before 4-6 months. I'll likely come back to you at that time.

In any case, I'm always glad to learn something new: thanks!

pbaylis commented 3 years ago

I took a shot at cooking this up today, actually inspired by some code @kendonB put together a couple years ago. Fun fact: that link goes to an SO post I made almost six years ago on this exact topic, so it's high time I nail this down.

(Edit:) Also, I am not convinced that this actually belongs in the fixest package, but since this issue is open, I'd like to contribute to closing the loop one way or another.

The following code is pretty close to what I would like. The basic idea is to estimate E[Xb] and Var(Xb) for any provided X (i.e., newdata), where X is a subset of the variables included in the main part of the formula (not the fixed effects). At the risk of revealing exactly how rusty my metrics/coding knowledge is, there are two remaining issues that I haven't figured out how to resolve and would appreciate input on:

  1. How to construct X from a fitted object when only some of the variables in the object are in newdata? model.matrix(object, newdata) errors out in this case. I think I should be able to do this with some formula parsing magic but haven't been able to work that out.
  2. What's the proper way to obtain the residual degrees of freedom for a fitted feols model? We need this to compute the confidence interval on Xb; for lm and felm we can get it from object$df.residual, but not for feols. Right now I'm using fitstat(object, "f.df2"), but I don't think that is correct for every situation. I read over the standard error treatise but couldn't work it out from that.
library(fixest)

predict_partial <- function(object, newdata, level = 0.95) {
    # Extract terms object, removing response variable, factor variables, and intercept 
    tt <- terms(object)
    Terms <- delete.response(tt)
    attr(Terms, "intercept") <- 0

    # Construct X using remaining formula
    # TODO: How to remove variables in formula (+ functions of, interactions) that are not in newdata?
    X <- model.matrix(Terms, data = newdata)
    B <- coef(object)
    sig <- vcov(object)

    if (inherits(object, "fixest")) {
        # TODO: Calculate degrees of freedom correctly for fixest. I don't think this is quite right...
        df <- as.numeric(fitstat(object, "f.df2"))
    } else if (inherits(object, c("lm", "felm"))) {
        # lm and felm store the residual degrees of freedom directly
        df <- object$df.residual
    } else {
        stop("class(object) should be lm, fixest, or felm.")
    }

    # Keep only variables in both newdata (X) and object (B/sigma)
    X <- X[, intersect(colnames(X), names(B))]
    i <- which(names(B) %in% colnames(X))
    B <- B[i]
    sig <- sig[i, i]
    t_val <- qt((1 - level) / 2 + level, df = df)

    # Compute coefficient, standard errors, and CI
    fit <- data.frame(fit = as.vector(X %*% B))
    fit$se <- apply(X, MARGIN = 1, FUN = get_se, sig = sig)
    fit$lwr <- fit$fit - t_val * fit$se
    fit$upr <- fit$fit + t_val * fit$se

    fit
}

get_se <- function(r, sig) {
    # Compute linear combination, helper function for predict_partial
    # Given numeric vector r (constants) and vcov (sig), compute SE 
    r <- matrix(r, nrow = 1)
    sqrt(r %*% sig %*% t(r))
}

N <- 100
data <- data.frame(x = rnorm(N), z = rnorm(N), group = rep(c("A", "B"), times = N/2))
data$y <- 1 + 3 * data$x - 2 * data$x^2 + rnorm(N, 0, 0.5) + as.numeric(data$group == "A") * 3

newdata <- data.frame(x = seq(-3, 3, 0.1))

# Works, but standard errors are wrong because of degrees of freedom correction
preds <- predict_partial(feols(y ~ x + I(x^2) | group, data = data), newdata)

# Plot
plot_df <- cbind(newdata, preds)
plot(plot_df$x, plot_df$fit, type = "l")
lines(plot_df$x, plot_df$lwr, lty = 2)
lines(plot_df$x, plot_df$upr, lty = 2)

# Also works with lm and felm
preds_lm <- predict_partial(lm(y ~ x + I(x^2), data = data), newdata)
library(lfe)
preds_felm <- predict_partial(felm(y ~ x + I(x^2) | group, data = data), newdata)

# Does not work (yet) if there are other variables in formula.
preds_feols2 <- predict_partial(feols(y ~ x + I(x^2) + factor(group), data = data), newdata)
preds_feols3 <- predict_partial(feols(y ~ x + I(x^2) + z, data = data), newdata)
lrberge commented 3 years ago

Gosh, thanks to your very clear code now I see the light! :-D

Can you imagine I never understood what it really was before your code???? It seems that my mind is so messed up that I can only understand code: I'll suggest that to my wife!

So, thanks again! Actually there were many small things that were obvious to you but not to me. It all seems clear now. Below I've tried to clarify in my own words; my main point of confusion was the baseline, which was sometimes implicitly missing.

I hope what I say isn't wrong!

[Image: derivation of the prediction relative to a baseline]


It seems to me that the code you provide refers to the special case with a missing baseline.

Now that it's spelled out, I think it can fit the package very easily. Just adding an extra argument partial to the predict function could very well do the job. I was also thinking of adding an extra argument baseline: do you think it would be useful?


To answer your questions:

  1. No, there's currently no way to do what you want with model.matrix. There's a non-exported fixest_model_matrix though. I think I'll add a partial argument to model.matrix to perform the requested behavior (not sure it should be the default behavior).
  2. That's tricky. In general, if the SEs are not clustered, f.df2 should work. Otherwise, it should be the weird stat called g. I've just changed the stat g so that it explicitly refers to the DoF in the t-test.
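
For illustration, retrieving that DoF would then look something like (assuming a fitted model fit):

df_t = as.numeric(fitstat(fit, "g"))  # DoF used in the t-test, per the change above
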
pbaylis commented 3 years ago

Yes, you've got it exactly! Brilliant. I think I don't understand what I'm talking about myself until I put it into code.

Your insight about the baseline is clarifying for me, and I do think allowing for different baselines is helpful (in temperature impact regressions, the comparison is usually to 20 C). As you point out, really what this is doing is predicting the difference in Xb between a range of values for x and some baseline z. It makes me wonder if this couldn't all be done with a single argument named baseline. baseline could determine behavior as follows:

Case 1 (default behavior): baseline = NULL. predict proceeds as usual.

Case 2: baseline = 0. predict follows the special case you describe above, reporting hat(x), se(hat(x)), and ci(hat(x)), where x is given by newdata (with x = z assumed for all variables on the RHS but not in newdata) and z = 0 for all variables in newdata.

Case 3: baseline is a numeric vector with (named?) nonzero entries. predict reports hat(x), se(hat(x)), and ci(hat(x)), where the values of x are given by newdata (with x = z assumed for all variables in the model but not in newdata) and z = baseline for all variables in newdata.

One user-friendly addition to Case 3 would be allowing the baseline to be transformed following the formula. So if one wanted to set the baseline to z = 20 and the model is y = x + x^2, the function would interpret the baseline as z_1 = 20 and z_2 = 400. I don't know how much of a headache that would be, though.
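
For what it's worth, base R's model.matrix already applies the formula's transformations, which might be one way to implement that. For instance (intercept dropped for clarity):

model.matrix(~ x + I(x^2) - 1, data.frame(x = 20))
# => one row with columns x = 20 and I(x^2) = 400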

But, the risk of collapsing this to one argument is that some users might not realize that this type of "prediction" doesn't include the variance from the fixed effects, which would be more evident with the two arguments you propose. I defer to what you think is best here, obviously.

lrberge commented 3 years ago

Sorry, I don't have the time to dive into this, but your suggestions are great!

A few thoughts:

pbaylis commented 3 years ago

Not at all! This is super helpful feedback for me to put together a PR at some point. But it's not pressing and you have a lot going on right now, so please don't feel rushed to respond.

For the last case, I'm afraid I don't completely follow. Since we communicate best in code, here's what I think you're describing:

N <- 100
df <- data.frame(
  y = rnorm(N),
  x1 = rnorm(N),
  x2 = rnorm(N)
)

fit <- feols(y ~ x1 + x2, data = df)

newdata <- data.frame(
  x1 = seq(-2, 2, 0.1)
)

predict(fit, newdata = newdata, partial = c(x1 = 1, x2 = 2)) 
predict(fit, newdata = newdata, partial = c(x1 = 1)) 

I would expect these last two commands to return the same answer, since the user is implying that they want x2 to remain fixed for this prediction and therefore its baseline doesn't matter. But I expect I'm misinterpreting what you're saying.

lrberge commented 3 years ago

The thing I'm referring to looks very niche in retrospect, but I still think it's valid.

Let's take your example (an estimation with two variables x1 and x2). Assume a predict function with the partial argument taking the variable for which to compute the new prediction and a baseline argument setting the baseline. The function calls would be:

p1 = predict(fit, newdata = newdata, partial = "x1", baseline = c(x1 = 1))
p2 = predict(fit, newdata = newdata, partial = "x1", baseline = c(x1 = 1, x2 = 2))

For sure, since only x1 varies, the curvature and CI in p1 will be identical to those in p2. The levels on the y-axis will be different, however. I think it can be interesting when we want to see directly the effect of the variation of x1 on the dep var, taking into account the fixed influence of the other variables. It avoids having to mentally add the effect of the other variables and gives a proper scale (e.g. if the effect of x1 ranges between -1 and 1, and the dep variable at the mean is 1000, we can directly see that x1 has no impact, while without that we could imagine that a [-1, 1] range is big).

Here's an example of the difference:

[Image: the same relative prediction plotted with and without the baseline shift]

Agreed, it looks niche and may go too far, especially since I've never used that kind of tool, so I don't know the practical interest of rescaling. But I think it's worth investigating all possible usages early, before deciding on the arguments.

pbaylis commented 3 years ago

Ohh, yes - that does make sense. I've seen it done before, although I'm actually struggling to find an example right now. I suppose the question is whether it's worth the cost of adding another argument, and the potential confusion regarding what that argument does, when it comes down to applying a shifter to the y variable. My inclination is no, especially since it could require some additional pre-processing to work out what the user wants, but I could be convinced otherwise.

This raises the question (for me) of what people will expect from a "partial" prediction: will they understand that it represents the difference between some set of values and a baseline? I guess the most precise thing to call it is the "prediction relative to a baseline". A "relative prediction"? A "differential prediction"?

lrberge commented 3 years ago

On the extra baseline stuff: if you, a user of such statistics, can't remember the last time you saw it employed, I suggest we just forget about it!!! :-D

On the name. Hmmm, I agree, partial would be misleading since it is used elsewhere in stats with a different meaning. One suggestion: basepred (short for baseline prediction)? In the same vein, following your suggestions: relpred, diffpred? I tend to (weakly) prefer relpred. Naming is such a pain! :-)

pbaylis commented 3 years ago

I like relpred too. Sold!


judgelord commented 3 years ago

Thank you for the great package! I am just trying to get a sense of the status of this (really useful) feature.

What I will do is to include se.fit (and also confidence intervals) to predict when models don't contain explicit fixed-effects, and return NA for these measures when FEs are present.

se.fit is not yet in predict.fixest, correct?

To the above reprex:

head(se.fit.fixest(b))
# => 0.04052941 0.03473950 0.03705546 0.03589748 0.04168740 0.04516134

head(predict(b, newdata = base, type = "response", se.fit =TRUE))
# => Warning: In predict.fixest(b, newdata = base, type = "respons...:
# =>  'se.fit' is not a valid argument of function predict.fixest.
# => 5.063856 4.662076 4.822788 4.742432 5.144212 5.385281

As noted, this has major uses in margins (per this issue) and broom.

Indeed, the broom docs for using augment with fixest objects lead with a general description of augment highlighting the .se.fit column:

Augment data with information from a(n) fixest object. Source: R/fixest-tidiers.R. Augment accepts a model object and a dataset and adds information about each observation in the dataset. Most commonly, this includes predicted values in the .fitted column, residuals in the .resid column, and standard errors for the fitted values in a .se.fit column.

However, it seems that because augment.fixest depends on predict.fixest, the se_fit argument is silently ignored.

broom::augment(a,   type.predict = "response", newdata = base, se_fit = TRUE)
# A tibble: 150 x 8
       y    x1    x2    x3 species .fitted .se.fit  .resid
   <dbl> <dbl> <dbl> <dbl> <fct>     <dbl>   <dbl>   <dbl>
 1   5.1   3.5   1.4   0.2 setosa     5.06  0.0624  0.0361

broom::augment(b,   type.predict = "response", newdata = base, se_fit = TRUE)
# A tibble: 150 x 6
       y    x1    x2    x3 species .fitted
   <dbl> <dbl> <dbl> <dbl> <fct>     <dbl>
 1   5.1   3.5   1.4   0.2 setosa   0.0361

Obviously, this is not an issue with fixest, just illustrating a use case!

lrberge commented 3 years ago

@judgelord: In the end I dropped the implementation. But why not do it indeed! Maybe it'll be there in the next version.

@pbaylis: I was thinking about the implementation again, and maybe it's worth creating a standalone function rather than a predict functionality. That's because (I think) it can be extended to non-linear models, meaning that we can apply a function to the prediction and get the SEs via the delta method. But for these models things are not separable as they are with OLS, so one would need extra arguments, like at_means, which make sense there. Then it starts getting heavy in arguments and it may be worth separating it from predict to clarify things. Just a thought! :-)
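
To sketch the delta method part for, say, an exponential mean model (the objects X and fit, and the whole setup, are purely illustrative, not fixest internals):

eta = drop(X %*% coef(fit))                    # linear predictor X b
se_eta = sqrt(rowSums((X %*% vcov(fit)) * X))  # sqrt of diag(X V X')
mu = exp(eta)                                  # prediction on the response scale
se_mu = mu * se_eta                            # delta method: d exp(eta)/d eta = exp(eta)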

pbaylis commented 3 years ago

Makes sense to me, we can call it relpred(). I worked on this for a little bit over the last couple days. It's getting there, but I've run into some housekeeping problems related to factor variables that I'm not sure how to resolve.

Problem 1: While lm returns an object with object$xlevels that lets us ensure we have the same factor levels for factor variables included on the RHS of the formula, there's no equivalent that I can find in the fixest output. Any ideas @lrberge? (One possible workaround is sketched after Problem 2 below.)

Problem 2: Related, I haven't figured out how to properly account for the possibility that users might pass factor variables in newdata or baseline. The basic issue comes down to ensuring that the function can reproduce a design matrix identical to the one estimated in the fitted model. I think this is doable, but there are a lot of little housekeeping steps and edge cases that require some careful thinking and R knowledge. And, to be honest, the current code suits my use cases, so I'm not super motivated to resolve them! But I'm sure other users would find this limitation annoying, and it wouldn't really be a proper addition to fixest without this resolved.
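
One possible workaround for Problem 1, assuming the estimation data is still reachable through the call stored in the fitted object (an untested sketch; xlevels_fixest is a hypothetical helper, not part of fixest):

xlevels_fixest <- function(object) {
  # Re-evaluate the data stored in the call -- fragile if the data has
  # changed or is no longer in scope
  dat <- eval(object$call$data, parent.frame())
  vars <- intersect(all.vars(formula(object))[-1], names(dat))
  levs <- lapply(dat[vars], levels)
  levs[!vapply(levs, is.null, logical(1))]  # keep only the factor variables
}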

At any rate, I wish this were more complete but here's what I have right now, warts and all. Crossing my fingers that someone smarter / more enterprising wants to figure out how to resolve the above.

library(fixest)

relpred <- function(object, newdata, baseline = NULL, level = 0.95) {

  # object: lm or feols fitted model
  # newdata: data.frame (or matrix) of X variables with names that correspond to the fitted model (names that don't match are ignored)
  # baseline: list with the same length as the number of columns in newdata (or, eventually, matching names); without this, the baseline is assumed to be zero
  # DEBUG: example inputs for interactive testing
  # object <- lm(y ~ x + I(x^2) + group, data = data)
  # object <- feols(y ~ x + I(x^2) | group, data = data)
  # newdata <- newdata; level = 0.95
  # baseline <- NULL
  # baseline <- list(x = 1)
  # END DEBUG

  # TODO: Add support for factor variables in newdata later. Requirements
  # - Must either check or convert variables to match levels in object.
  # - Decide how baseline should be chosen for factor variables if baseline is explicitly supplied. 
  if (any(sapply(newdata, function(x) !is.numeric(x)))) {
    stop("For now, only numeric variables are supported in newdata.")
  }

  if (!is.null(baseline) && ncol(newdata) != length(baseline)) {
    stop("baseline vector length must be equal to number of columns in newdata.")
  }

  # Populate a data.frame with all variables used in object estimation and return a design matrix that matches that object.
  newdata_filled <- fill_missing_vars(object, newdata)

  # If baseline isn't passed, assume zero
  if (is.null(baseline)) {
    baseline_df <- newdata
    baseline_df[] <- 0
  } else {
    baseline_df <- as.data.frame(baseline)
  }

  baseline_filled <- fill_missing_vars(object, baseline_df)

  # Subtract baseline from newdata to get X
  X <- as.matrix(newdata_filled - baseline_filled)

  # Compute coefficients and standard errors ----
  B <- as.numeric(coef(object))

  # Compute degrees of freedom
  if (inherits(object, "fixest")) {
    df <- attributes(vcov(object, attr = T))$G
  } else {
    df <- object$df.residual
  }

  # Drop any variables in X that are not in the fitted model object
  # This will handle a variety of issues, including a missing intercept term
  X <- X[, names(coef(object))]

  fit <- data.frame(fit = X %*% B)
  sig <- vcov(object)
  se <- apply(X, MARGIN = 1, FUN = get_se, sig = sig)

  t_val <- qt((1 - level) / 2 + level, df = df)
  fit$lwr <- fit$fit - t_val * se
  fit$upr <- fit$fit + t_val * se

  fit
}

fill_missing_vars <- function(object, X) {
  # Populate a data.frame with all variables used in object estimation and return a design matrix with the same coefficients.
  # Factor variables filled in using the first factor level, numeric filled with zeroes

  orig_vars <- all.vars(formula(object))[-1]

  # Add any factor variables that are in the RHS but not newdata with the baseline value
  # TODO: How to add back a missing factor variable for feols? We can do it with lm using object$xlevels
  # TODO: If the user passes, e.g., factor(group) this will not work. Not sure how to resolve.
  fact_vars <- names(object$xlevels)
  for (f in fact_vars) {
    if (!(f %in% names(X))) {
      X[[f]] <- factor(object$xlevels[[f]][1],
                       levels = object$xlevels[[f]])
    } else { # This is risky, but will ensure levels in object match those in X...
      X[[f]] <- factor(X[[f]],
                       levels = object$xlevels[[f]])
    }
  }

  # If there are any other variables in the formula but not in X, add them with value = 0
  for (v in orig_vars) {
    if (!(v %in% names(X))) {
      X[[v]] <- 0
    }
  }

  # Extract terms object, removing response variable 
  tt <- terms(object)
  Terms <- delete.response(tt)

  as.data.frame(model.matrix(Terms, data = X))
}

get_se <- function(r, sig) {
  # Compute linear combination, helper function for relpred
  # Given numeric vector r (the constants) and vcov sig, compute SE 
  r <- matrix(r, nrow = 1)
  sqrt(r %*% sig %*% t(r))
}

N <- 100
data <- data.frame(x = rnorm(N), z = rnorm(N), group = rep(c("A", "B"), times = N/2))
data$y <- 1 + 3 * data$x - 2 * data$x^2 + rnorm(N, 0, 0.5) + as.numeric(data$group == "A") * 3

newdata <- data.frame(x = seq(-2, 2, 0.1))

preds <- relpred(feols(y ~ x + I(x^2) | group, data = data), newdata)

# Plot
plot_df <- cbind(newdata, preds)
plot(plot_df$x, plot_df$fit, type = "l")
lines(plot_df$x, plot_df$lwr, lty = 2)
lines(plot_df$x, plot_df$upr, lty = 2)

# Things that do not work (yet) ----

# Including a factor variable in newdata. 
newdata2 <- newdata
newdata2$group <- sample(data$group, nrow(newdata), replace = T)
preds <- relpred(lm(y ~ x + I(x^2) + group, data = data), newdata2)

# Including a factor variable on the RHS of formula (not in FE) when using feols
preds <- relpred(feols(y ~ x + I(x^2) + group, data = data), newdata)
lrberge commented 3 years ago

Great! Thanks. I'll try to implement it in version 0.9.0.

lrberge commented 3 years ago

Sorry for the delay. I'm releasing v0.10.0 and it's not there. But I've started to sketch the implementation and I promise that it will be in the next version!

vincentarelbundock commented 3 years ago

Hi all,

Sorry to hijack this thread, but I came here looking for standard errors around my predictions, and stumbled upon this discussion of margins. I just want to plug the fact that the marginaleffects package has early support for fixest. Frankly, I haven't really tested the limits of this support or tried complicated models, but it seems to work in simple cases with both continuous and categorical variables:

library(fixest)
library(marginaleffects)

tmp = mtcars
tmp$gear = as.factor(tmp$gear)
mod <- feols(mpg ~ hp * wt + gear | cyl, data = tmp)

marginaleffects(mod) |> summary()
# Average marginal effects 
#       type  Term  Effect Std. Error z value   Pr(>|z|)    2.5 %   97.5 %
# 1 response gear4  1.1933    1.77506  0.6723  0.5014067 -2.28572  4.67240
# 2 response gear5  2.2783    1.71694  1.3269  0.1845297 -1.08687  5.64340
# 3 response    hp -0.0391    0.01476 -2.6497  0.0080567 -0.06803 -0.01018
# 4 response    wt -3.3501    0.35098 -9.5451 < 2.22e-16 -4.03800 -2.66220
# 
# Model type:  fixest 
# Prediction type:  response

Let me know if you try it and run into issues.

lrberge commented 2 years ago

Hi Vincent! You ain't hijacking anything since this seems to be the most appropriate thread to introduce the marginaleffects package!

Your package is in fact a feature that has been missing from R for too many years now (e.g. I used to code it up myself when I needed fast partial effects!). It will be useful to so many people, I've no doubt it's gonna be an instant hit! Well done!

That said, I'm not sure it fits exactly @pbaylis's request, since it focuses only on the variation of a specific variable, but I may be wrong. Great job anyway!

vincentarelbundock commented 2 years ago

Cool cool.

That said, I'm not sure it fits exactly @pbaylis's request, since it focuses only on the variation of a specific variable

Ooops, you're right, I misread. My package cannot compute Var(Xb).

judgelord commented 2 years ago

marginaleffects looks really useful, but for anyone who found this thread because they're trying to get errors/CIs for fitted values from a fixest object, I just want to clarify the difference between what predict(), augment(), and marginaleffects() return when each calls predict.fixest. For the example above, b = feols(y ~ x1 | species, data = base):

predict() gives a helpful error:

b %>% predict(newdata = base, type = "response", se.fit =T)
Error in predict.fixest(., newdata = base, type = "response", se.fit = T) : 
  The standard-errors (SEs) of the prediction cannot be computed in the presence of fixed-effects. To obtain the SEs, you would need to include the FEs as standard factors in the model.

augment() fails silently:

> broom::augment(b,   type.predict = "response", newdata = base, se_fit = TRUE) 
# A tibble: 150 × 6
       y    x1    x2    x3 species .fitted
   <dbl> <dbl> <dbl> <dbl> <fct>     <dbl>
 1   5.1   3.5   1.4   0.2 setosa     5.06
 2   4.9   3     1.4   0.2 setosa     4.66
 3   4.7   3.2   1.3   0.2 setosa     4.82
 4   4.6   3.1   1.5   0.2 setosa     4.74
 5   5     3.6   1.4   0.2 setosa     5.14
 6   5.4   3.9   1.7   0.4 setosa     5.39
 7   4.6   3.4   1.4   0.3 setosa     4.98
 8   5     3.4   1.5   0.2 setosa     4.98
 9   4.4   2.9   1.4   0.2 setosa     4.58
10   4.9   3.1   1.5   0.1 setosa     4.74
# … with 140 more rows

marginaleffects() has a different aim, but it does produce an error when asked to pass se.fit = T into predict.fixest():

> marginaleffects::marginaleffects(b, newdata = base, se.fit = F) %>% tibble() 
# A tibble: 150 × 10
   rowid type     term   dydx std.error     y    x1    x2    x3 species
   <int> <chr>    <chr> <dbl>     <dbl> <dbl> <dbl> <dbl> <dbl> <fct>  
 1     1 response x1    0.804    0.0714   5.1   3.5   1.4   0.2 setosa 
 2     2 response x1    0.804    0.0714   4.9   3     1.4   0.2 setosa 
 3     3 response x1    0.804    0.0714   4.7   3.2   1.3   0.2 setosa 
 4     4 response x1    0.804    0.0714   4.6   3.1   1.5   0.2 setosa 
 5     5 response x1    0.804    0.0714   5     3.6   1.4   0.2 setosa 
 6     6 response x1    0.804    0.0714   5.4   3.9   1.7   0.4 setosa 
 7     7 response x1    0.804    0.0714   4.6   3.4   1.4   0.3 setosa 
 8     8 response x1    0.804    0.0714   5     3.4   1.5   0.2 setosa 
 9     9 response x1    0.804    0.0714   4.4   2.9   1.4   0.2 setosa 
10    10 response x1    0.804    0.0714   4.9   3.1   1.5   0.1 setosa 
# … with 140 more rows
> marginaleffects::marginaleffects(b, newdata = base, se.fit = T) %>% tibble() 
Error in get_predict.fixest(model = model, newdata = newdata_tmp, type = type,  : 
  Unable to extract predictions from a model of type `fixest`. Please report this problem, along with replicable code, on the `marginaleffects` issue tracker: https://github.com/vincentarelbundock/marginaleffects/issues

Also, if there are solutions forthcoming, I would love to know. (The broom docs still incorrectly imply that augment will return .se.fit.) Thanks for all your work and guidance on this!

vincentarelbundock commented 2 years ago

@judgelord if you install the development versions of insight and marginaleffects you should be able to obtain standard errors for your predictions with the marginaleffects::predictions function:

remotes::install_github("easystats/insight")
remotes::install_github("vincentarelbundock/marginaleffects")

Restart the R session entirely to make sure the new packages are loaded. Then:

library(fixest)
library(marginaleffects)

tmp = mtcars
tmp$gear = as.factor(tmp$gear)
mod <- feols(mpg ~ hp * wt + gear | cyl, data = tmp)

predictions(mod) |> head()
#>   rowid     type predicted std.error  mpg  hp    wt gear cyl
#> 1     1 response  22.59828 0.4862169 21.0 110 2.620    4   6
#> 2     2 response  21.51867 0.4541136 21.0 110 2.875    4   6
#> 3     3 response  25.73622 0.5860365 22.8  93 2.320    4   4
#> 4     4 response  18.88585 1.3735593 21.4 110 3.215    3   6
#> 5     5 response  16.94477 2.1462778 18.7 175 3.440    3   8
#> 6     6 response  18.01485 1.2923217 18.1 105 3.460    3   6
judgelord commented 2 years ago

^this works for most of my fixest models. Thank you, @vincentarelbundock!

I know here is not the place for this, but perhaps you can point me to the right issue or docs for more on this warning, where predictions failed to compute se for predictions at some levels: In sqrt(colSums(t(J %*% vcov) * t(J))) : NaNs produced

vincentarelbundock commented 2 years ago

I know here is not the place for this, but perhaps you can point me to the right issue or docs for more on this warning, where predictions failed to compute se for predictions at some levels: In sqrt(colSums(t(J %*% vcov) * t(J))) : NaNs produced

I might be able to help if you post a minimal replicable example to the marginaleffects repo on github