vincentarelbundock / pymarginaleffects

GNU General Public License v3.0
47 stars 8 forks source link

Extracting vcov and jacobian from predictions #80

Closed RoelVerbelen closed 5 months ago

RoelVerbelen commented 5 months ago

Thanks for all the amazing work you've been doing with the marginaleffects package and the more recent conversion to python!

I'm interested in converting an application of the marginaleffects package from R to python and am running into a hurdle. The functionality is not (yet) one-to-one I believe and I am not sure there's currently any way to achieve what I'm doing in R in python.

I'm hoping I can ask you for some guidance. Here's a reproducible example in R of what I'm after:

model <- lm(mpg ~ hp * wt * am, data = mtcars)

# Select some specific values of choice
variables <- list(
  hp = c(90, 110),
  am = c(0, 1)
)

# Specific data subset of choice
newdata <- mtcars[1:10, ]

avg_preds <- marginaleffects::avg_predictions(model, 
                                              newdata = newdata,
                                              variables = variables, 
                                              type = "response")

# Extract vcov from the predictions (R version >= 0.11.1.9005)
vcov.predictions <- function(x) {
  attr(x, "jacobian") %*% attr(x, "vcov") %*% t(attr(x, "jacobian"))
}

avg_preds_vcov <- stats::vcov(avg_preds)

Here's the same model in python:

import pandas as pd
import statsmodels.formula.api as smf
from marginaleffects import *

mtcars = pd.read_csv("https://vincentarelbundock.github.io/Rdatasets/csv/datasets/mtcars.csv")

model = smf.ols("mpg ~ hp * wt * am", data = mtcars).fit()

Using the python version of avg_predictions, however, I wasn't able to achieve the same flexibility in terms of setting the specific values variables at which to get the predictions and the new data newdata to average over. Also, I don't see how I can extract the vcov and jacobian.

Is this achievable? Many thanks.

vincentarelbundock commented 5 months ago

Hi @RoelVerbelen, thanks for checking out the package!

I agree that this is a useful feature and would be happy to add it. Unfortunately, my "real" day job is crazy these days so I don't have much time, but if you can manage a pull request, I'd be happy to merge. This would involve two main changes:

  1. Changing the MarginalEffectsDataFrame class to accept a "jacobian" argument which assigns to self: https://github.com/vincentarelbundock/pymarginaleffects/blob/main/marginaleffects/classes.py#L5
  2. Make the changes in each function predictions, slopes, comparisons to pass the jacobian to the class builder. For example here: https://github.com/vincentarelbundock/pymarginaleffects/blob/main/marginaleffects/predictions.py#L191
RoelVerbelen commented 5 months ago

Hi @vincentarelbundock, thanks a lot for offering guidance on how to implement that!

I gave the PR a go, see #81. With that additional jacobian argument I was able to set up the code I need, along the lines:

import pandas as pd
import statsmodels.formula.api as smf
from marginaleffects import predictions
from marginaleffects.sanitize_model import sanitize_model

mtcars = pd.read_csv("https://vincentarelbundock.github.io/Rdatasets/csv/datasets/mtcars.csv")

model = smf.ols("mpg ~ hp * wt * am", data = mtcars).fit()

partial_data = predictions(model, by = ['hp', 'am'])

# Calculate vcov of the predictions
jacobian = partial_data.jacobian
model_sanitized = sanitize_model(model)
vcov = model_sanitized.get_vcov()
pd_vcov = jacobian @ vcov @ jacobian.T

This reconciled with what I did previously in R.

Thanks again for all your work on marginaleffects (both in R and python)!

vincentarelbundock commented 5 months ago

Awesome! thanks!