vincentarelbundock / pymarginaleffects

Potential bug with `slopes` and variable scaling #113

Open alexjonesphd opened 4 weeks ago

alexjonesphd commented 4 weeks ago

Hi all,

First, a huge thanks for this package. It's great and makes a massive difference to the Python ecosystem for doing statistics - thank you!

I am wondering if there is a bug with slopes and how it operates with formula-based transforms. I seem to get different slope values when a predictor is scaled compared to when it isn't. An example is below:

import pandas as pd
import statsmodels.formula.api as smf
from marginaleffects import *

# Read in affairs data
affairs = pd.read_csv('https://vincentarelbundock.github.io/Rdatasets/csv/AER/Affairs.csv')

# Fit a model with and without scaling
scale_n = smf.ols('rating ~ gender * yearsmarried', data=affairs).fit()
scale_y = smf.ols('rating ~ gender * scale(yearsmarried)', data=affairs).fit()

# Compute conditional slopes for yearsmarried, for each level of gender
scale_n_slope = slopes(scale_n, variables='yearsmarried', by='gender')
scale_y_slope = slopes(scale_y, variables='yearsmarried', by='gender')

In the former case, I obtain an estimate of -0.06 for females and -0.03 for males, but in the latter I get -0.0002 for females and -0.0001 for males. The equivalent R code yields identical estimates for both models, and using bambi's interpret module on a Bayesian version of the model also gives identical results. I am not well-versed in simple slopes analysis (mainly because it hasn't been widely available in Python until now!), so I am not sure whether this is correct, and I might be missing something fundamental.
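
To make the expectation concrete, here is a minimal sketch of the identity the two fits would be expected to satisfy, assuming `slopes` reports the derivative with respect to the raw `yearsmarried` values (as the R results suggest). It uses synthetic data and a hand-rolled one-predictor least-squares slope rather than the Affairs data, statsmodels, or marginaleffects, so `ols_slope` and every variable name are illustrative only. Standardising here uses the population SD, which I believe matches the default of patsy's `scale`, but the identity holds for any choice of divisor.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope of a one-predictor least-squares fit of y on x (hypothetical helper)."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Synthetic data (NOT the Affairs dataset): y is linear in x plus noise.
rng = random.Random(0)
x = [rng.uniform(0, 15) for _ in range(200)]
y = [4.0 - 0.05 * xi + rng.gauss(0, 1) for xi in x]

# Standardise x: centre, then divide by the population SD (assumed divisor).
mean_x = statistics.fmean(x)
sd_x = statistics.pstdev(x)
z = [(xi - mean_x) / sd_x for xi in x]

slope_raw = ols_slope(x, y)     # dy/dx from the unscaled fit
slope_scaled = ols_slope(z, y)  # dy/dz from the scaled fit

# Chain rule: dy/dx = dy/dz * dz/dx = slope_scaled / sd_x
slope_back = slope_scaled / sd_x

print(slope_raw, slope_back)
```

The two printed values agree up to floating-point error, which is why a slope on the original scale should not change when the predictor is standardised inside the formula; only the coefficient on the standardised scale differs, by a factor of the SD.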

Any advice or help is appreciated and thanks again!

vincentarelbundock commented 4 weeks ago

Thanks a lot for the report. I really appreciate that you took the time to craft a nice reproducible example.

This could very well be a bug. If so, I'm very interested in fixing it.

Unfortunately, life is crazy right now with conferences and the coming semester. I can't promise a super quick resolution, but I'll take a look as soon as possible and ping you when I know what the issue is.

alexjonesphd commented 4 weeks ago

No problem - I am actually working on a course for the upcoming semester myself, and this package plays a huge role in it, which is how I came across this. For now I will subtly go against my own advice of standardising variables in regressions and hope no one notices! All the best with conference season and upcoming semester too.