Data column named "group" causes duplicate column issue for `predictions(..., by = ...)`

vincentarelbundock / pymarginaleffects

GNU General Public License v3.0


Hi @vincentarelbundock

I thought I'd flag this one with you as well: `predictions()` with the `by` argument fails when the data contains a column named "group", even if that column is not used in the model or in `by`.

Reprex:

import pandas as pd
import statsmodels.formula.api as smf
from marginaleffects import predictions

diamonds = pd.read_csv("https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/ggplot2/diamonds.csv")

model = smf.ols("price ~ cut", data=diamonds).fit()

# Works
predictions(model, newdata=diamonds, by="cut")

# Add a column named "group" (not used by the model)
diamonds["group"] = diamonds["color"]

# Fails: same call as above, only the extra column differs
predictions(model, newdata=diamonds, by="cut")

DuplicateError: column with name 'group' has more than one occurrences
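
A possible stopgap until this is fixed, sketched under assumptions: if the clash really is just with a user column called "group", renaming that column before calling `predictions()` should sidestep the error. The helper below is hypothetical (it is not part of marginaleffects), and the default list of reserved names is only a guess based on the error message above. It builds a rename mapping from a list of column names and needs nothing beyond the standard library.

```python
def reserved_rename_map(columns, reserved=("group",), suffix="_orig"):
    """Map each reserved column name to a new name that is not already taken.

    Hypothetical helper: `reserved` is an assumption based on the error
    message in this issue, not a list published by marginaleffects.
    """
    taken = set(columns)
    mapping = {}
    for name in columns:
        if name in reserved:
            candidate = name + suffix
            # Keep appending the suffix until the new name is unused.
            while candidate in taken:
                candidate += suffix
            mapping[name] = candidate
            taken.add(candidate)
    return mapping


# Column names similar to the reprex data after adding "group".
cols = ["carat", "cut", "color", "price", "group"]
mapping = reserved_rename_map(cols)
print(mapping)  # {'group': 'group_orig'}
```

With the reprex data this would be applied as `diamonds = diamonds.rename(columns=mapping)` before the failing call. The model formula here is `price ~ cut`, so renaming "group" does not touch anything the model uses; if the column did appear in the formula, the model would need to be refit on the renamed data.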


Data column named "group" causes duplicate column issue for `predictions(..., by = ...)` #90