I have some problems with fitting an model with sum contrasts. The coefficients seem fine (similar to a non-regularized model), but the predictions are far off when I fit a model with sum contrasts. Here is a reproducible example (sorry, it is not super short):
library(dplyr)
library(ggplot2)
library(tidyr)
library(glmnetUtils)
df <- palmerpenguins::penguins
df <- df %>%
select(species, body_mass_g) %>%
drop_na()
sum_contr_species <- contr.sum(3)
colnames(sum_contr_species) <- levels(df$species)[1:2]
treatment_contr_species <- contr.treatment(3)
colnames(treatment_contr_species) <- levels(df$species)[2:3]
## fit linear models for comparison with sum and treatment contrasts
lm_sum_fit <- lm(body_mass_g ~ species, data = df,
contrasts = list(species = sum_contr_species))
lm_sum_fit
lm_treatment_fit <- lm(body_mass_g ~ species, data = df,
contrasts = list(species = treatment_contr_species))
lm_treatment_fit
## fit elastic net models: default, treatment, and sum contrasts
alpha_vals <- c(0, 0.1, 0.5, 0.7, 0.8, 0.9, 0.95, 0.99, 1)
elnet_default <- cva.glmnet(
body_mass_g ~ species,
alpha = alpha_vals,
data = df,
nfolds = 15,
use.model.frame = F)
selected_alpha <- 0.9
coef(elnet_default, s = "lambda.1se", alpha = selected_alpha)
elnet_treatment <- cva.glmnet(
body_mass_g ~ species,
alpha = alpha_vals,
data = df,
nfolds = 15,
use.model.frame = T)
coef(elnet_treatment, s = "lambda.1se", alpha = selected_alpha)
contrasts(df$species) <- sum_contr_species
elnet_sum <- cva.glmnet(
body_mass_g ~ species,
alpha = alpha_vals,
data = df,
nfolds = 15,
use.model.frame = T)
coef(elnet_sum, s = "lambda.1se", alpha = selected_alpha)
## make predictions for all models
df_pred <- tibble(species = unique(df$species)) %>%
mutate(
lin_sum = predict(lm_sum_fit, newdata = .),
lin_treatment = predict(lm_treatment_fit, newdata = .),
elnet_default = predict(elnet_default, newdata = .,
s = "lambda.1se", alpha = selected_alpha)[,1],
elnet_sum = predict(elnet_sum, newdata = .,
s = "lambda.1se", alpha = selected_alpha)[,1],
elnet_treatment = predict(elnet_treatment, newdata = .,
s = "lambda.1se", alpha = selected_alpha)[,1],
)
ggplot() +
geom_point(aes(species, value, color = name),
data = pivot_longer(df_pred, -species),
size = 5) +
geom_dotplot(aes(species, body_mass_g), data = df,
binaxis = "y", dotsize = 0.5,
stackdir = "center", alpha = 0.7,
fill = NaN)
Hello,
thanks for creating the R package!
I have some problems with fitting an model with sum contrasts. The coefficients seem fine (similar to a non-regularized model), but the predictions are far off when I fit a model with sum contrasts. Here is a reproducible example (sorry, it is not super short):