hongooi73 / glmnetUtils

Utilities for glmnet
65 stars 18 forks source link

How to use sum contrasts for predictions with glmnetUtils #34

Open FelixNoessler opened 10 months ago

FelixNoessler commented 10 months ago

Hello,

thanks for creating the R package!

I have some problems with fitting an model with sum contrasts. The coefficients seem fine (similar to a non-regularized model), but the predictions are far off when I fit a model with sum contrasts. Here is a reproducible example (sorry, it is not super short):

library(dplyr)
library(ggplot2)
library(tidyr)
library(glmnetUtils)

df <- palmerpenguins::penguins

df <- df %>%
  select(species, body_mass_g) %>%
  drop_na()

sum_contr_species <- contr.sum(3)
colnames(sum_contr_species) <- levels(df$species)[1:2] 

treatment_contr_species <- contr.treatment(3)
colnames(treatment_contr_species) <- levels(df$species)[2:3] 

## fit linear models for comparison with sum and treatment contrasts
lm_sum_fit <- lm(body_mass_g ~ species, data = df,
             contrasts = list(species = sum_contr_species))
lm_sum_fit

lm_treatment_fit <- lm(body_mass_g ~ species, data = df,
                       contrasts = list(species = treatment_contr_species))
lm_treatment_fit

## fit elastic net models: default, treatment, and sum contrasts
alpha_vals <- c(0, 0.1, 0.5, 0.7, 0.8, 0.9, 0.95, 0.99, 1)
elnet_default <- cva.glmnet(
  body_mass_g ~ species, 
  alpha = alpha_vals,
  data = df, 
  nfolds = 15,
  use.model.frame = F)

selected_alpha <- 0.9
coef(elnet_default, s = "lambda.1se", alpha = selected_alpha)

elnet_treatment <- cva.glmnet(
  body_mass_g ~ species, 
  alpha = alpha_vals,
  data = df, 
  nfolds = 15,
  use.model.frame = T)
coef(elnet_treatment, s = "lambda.1se", alpha = selected_alpha)

contrasts(df$species) <- sum_contr_species 
elnet_sum <- cva.glmnet(
  body_mass_g ~ species, 
  alpha = alpha_vals,
  data = df, 
  nfolds = 15,
  use.model.frame = T)
coef(elnet_sum, s = "lambda.1se", alpha = selected_alpha)

## make predictions for all models
df_pred <- tibble(species = unique(df$species)) %>%
  mutate(
    lin_sum = predict(lm_sum_fit, newdata = .),
    lin_treatment = predict(lm_treatment_fit, newdata = .),
    elnet_default = predict(elnet_default, newdata = ., 
                            s = "lambda.1se", alpha = selected_alpha)[,1],
    elnet_sum = predict(elnet_sum, newdata = ., 
                        s = "lambda.1se", alpha = selected_alpha)[,1],
    elnet_treatment = predict(elnet_treatment, newdata = ., 
                      s = "lambda.1se", alpha = selected_alpha)[,1],
  )

ggplot() +
  geom_point(aes(species, value, color = name),
             data = pivot_longer(df_pred, -species),
             size = 5) +
  geom_dotplot(aes(species, body_mass_g), data = df,
               binaxis = "y", dotsize = 0.5, 
               stackdir = "center", alpha = 0.7,
               fill = NaN)

plot