
Weird results with s(..., bs = "re") #13

Closed tdeenes closed 2 years ago

tdeenes commented 2 years ago

I am probably doing something wrong, but it seems I cannot mimic a mixed-effects analysis with deepregression (or safareg). Can you please point me in the right direction?

## Example analysis: two crossed random effects

## Prepare data
data(Penicillin, package = "lme4")
n_rows <- nrow(Penicillin)
avg <- mean(Penicillin$diameter)
set.seed(42)
dat <- transform(
  rbind(Penicillin, Penicillin, Penicillin),
  large = diameter + rnorm(n_rows * 3L) > avg
)
str(dat)

## First try with lme4
library(lme4)
m_lme4 <- glmer(
  large ~ 1 + (1|plate) + (1|sample) + (1|plate:sample),
  data = dat,
  family = "binomial"
)
## so far so good...
table(
  observed = dat$large,
  predicted = predict(m_lme4, type = "response") > 0.5
)

## Now with mgcv::gam
library(mgcv)
m_gam <- gam(
  large ~ 1 +
    s(plate, bs = "re") +
    s(sample, bs = "re") +
    s(plate, sample, bs = "re"),
  data = dat,
  family = "binomial"
)
## nice, similar to lme4...
table(
  observed = dat$large,
  predicted = predict(m_gam, type = "response") > 0.5
)

## Experiment with deepregression
library(deepregression)
m_dr <- deepregression(
  y = dat$large,
  data = dat,
  list_of_formulas = list(
    location = ~ 1 +
      s(plate, bs = "re") +
      s(sample, bs = "re", df = 3) +
      s(plate, sample, bs = "re")
  ),
  family = "bernoulli"
)
## the results are very weird
summary(predict(m_dr))

## Experiment with safareg
library(safareg)
m_sr <- deepregression(
  y = dat$large,
  data = dat,
  list_of_formulas = list(
    location = ~ 1 + main(plate) + main(sample) + ia(plate, sample)
  ),
  additional_processors = list(
    main = fac_processor, ia = interaction_processor
  ),
  family = "bernoulli"
)
## same weird results
summary(predict(m_sr))

davidruegamer commented 2 years ago

Hi @tdeenes, many thanks for your interest and for providing the example.

The main reason you get weird results is that calling deepregression (with or without the safareg processors) only initializes the model; you still need to run fit(). For example, after initializing m_dr and m_sr, fit each of them:

m_dr %>% fit(epochs = 3000L,
             validation_split = 0.1,
             early_stopping = TRUE)

m_sr %>% fit(epochs = 3000L,
             validation_split = 0.1,
             early_stopping = TRUE)

Metrics::auc(
  dat$large,
  predict(m_dr)
) # gives 0.9079925 

table(
  dat$large,
  predict(m_sr) > 0.5
) # yields
#        FALSE TRUE
#  FALSE   171   30
#  TRUE     15  216

This is still not quite as good as the GAM from mgcv. The reason is that mgcv and other mixed-model packages automatically estimate the random-effect variances, whereas in our models this is treated as a tuning parameter (controlled via df). Depending on batch_size, validation_split and early_stopping, the neural-network training may additionally induce regularization, which is good for prediction on new data sets but shrinks the coefficients and fitted values. You might therefore get worse accuracy when thresholding at 0.5 (even though the AUC can be quite good), because the predicted probabilities are not well calibrated.
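
As an untested sketch of what I mean: you could give the random-effect smooths more df and train without early stopping to reduce the shrinkage. The df values, the full-batch batch_size and the zero validation_split below are illustrative assumptions, not recommendations:

## Untested sketch: larger df for the random-effect smooths, full-batch
## training, no hold-out split, no early stopping
m_dr2 <- deepregression(
  y = dat$large,
  data = dat,
  list_of_formulas = list(
    location = ~ 1 +
      s(plate, bs = "re", df = 6) +
      s(sample, bs = "re", df = 6) +
      s(plate, sample, bs = "re", df = 6)
  ),
  family = "bernoulli"
)
m_dr2 %>% fit(
  epochs = 3000L,
  batch_size = nrow(dat),   # full-batch gradient steps
  validation_split = 0,     # keep all rows for training
  early_stopping = FALSE    # train for the full epoch budget
)
summary(predict(m_dr2))     # fitted probabilities should be less heavily shrunken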

tdeenes commented 2 years ago

OMG, I should have read the tutorial a bit more carefully...

Nevertheless, shouldn't the predict method fail on a model that has not been fitted yet? Or at least print a gentle reminder that fit() has to be called first to get meaningful results.
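
Just to make the suggestion concrete, something along these lines (purely illustrative; the is_fitted attribute is hypothetical and would have to be set by fit() itself):

## Illustrative only: "is_fitted" is a hypothetical attribute,
## not part of deepregression
predict_with_reminder <- function(object, ...) {
  if (!isTRUE(attr(object, "is_fitted")))
    message("The model has not been fitted yet; ",
            "call fit() first to get meaningful predictions.")
  predict(object, ...)
}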

davidruegamer commented 2 years ago

Hmm... I agree that this is not really in line with other statistical models. So a message might indeed be appropriate, but not an error, I think. The reason we did it like this is that this is how neural networks are typically implemented (see, e.g., the keras library). And since a deepregression model is also a keras neural network (where you can, e.g., load existing weights into the model without ever running fit, or call fit several times with different settings), it follows the same principles.
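
For reference, here is a minimal keras-for-R sketch of that behavior (the architecture and data are arbitrary): a compiled but unfitted model already supports predict, and the output simply reflects the random initial weights.

library(keras)
## A compiled keras model can be used for prediction before any call to fit();
## the predictions are based on the randomly initialized weights
mod <- keras_model_sequential() %>%
  layer_dense(units = 8, activation = "relu", input_shape = 4) %>%
  layer_dense(units = 1, activation = "sigmoid")
mod %>% compile(optimizer = "adam", loss = "binary_crossentropy")
predict(mod, matrix(rnorm(40), ncol = 4))  # works without ever calling fit()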

We are, however, planning to have an add-on package in the near future that implements different optimization strategies, in particular those used for GAMs. In this case, having a wrapper function that defines the additive model and also directly runs the training might in fact be a meaningful default :)
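
Purely as a hypothetical sketch of what such a wrapper could look like (deepgam is an invented name, not an existing or planned function; it relies on keras models being updated in place by fit):

## Hypothetical sketch only; not part of deepregression or any add-on package
deepgam <- function(formula, y, data, family, ...) {
  mod <- deepregression(
    y = y,
    data = data,
    list_of_formulas = list(location = formula),
    family = family
  )
  mod %>% fit(...)   # keras models are mutable, so mod is updated in place
  mod
}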

tdeenes commented 2 years ago

Thank you for the explanation, this makes sense. But note that I was talking about the predict() method, not about whether auto-fitting is necessary. What meaningful results can a predict() call return on a model that has not been fitted yet and into which no weights have been loaded (i.e., one in a pure "created" state)?

davidruegamer commented 2 years ago

Calling deepregression not only defines the model specification but also initializes the neural network that represents the model, i.e., it samples initial weights for all neurons/layers. In keras terms, this means that compile has already been called by the time deepregression(...) returns. And then, as in keras, we can call predict on this (already compiled) model. This is helpful, for example, if you want to use a randomly initialized neural network as a data-generating function, or if you pass warm-start weights to the network.
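
As a minimal sketch of the data-generating use case (assuming, as elsewhere in this thread, that predict() returns the mean of the Bernoulli distribution, i.e., a probability):

## Simulate binary outcomes from the unfitted model; p0 are the probabilities
## implied by the random initial weights
p0 <- as.numeric(predict(m_dr))
y_sim <- rbinom(length(p0), size = 1L, prob = p0)
table(y_sim)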

tdeenes commented 2 years ago

Ah, I get it now. Thank you again!