Closed tdeenes closed 2 years ago
Hi @tdeenes , Many thanks for your interest and providing the example.
The main reason you get weird results is because the call deepregression
and safareg
only initializes the model. You still need to run fit
. For example, after you initialized the m_sr
model, do the following:
m_sr %>% fit(epochs = 3000L,
validation_split = 0.1,
early_stopping = TRUE)
Metrics:::auc(
dat$large,
predict(m_dr)
) # gives 0.9079925
table(
dat$large,
predict(m_sr) > 0.5
) # yields
# FALSE TRUE
# FALSE 171 30
# TRUE 15 216
This is still not as good as the GAM from mgcv
and others. The reason is that mgcv
and other mixed model packages automatically estimate the random effect variances, whereas for our models, this is something like a tuning parameter (controlled via df
). Depending on the batch_size
, validation_split
and early_stopping
the neural network training might further induce a regularization (which is good for prediction on new data sets, but shrinks coefficients and fitted values) -- you might therefore get worse accuracy when thresholding with 0.5 (while the AUC can be quite good) as the predicted probabilities are not well calibrated.
OMG, I should have read the tutorial a bit more carefully...
Nevertheless, I think the predict
method should fail on a model which has not been fitted yet, shouldn't it? Probably with a gentle reminder that fit()
shall be called first to get meaningful results.
Hmm.. I agree that this not really in line with other statistical models. So a message could maybe be appropriate, yes, but not an error I think.
The reason we did it like this is because this is how neural networks are implemented (see, e.g., the keras
library). And since the deepregression
model is also a keras
neural network (where you can, e.g., load existing weights into the model without ever running fit
, or, you call fit
several times with different settings), it follows the same principles.
We are, however, planning to have an add-on package in the near future that implements different optimization strategies, in particular those used for GAMs. In this case, having a wrapper function that defines the additive model and also directly runs the training might in fact be a meaningful default :)
Thank you for the explanation, this makes sense. But note that I was talking about the predict()
method and not on whether auto-fit is necessary or not. What meaningful results can a predict()
call return on a model which has not been fitted yet and no weights have been loaded into it (so it is in a pure "created" state)?
Calling deepregression
not only defines the model specifications but also initializes the neural network that represents the model, i.e., samples initial weights for all neurons/layers. So in keras
terms this means that we have already called compile
when running the deepregression(...)
command. And then, as in keras
, we can call predict
on this (already compiled) model. This is, e.g., helpful if you want to use a randomly initialized neural network as a data generating function or in case you pass warmstart weights to the network.
Ah, I get it now. Thank you again!
Probably I do something wrong, but it seems I can not mimic a mixed effect analysis with deepregression (or safareg). Can you please point me to the right direction?