Closed jscamac closed 8 years ago
I've had a chat with Lachlan, and he agrees that AUC is a good and common choice for this type of problem.
Decided to use log loss, as it avoids simulating over thresholds. Log loss quantifies the accuracy of a classifier by penalising confident misclassifications; minimising the log loss is essentially equivalent to maximising the accuracy of the classifier. More details here: http://www.r-bloggers.com/making-sense-of-logarithmic-loss/
Upon further checking the maths, the log loss is simply the average negative log likelihood of a Bernoulli model. As such, we would essentially be reporting two measures of fit that do pretty much the same thing.
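To make that equivalence concrete: for binary outcomes y and predicted probabilities p, log loss is the mean of -[y*log(p) + (1-y)*log(1-p)], which is exactly the average negative Bernoulli log likelihood. A minimal sketch in plain Python, with made-up example values:

```python
import math

def log_loss(y, p, eps=1e-15):
    """Log loss = mean negative Bernoulli log likelihood.

    y:   list of 0/1 outcomes (e.g. died = 1, survived = 0)
    p:   predicted mortality probabilities
    eps: clip probabilities away from 0 and 1 so log() stays finite
    """
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)
        total += yi * math.log(pi) + (1.0 - yi) * math.log(1.0 - pi)
    return -total / len(y)

# Illustrative data only
y = [1, 0, 1, 0]
p = [0.9, 0.2, 0.6, 0.4]
print(round(log_loss(y, p), 4))  # 0.3375
```

A perfect classifier drives this toward 0; confidently wrong predictions (e.g. p near 0 when y = 1) are penalised heavily, which is the behaviour the blog post above describes.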
The conclusion is to just monitor log loss. I also don't see any point in monitoring the log likelihood of the fitted data, as the LL of the held-out data should be a better predictor of model performance. As such, I will also remove the calculation of LL for the fitted data.
I think a good measure of predictive fit would be AUC. It is commonly used in the SDM literature for presence/absence data.
Basically, it indicates how well our model assigns higher mortality probabilities to individuals that were observed to have died. I think this is a better measure than some pseudo R-squared.
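That interpretation has a direct computational form: AUC is the probability that a randomly chosen individual that died receives a higher predicted probability than a randomly chosen survivor (ties counted as half). A sketch in plain Python with illustrative values:

```python
def auc(y, p):
    """AUC via pairwise concordance.

    Fraction of (died, survived) pairs in which the individual
    that died received the higher predicted probability; ties
    count as 0.5. Equivalent to the area under the ROC curve.
    """
    pos = [pi for yi, pi in zip(y, p) if yi == 1]  # died
    neg = [pi for yi, pi in zip(y, p) if yi == 0]  # survived
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Illustrative data only
y = [1, 0, 1, 0, 1]
p = [0.8, 0.3, 0.6, 0.5, 0.4]
print(auc(y, p))  # 5 of 6 pairs concordant -> 0.8333...
```

An AUC of 0.5 means the model ranks deaths no better than chance; 1.0 means every death was assigned a higher probability than every survivor.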
However, it will require that we either monitor the mortality probability within Stan or simulate it outside Stan. If this is only done on the held-out data, we should be able to implement it within Stan without running into the issues raised in #78.
This, combined with log likelihood ratio testing, should be all that is needed to examine the performance of the various models.
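For reference, the LR test for nested models compares 2 * (LL_full - LL_reduced) against a chi-square distribution with df equal to the difference in parameter count. A minimal sketch for the one-extra-parameter case, using only the standard library (the log likelihood values below are made up):

```python
import math

def lr_test_1df(ll_full, ll_reduced):
    """Likelihood ratio test for nested models differing by one parameter.

    The statistic 2*(ll_full - ll_reduced) is asymptotically
    chi-square with 1 df; its upper tail can be written as
    erfc(sqrt(stat / 2)) because chi-square(1) is the square of
    a standard normal.
    """
    stat = 2.0 * (ll_full - ll_reduced)
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical log likelihoods for illustration only
stat, p = lr_test_1df(ll_full=-120.0, ll_reduced=-124.5)
print(round(stat, 2), round(p, 4))  # 9.0 0.0027
```

For models differing by more than one parameter, the same statistic would be compared against a chi-square with the corresponding df (e.g. via scipy.stats.chi2.sf in practice).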