mml-book / mml-book.github.io

Companion webpage to the book "Mathematics For Machine Learning"

Clearer explanation of Akaike and Bayes Information Criteria #492

Open · jcatanza opened this issue 4 years ago

jcatanza commented 4 years ago

There are several issues associated with equations (8.48) and (8.49) and the surrounding text.

It would be great if that material were rewritten slightly for clarity.

Issue #1: equation (8.48) is not actually an equation. Should it be? It looks like (8.48) and (8.49) are meant to be estimates of the log "evidence" of the data given a model.

So should (8.48) be:

log(evidence) = log p(x) ≈ log p(x | theta) − M = AIC ?

If not, what is the AIC given in (8.48) trying to estimate?
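For reference, here is my reading of the two criteria (my own hedged summary, not the book's exact text; M is the number of free parameters and N the number of data points):

```latex
\mathrm{AIC} = \log p(\boldsymbol{x} \mid \boldsymbol{\theta}) - M ,
\qquad
\mathrm{BIC} = \log p(\boldsymbol{x}) \approx \log p(\boldsymbol{x} \mid \boldsymbol{\theta}) - \tfrac{1}{2} M \log N .
```

Under this reading, both are penalized log-likelihoods approximating the log evidence, with the BIC penalizing complexity more heavily than the AIC whenever log N > 2, i.e. N > e^2 ≈ 7.4.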

If I've gotten hold of the wrong end of the stick here, that only underscores the point that a clearer explanation of this material would be helpful...

Issue #2: how these criteria arise is unclear from the text. It would be great to provide a brief discussion motivating (8.48) and (8.49).
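To sketch the kind of motivation I mean (again my own hedged summary, not text from the book): the BIC is usually derived from a Laplace approximation of the marginal likelihood,

```latex
p(\boldsymbol{x})
  = \int p(\boldsymbol{x} \mid \boldsymbol{\theta}) \, p(\boldsymbol{\theta}) \, \mathrm{d}\boldsymbol{\theta}
  \approx p(\boldsymbol{x} \mid \hat{\boldsymbol{\theta}}) \, p(\hat{\boldsymbol{\theta}}) \,
    (2\pi)^{M/2} \, \lvert \boldsymbol{H} \rvert^{-1/2} ,
```

where H is the Hessian of the negative log posterior at the mode θ̂. Since |H| grows like N^M, taking logs and dropping the terms that stay bounded as N grows leaves log p(x) ≈ log p(x | θ̂) − (1/2) M log N, i.e. the BIC. The AIC, as I understand it, is instead motivated as an asymptotic bias correction to the expected log-likelihood on new data (a KL-divergence argument), which is where the −M penalty comes from. Even a compressed version of this reasoning in the text would help.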

Issue #3: the AIC and BIC appear to be semi-empirical rules of thumb, compared across candidate models to help choose the best one in a principled way. Since this book is about Machine Learning as well as Mathematics, it would be appropriate to add a brief discussion (a paragraph or two) of the practical and appropriate application of AIC and BIC in Machine Learning, along with caveats about their limitations.
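To make Issue #3 concrete, here is a minimal toy sketch of the kind of usage I have in mind (hypothetical example, not from the book): polynomial regression under Gaussian noise, where the log-likelihood is available in closed form, scored with AIC and BIC across model orders.

```python
import numpy as np

# Toy model-selection sketch (hypothetical, not from the book):
# fit polynomials of increasing degree to noisy quadratic data and
# score each fit with AIC and BIC in the "larger is better"
# penalized-log-likelihood convention discussed above.
rng = np.random.default_rng(0)
N = 50
x = np.linspace(-1.0, 1.0, N)
y = 1.0 - 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.1, size=N)

for degree in range(1, 6):
    coeffs = np.polyfit(x, y, degree)      # least squares = ML fit under Gaussian noise
    residuals = y - np.polyval(coeffs, x)
    sigma2 = np.mean(residuals**2)         # ML estimate of the noise variance
    # Gaussian log-likelihood evaluated at the ML parameters
    log_lik = -0.5 * N * (np.log(2.0 * np.pi * sigma2) + 1.0)
    M = degree + 2                         # coefficients (degree + 1) plus noise variance
    aic = log_lik - M                      # form of (8.48)
    bic = log_lik - 0.5 * M * np.log(N)    # form of (8.49)
    print(f"degree={degree}  log_lik={log_lik:8.2f}  AIC={aic:8.2f}  BIC={bic:8.2f}")
```

The point of the example: the raw log-likelihood always improves as the degree grows, while AIC and BIC trade fit against complexity and typically peak near the true degree (2 in this toy case). A worked example along these lines, plus caveats (e.g. both criteria rely on asymptotic arguments and can disagree on small datasets), is what I'm suggesting.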

mpd37 commented 4 years ago

Thanks for the suggestion. I agree that this could use more explanation. I'll leave this as a feature request for the time when we can make more than cosmetic changes to the book.