There are several issues associated with equations (8.48) and (8.49) and the surrounding text.
It would be great if that material were to be slightly rewritten for clarity.
Issue #1: equation (8.48) is not an equation. Should it be? It looks like equations (8.48) and (8.49) are estimates of the "evidence" of the data, given a model. So should (8.48) instead read as follows?
log(evidence) = log p(x) ~ log p(x|theta) - M = AIC
If not, what is the AIC given in (8.48) trying to estimate?
If I've gotten hold of the wrong end of the stick here, it underscores the point that a clearer explanation of this material would be helpful...
Issue #2: how these criteria arise is unclear from the text. It would be great to provide a brief discussion motivating (8.48) and (8.49).
Issue #3: the AIC and BIC seem to be semi-empirical rules of thumb, to be compared across different models to help choose the best model in a principled way. Since this book is about Machine Learning as well as Mathematics, it would be appropriate to give a brief discussion (a paragraph or two) about practical and appropriate application of AIC and BIC in Machine Learning, and caveats about their limitations.
Thanks for the suggestion. Agree that this could use more explanation. I'll leave this as a feature request for the time when we can make more than cosmetic changes to the book.
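For anyone landing on this thread with Issue #3 in mind, here is a minimal sketch of how the two criteria are typically compared across candidate models. It assumes the "larger is better" form discussed above, log p(x|theta) - M for the AIC, and the analogous penalized form log p(x|theta) - (1/2) M log N for the BIC, with theta set to its maximum likelihood estimate. The toy data, the helper names (`fit_constant`, `fit_line`, `compare`) and the choice to count the noise variance as one of the M parameters are illustrative assumptions, not something taken from the book.

```python
import math


def gaussian_max_log_lik(residuals):
    """Maximized Gaussian log-likelihood, with the noise variance at its MLE (RSS / N)."""
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)


def fit_constant(xs, ys):
    """Residuals of the model y = c + noise, with c fit by least squares."""
    mean_y = sum(ys) / len(ys)
    return [y - mean_y for y in ys]


def fit_line(xs, ys):
    """Residuals of the model y = a + b * x + noise, fit by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def aic(log_lik, m):
    """AIC in the 'larger is better' convention: log-likelihood minus parameter count."""
    return log_lik - m


def bic(log_lik, m, n):
    """BIC in the same convention: the penalty grows with the number of data points n."""
    return log_lik - 0.5 * m * math.log(n)


def compare(xs, ys):
    """Score both candidate models on the same data set."""
    n = len(xs)
    # Assumption: M counts the noise variance too (constant: c, sigma^2; line: a, b, sigma^2).
    candidates = [("constant", fit_constant, 2), ("linear", fit_line, 3)]
    scores = {}
    for name, fit, m in candidates:
        log_lik = gaussian_max_log_lik(fit(xs, ys))
        scores[name] = {"aic": aic(log_lik, m), "bic": bic(log_lik, m, n)}
    return scores


# Deterministic toy data: a fixed +/-0.5 wiggle stands in for noise.
xs = list(range(10))
wiggle = [0.5 if i % 2 == 0 else -0.5 for i in range(10)]
trend_data = [1.0 + 2.0 * x + w for x, w in zip(xs, wiggle)]
flat_data = [3.0 + w for w in wiggle]

trend_scores = compare(xs, trend_data)
flat_scores = compare(xs, flat_data)
print("trend data:", trend_scores)
print("flat data:", flat_scores)
```

On the trend data both criteria prefer the linear model. On the flat data the extra slope parameter does not raise the likelihood enough to pay for its penalty, so the constant model scores higher. The scores are only meaningful as differences between models fit to the same data, which is one of the caveats a discussion in the book could spell out.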