The no free lunch theorem for machine learning (Wolpert, 1996) states that, averaged over all possible data-generating distributions, every classification algorithm has the same error rate when classifying previously unobserved points. In other words, in some sense, no machine learning algorithm is universally any better than any other.
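One compact, informal way to state this result (the notation here is introduced for illustration and is not Wolpert's original formulation) is that for any two learning algorithms A and B, writing err_OTS(A, f) for the expected error of A on points outside the training set when the target function is f,

\[
\sum_{f} \mathbb{E}\big[\mathrm{err}_{\mathrm{OTS}}(A, f)\big] \;=\; \sum_{f} \mathbb{E}\big[\mathrm{err}_{\mathrm{OTS}}(B, f)\big],
\]

so any advantage an algorithm enjoys on one family of target functions is exactly paid for on another.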
This means that the goal of machine learning research is not to seek a universal learning algorithm or the absolute best learning algorithm. Instead, our goal is to understand what kinds of distributions are relevant to the “real world” that an AI agent experiences, and what kinds of machine learning algorithms perform well on data drawn from the kinds of data-generating distributions we care about.
Maximum Likelihood
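Given a set of m examples X = {x^(1), ..., x^(m)} drawn independently from the true but unknown data-generating distribution, and a parametric family of distributions p_model(x; θ), the maximum likelihood estimator for θ is defined (in one common notation) as

\[
\boldsymbol{\theta}_{\mathrm{ML}} = \operatorname*{arg\,max}_{\boldsymbol{\theta}} \, p_{\text{model}}(\mathbb{X}; \boldsymbol{\theta}) = \operatorname*{arg\,max}_{\boldsymbol{\theta}} \sum_{i=1}^{m} \log p_{\text{model}}\big(\boldsymbol{x}^{(i)}; \boldsymbol{\theta}\big),
\]

where taking the logarithm converts the product over independent examples into a sum without changing the arg max, which is both analytically and numerically convenient.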
Maximum a Posteriori (MAP) Estimation
One common reason for desiring a point estimate is that, for most interesting models, most operations involving the Bayesian posterior are intractable, and a point estimate offers a tractable approximation. Rather than simply returning to the maximum likelihood estimate, we can still gain some of the benefit of the Bayesian approach by allowing the prior to influence the choice of the point estimate. One rational way to do this is to choose the maximum a posteriori (MAP) point estimate.
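The MAP estimate chooses the point of maximal posterior probability (or maximal probability density in the more common case of continuous θ). Writing x for the observed data, this can be expressed as

\[
\boldsymbol{\theta}_{\mathrm{MAP}} = \operatorname*{arg\,max}_{\boldsymbol{\theta}} \, p(\boldsymbol{\theta} \mid \boldsymbol{x}) = \operatorname*{arg\,max}_{\boldsymbol{\theta}} \, \Big[ \log p(\boldsymbol{x} \mid \boldsymbol{\theta}) + \log p(\boldsymbol{\theta}) \Big],
\]

where log p(x | θ) is the familiar log-likelihood term and log p(θ) contributes the prior. As a concrete illustration, here is a minimal sketch of a closed-form MAP estimate, assuming a Gaussian likelihood with known variance and a conjugate Gaussian prior on the mean; the function name and parameter values are illustrative, not from the text:

import numpy as np

def map_gaussian_mean(x, sigma2=1.0, mu0=0.0, tau2=0.25):
    """MAP estimate of a Gaussian mean under a N(mu0, tau2) prior,
    assuming the likelihood variance sigma2 is known (an assumption
    made for this sketch so that the estimate has a closed form)."""
    n = x.size
    # Precision-weighted average of the prior mean and the data:
    # the posterior is Gaussian, so its mode equals its mean.
    return (mu0 / tau2 + x.sum() / sigma2) / (1.0 / tau2 + n / sigma2)

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=10)
print(x.mean(), map_gaussian_mean(x))  # MAP is shrunk toward mu0

Because the prior precision 1/tau2 appears in the denominator, the MAP estimate is pulled toward the prior mean mu0 relative to the maximum likelihood estimate x.mean(); as tau2 grows (an increasingly uninformative prior), the two estimates coincide.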