akim1 opened 9 years ago
I would think not necessarily. What about the case where you have limited data? Let's say you threw a ball in the air and could only measure the upward trajectory for a few seconds, so you didn't collect enough data to recover a full, clean parabola. You could end up with data that could plausibly be modeled by either a linear or a quadratic equation, especially once you account for noise and variation in the measurements. Although we know (theoretically) that the underlying process is parabolic motion, the data may put us in a situation where the parabolic model doesn't necessarily give both the lowest variance and the lowest bias.
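A quick numerical sketch of this scenario (the initial velocity, gravity constant, noise level, and observation window below are all made-up assumptions): sample only the first half-second of a noisy parabolic trajectory and fit both a line and a parabola. Over such a short window the two models fit almost equally well.

```python
import numpy as np

rng = np.random.default_rng(0)

# True process: parabolic motion y = v0*t - 0.5*g*t^2 (assumed constants).
v0, g = 20.0, 9.8
t = np.linspace(0.0, 0.5, 20)  # only the first half-second is observed
y = v0 * t - 0.5 * g * t**2 + rng.normal(0.0, 0.2, t.size)  # noisy measurements

# Fit a line and a parabola to the same limited window.
lin = np.polyfit(t, y, 1)
quad = np.polyfit(t, y, 2)

rmse_lin = np.sqrt(np.mean((np.polyval(lin, t) - y) ** 2))
rmse_quad = np.sqrt(np.mean((np.polyval(quad, t) - y) ** 2))
print(f"linear RMSE: {rmse_lin:.3f}, quadratic RMSE: {rmse_quad:.3f}")
```

With only this slice of the trajectory, both RMSEs sit near the noise level, so the data alone can't strongly distinguish the two models.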
I think the idea of the bias-variance tradeoff is that you can't change one without changing the other, though not necessarily to equal degrees. The goal is to minimize one of them (whichever is worse for your situation) without increasing the other too much - I think that's what "optimal" means in this case. Unless you have infinite data or you already "know" the answer, no model built from a finite amount of data will be perfect.
The bias-variance tradeoff is Heisenberg's Uncertainty Principle for Statistics.
The loss function is a random variable, so any particular model fit could happen to be the "accurate" fit of the model, with some extremely small probability. But the statistics don't care about the underlying mechanism of the system that produced the data; they just treat it as distribution functions. The model then never accurately fits the real process unless it does so by some very small chance, or unless the system was actually generated under all of the model's assumptions, which almost never happens.
Doesn't the fact that the bias-variance tradeoff exists imply that the model isn't accurately describing the underlying process?
For instance, for a set of data describing the trajectory of a ball, if we fit the data points with a quadratic function, we would minimize both the bias and the variance. But if we used a Gaussian to describe the same data, we would run into the bias-variance tradeoff and would need an arbitrarily large number of Gaussians to get a small variance.
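A rough sketch of this comparison (the trajectory constants, Gaussian widths, and basis sizes are all assumed for illustration): fit a noisy parabola with a 3-parameter quadratic versus least-squares fits over fixed Gaussian bumps. The quadratic, which matches the true functional form, reaches the noise floor immediately, while the Gaussian basis needs many more components to get close.

```python
import numpy as np

rng = np.random.default_rng(2)

# Noisy samples of a full parabolic trajectory (assumed constants).
v0, g = 20.0, 9.8
t = np.linspace(0.0, 4.0, 80)
y = v0 * t - 0.5 * g * t**2 + rng.normal(0.0, 0.3, t.size)

def rmse_gaussian_basis(n_centers, width=0.5):
    """Least-squares fit over n_centers fixed Gaussian bumps (illustrative)."""
    centers = np.linspace(t.min(), t.max(), n_centers)
    Phi = np.exp(-((t[:, None] - centers[None, :]) ** 2) / (2 * width**2))
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return np.sqrt(np.mean((Phi @ w - y) ** 2))

# Quadratic: the "right" model family, only 3 parameters.
quad = np.polyfit(t, y, 2)
rmse_quad = np.sqrt(np.mean((np.polyval(quad, t) - y) ** 2))
rmse_g3 = rmse_gaussian_basis(3)
rmse_g12 = rmse_gaussian_basis(12)
print(f"quadratic (3 params): {rmse_quad:.3f}")
print(f"3 Gaussians:          {rmse_g3:.3f}")
print(f"12 Gaussians:         {rmse_g12:.3f}")
```

Three Gaussian parameters leave large systematic error where three polynomial parameters do not; driving the Gaussian model's error down requires piling on components, each of which adds parameters whose estimates fluctuate with the noise - the variance side of the tradeoff.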