aorazalin closed this issue 2 years ago.
Good question! I think you can find the answer in Section 3.5 of our reference book (An Introduction to Statistical Learning). When p is large, the K observations that are nearest to a given test observation x0 may actually be very far away from x0 in p-dimensional space, leading to a very poor estimate of f(x0) and hence a poor KNN fit. Intuition about distance differs greatly between low- and high-dimensional spaces: in a high-dimensional space, even "similar" data points can be far away from each other.
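To make this concrete, here is a small numerical sketch (my own toy illustration, not from the book): sample n points uniformly in [0, 1]^p and measure the distance from a central test point to its nearest neighbor. Even the *nearest* neighbor drifts far away as p grows.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
for p in [1, 2, 10, 50, 100]:
    X = rng.uniform(size=(n, p))            # n training points in [0, 1]^p
    x0 = np.full(p, 0.5)                    # test point at the center of the cube
    dists = np.linalg.norm(X - x0, axis=1)  # Euclidean distances to all points
    print(f"p = {p:>3}: nearest-neighbor distance = {dists.min():.3f}")
```

The nearest-neighbor distance climbs steadily with p, so the "local" neighborhood that KNN averages over is not local at all in high dimensions.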
Best, Zhiwei
A more intuitive explanation is that in KNN, every dimension plays the same role in the distance computation (e.g., Euclidean distance), whether that dimension is informative or not, because all dimensions are weighted equally. In linear regression, by contrast, each predictor gets its own coefficient, whose value reflects how strongly that predictor is related to the response.
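A hedged sketch of this point (a toy example of my own, not from the lecture): y depends only on the first predictor, and the remaining p-1 predictors are pure noise. KNN's Euclidean distance weights the noise dimensions just as heavily as the informative one, while linear regression learns coefficients near zero for the noise predictors.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=n)   # signal lives in dim 0 only
X_test = rng.normal(size=(n, p))
y_test = 3.0 * X_test[:, 0] + rng.normal(scale=0.5, size=n)

knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)
ols = LinearRegression().fit(X, y)
print("KNN test MSE:", mean_squared_error(y_test, knn.predict(X_test)))
print("OLS test MSE:", mean_squared_error(y_test, ols.predict(X_test)))
print("OLS coefficient on the informative predictor:", ols.coef_[0])
```

OLS recovers a coefficient near 3 on the informative predictor and near zero elsewhere, so its test MSE stays small, whereas KNN's neighborhoods are dominated by the 19 noise dimensions and its test MSE is far worse.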
Best, Zhiwei
Thank you!
Slide 53 of LinearModel.pdf says that as p (the number of parameters) increases, linear regression's test MSE increases, which is obvious (more noise). But I don't get the following: