ajulian3 opened 9 years ago
One of the pitfalls of the k-nearest neighbors (kNN) algorithm is that its performance falters on large datasets and when the number of dimensions is high. When too many features are included, the "curse of dimensionality" takes hold: the number of data points you need to sample grows exponentially as you add dimensions. Therefore, anyone using this algorithm on neural network data would need to know in advance which features they are interested in. Without feature reduction, kNN would perform poorly on a neural model consisting of many dimensions.
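As a minimal sketch of that feature-reduction point, here is kNN run with and without a dimensionality-reduction step (PCA here, though any reduction method would illustrate the same idea). The synthetic dataset, component count, and choice of k are illustrative assumptions, not anything from this thread:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Synthetic high-dimensional data: few informative features, many noise dimensions.
X, y = make_classification(n_samples=300, n_features=500,
                           n_informative=10, random_state=0)

knn_raw = KNeighborsClassifier(n_neighbors=5)
knn_pca = make_pipeline(PCA(n_components=10),
                        KNeighborsClassifier(n_neighbors=5))

# kNN on the raw 500-dimensional data vs. after reducing to 10 dimensions.
print("kNN, all 500 dims:   ", cross_val_score(knn_raw, X, y, cv=5).mean())
print("kNN, PCA to 10 dims: ", cross_val_score(knn_pca, X, y, cv=5).mean())
```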
For many biological problems, the issue is high dimensionality combined with a small training set. On small training sets, kNN classifiers tend to overfit, whereas high-bias/low-variance classifiers such as Naive Bayes do not.
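A hedged sketch of that small-sample claim: train both classifiers on a deliberately tiny, high-dimensional training set and compare held-out accuracy. The sample sizes and dimensions below are assumptions chosen only to make the bias/variance contrast visible:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, n_features=200,
                           n_informative=15, random_state=0)
# Tiny training set (40 samples) against a large held-out test set.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=40,
                                          random_state=0, stratify=y)

for clf in (KNeighborsClassifier(n_neighbors=3), GaussianNB()):
    clf.fit(X_tr, y_tr)
    print(type(clf).__name__, "test accuracy:", clf.score(X_te, y_te))
```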
Would it be effective to use a k-nearest-neighbor algorithm in the analysis of neural network data? Specifically, if you assign the spatial information to a vector, would this be more effective than using k-means clustering? What are the pitfalls of this algorithm in a biological environment?
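One thing worth making concrete about the kNN-vs-k-means comparison: kNN is supervised (it needs labeled examples), while k-means is unsupervised clustering. The sketch below applies both to the same vectorized data; the feature vectors standing in for "spatial information" are an assumption for illustration:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Toy vectors standing in for spatial features, with known group labels.
X, y = make_blobs(n_samples=200, centers=3, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# kNN uses the labels at training time.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
print("kNN test accuracy:", knn.score(X_te, y_te))

# k-means never sees the labels; we only compare its clusters to them afterward.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("k-means agreement with labels (ARI):", adjusted_rand_score(y, km.labels_))
```

So whether kNN is "more effective" than k-means depends first on whether labels exist at all: with labeled neural data, kNN (ideally after feature reduction, as noted above) is the natural comparison; without labels, k-means is answering a different question.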