jeffwong / imputation

R package for data imputation. Fills missing values in a numeric matrix

kNN should use weighted mean, not simple mean #10

Closed jeffwong closed 11 years ago

jeffwong commented 11 years ago

Weights should be based on a measure of similarity. This is a bit tricky, because kNN ranks candidates by Euclidean distance and picks the neighbors with the smallest distances, and a distance is a dissimilarity, not a similarity. Similarity is normally defined as 1 - dissimilarity when dissimilarity is on a 0-1 scale. Euclidean distance is not bounded to that scale, so we might try normalizing it first: distances / max(distances), where the max is taken over the selected neighbors.
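
To make the proposal concrete, here is a minimal sketch of the weighting scheme described above, written in Python for illustration only; it is not the package's R code. The function names `similarity_weights` and `weighted_knn_impute` are hypothetical, and the fallbacks for degenerate cases (all distances zero, or all distances equal) are assumptions that the proposal itself does not spell out.

```python
def similarity_weights(distances):
    """Convert neighbor distances to weights via 1 - d / max(d).

    Hypothetical helper: the distances are assumed to be the Euclidean
    distances from the incomplete row to its k selected neighbors.
    """
    d_max = max(distances)
    if d_max == 0:
        # Assumption: every neighbor coincides with the query row,
        # so weight them all equally.
        return [1.0] * len(distances)
    return [1.0 - d / d_max for d in distances]


def weighted_knn_impute(distances, values):
    """Impute one missing entry as the similarity-weighted mean of the
    neighbors' observed values for that column."""
    weights = similarity_weights(distances)
    total = sum(weights)
    if total == 0:
        # Assumption: all neighbors are equally far away (this includes
        # k = 1), so every weight is 0; fall back to the simple mean.
        return sum(values) / len(values)
    return sum(w * v for w, v in zip(weights, values)) / total


# Three neighbors at distances 1, 2, 4 with observed values 10, 20, 30.
# Weights are 0.75, 0.5, 0.0, so the estimate is (7.5 + 10.0) / 1.25.
estimate = weighted_knn_impute([1.0, 2.0, 4.0], [10.0, 20.0, 30.0])
print(estimate)  # 14.0, versus a simple mean of 20.0
```

One consequence of dividing by max(distances) is that the farthest neighbor always gets weight 0, so in effect only k - 1 neighbors contribute. If that is undesirable, a different normalization (for example, dividing by something slightly larger than the max) would be needed.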