jlmelville / rnndescent

R package implementing the Nearest Neighbor Descent method for approximate nearest neighbors
https://jlmelville.github.io/rnndescent/
GNU General Public License v3.0

Make Hamming distance more generic #4

Closed by vspinu 2 years ago

vspinu commented 3 years ago

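From what I understand, the Hamming distance is currently bitwise, i.e. it treats all non-zero elements as equal. Would it be possible to add a more generic version (maybe call it overlap) that compares the actual values element by element, so that only positions holding the same value count as matching? This is the more standard definition of the Hamming distance. In the example below, rows 2 and 4 differ in both columns, yet their distance is reported as 0: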

> (mat <- matrix(c(0, 2, 3, 4, 0, 4, 0, 5), nrow = 4))
     [,1] [,2]
[1,]    0    0
[2,]    2    4
[3,]    3    0
[4,]    4    5
> brute_force_knn(mat, metric = "hamming", k = 2)
$idx
     [,1] [,2]
[1,]    1    3
[2,]    4    2
[3,]    3    2
[4,]    4    2

$dist
     [,1] [,2]
[1,]    0    1
[2,]    0    0
[3,]    0    1
[4,]    0    0
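
To make the difference concrete, here is a minimal sketch in plain R of the two definitions. The function names `hamming_binary` and `hamming_generic` are hypothetical and are not part of rnndescent. The binary version is only an assumption about the current behaviour, inferred from the output above (raw counts, non-zero values treated as equal).

```r
# Hypothetical helpers for illustration only; not part of rnndescent.

# Assumed current behaviour: values are reduced to zero / non-zero first,
# then mismatching positions are counted.
hamming_binary <- function(x, y) {
  stopifnot(length(x) == length(y))
  sum((x != 0) != (y != 0))
}

# Requested behaviour: compare the actual values and count the positions
# at which the two vectors differ.
hamming_generic <- function(x, y) {
  stopifnot(length(x) == length(y))
  sum(x != y)
}

mat <- matrix(c(0, 2, 3, 4, 0, 4, 0, 5), nrow = 4)

# Rows 2 and 4 are (2, 4) and (4, 5): both entries are non-zero in each row.
d_binary  <- hamming_binary(mat[2, ], mat[4, ])
d_generic <- hamming_generic(mat[2, ], mat[4, ])
print(c(binary = d_binary, generic = d_generic))
```

Under these assumptions the binary version gives 0 for rows 2 and 4, matching the output above, while the generic version gives 2.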