Open cjnolet opened 5 years ago
This sounds the same as kNN!? Not sure I got you here @cjnolet. Can you elaborate, please?
Rather than it returning the closest neighbors, it is only returning the neighbors that are within a specific distance. For instance, we are doing this on DBSCAN already w/ the eps
parameters (basically filtering the pairwise distance matrix based on some epsilon).
https://en.wikipedia.org/wiki/Neighbourhood_(mathematics)
You can accomplish this with the k-nn distance matrix if you set k = n
, for instance, but it's something that, to my knowledge, does not exist yet (in reusable form) in cuml.
Here's the sklearn API for NearestNeighbors (which is the sklearn equivalent for cuml's current kNN implementation): https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html#sklearn.neighbors.NearestNeighbors
Notice the methods kneighbors()
and radius_neighbors()
.
ah.. got it. Thanks Corey for the explanation.
We now have a prim for building adjacency matrices out of epsilon neighborhoods:
size_t epsilon_neighborhood(T *a, T *b, bool *adj, int m, int n, int k, T eps,
void *workspace, size_t worksize, cudaStream_t stream)
DBSCAN is already doing this, as it uses the eps value to determine the the edges of the adjacency graph. It should not be too hard to perform the same batching that DBSCAN is performing, whilst thresholding the distance matrix and returning the resulting vectors.