Open aleksficek opened 4 years ago
The overview by aleksficek above sums it up technically. kneighbors_graph is much needed because KNN graphs have applications well beyond KNN classification or nearest-neighbor queries. For example, KNN graphs are routinely constructed to cluster high-dimensional single-cell RNA sequencing datasets, which requires access to the full KNN graph.
In Scikit-learn and cuML, kneighbors_graph returns the KNN graph that is needed to run, for example, community partitioning algorithms such as the Leiden algorithm. It is important to have this functionality in the Dask version, because GPU memory drastically limits the size of the graphs that can be constructed with the single-GPU version.
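To illustrate the point about needing the full graph, here is a minimal single-machine sketch using Scikit-learn's kneighbors_graph on the CPU. The synthetic two-blob data and the use of SciPy's connected_components are assumptions made for illustration only: connected_components stands in for a real community partitioning step such as Leiden, which it does not replace.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components
from sklearn.neighbors import kneighbors_graph

# Synthetic data (assumption): two well-separated blobs in 8 dimensions.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.1, size=(30, 8))
blob_b = rng.normal(loc=10.0, scale=0.1, size=(30, 8))
X = np.vstack([blob_a, blob_b])

k = 5

# Full KNN graph as a sparse CSR matrix of shape (n_samples, n_samples).
# Each row holds k nonzero entries marking that sample's nearest neighbors.
graph = kneighbors_graph(X, n_neighbors=k, mode="connectivity", include_self=False)

# Stand-in for a graph partitioning algorithm: any method that consumes a
# sparse adjacency matrix can be applied once the full graph is available.
n_components, labels = connected_components(graph, directed=False)

print(graph.format, graph.shape, graph.nnz)
print("components found:", n_components)
```

The graph has one row per sample, so its memory footprint grows with the dataset, which is why a distributed version matters for large inputs.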
Overview
Scikit-learn provides a kneighbors_graph feature that performs a kneighbors query and returns a sparse CSR matrix. This is being implemented in https://github.com/rapidsai/cuml/pull/2461, but to have it in the Dask layer, CuPy-backed sparse arrays need the required functionality completed first. That functionality is being completed as part of https://github.com/cupy/cupy/pull/3486, on which kneighbors_graph in Dask depends, so it must be merged into CuPy beforehand (ETA: early August).
Additional context
kneighbors_graph should follow the Scikit-learn API as closely as possible. This involves implementing the function so that it can create a NearestNeighbors internally before the kneighbors call (https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.kneighbors_graph.html) and so that it can be performed on top of a previously instantiated NearestNeighbors instance (https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html#sklearn.neighbors.NearestNeighbors)
KNeighborsRegressor and KNeighborsClassifier
dask.Array interface (https://docs.dask.org/en/latest/array-sparse.html)
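
For reference, the two Scikit-learn entry points mentioned above can be compared with a short sketch. It uses only the CPU Scikit-learn API; the random data and variable names are illustrative assumptions, and nothing here reflects the eventual cuML or Dask implementation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors, kneighbors_graph

# Illustrative random data (assumption).
rng = np.random.default_rng(0)
X = rng.random((50, 4))
k = 5

# Path 1: module-level function, which fits a NearestNeighbors internally.
g_func = kneighbors_graph(X, n_neighbors=k, mode="distance", include_self=False)

# Path 2: method on a previously instantiated and fitted NearestNeighbors.
# Calling it without a query uses the training data and excludes each
# sample from its own neighbor list, matching include_self=False above.
nn = NearestNeighbors(n_neighbors=k).fit(X)
g_method = nn.kneighbors_graph(mode="distance")

# Passing the training data explicitly as the query keeps each sample as
# its own nearest neighbor, matching include_self=True in the function.
g_func_self = kneighbors_graph(X, n_neighbors=k, mode="connectivity", include_self=True)
g_method_self = nn.kneighbors_graph(X, mode="connectivity")

print(np.array_equal(g_func.toarray(), g_method.toarray()))
print(np.array_equal(g_func_self.toarray(), g_method_self.toarray()))
```

The difference in how self-neighbors are handled between the two call styles is worth mirroring in any port, since it changes which entries end up in the graph.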