thomasaarholt opened 3 years ago
@mdoijade can you take a look at this and see why `kneighbors` is so slow?
@teju85 wouldn't the difference be because this is comparing different algorithms? i.e. `brute` on the GPU (which would take too long on the CPU) vs `kd_tree`/`ball_tree`?
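For context, the distinction being made is that sklearn lets you pick the search algorithm explicitly, so a `brute` vs `kd_tree` comparison measures two very different amounts of work per query. A minimal sketch (this example is my own, not from the thread):

```python
# Hypothetical sketch: sklearn exposes the algorithm choice explicitly,
# which is what makes a brute-vs-tree timing comparison apples-to-oranges.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
points = rng.random((10_000, 2))

# Exact brute force: compares every query against every stored point.
brute = NearestNeighbors(n_neighbors=5, algorithm="brute").fit(points)

# Tree-based index: prunes most of the search space for each query.
tree = NearestNeighbors(n_neighbors=5, algorithm="kd_tree").fit(points)

d_brute, i_brute = brute.kneighbors(points[:100])
d_tree, i_tree = tree.kneighbors(points[:100])

# Both are exact methods, so the results agree; only the amount of
# work per query differs.
assert np.array_equal(i_brute, i_tree)
```

Both return identical neighbors on this data; the timing gap between them is purely algorithmic.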
:smh: good catch @dantegd! I overlooked the array in line 14. Yes, this is indeed because the comparison pits brute force against indexing methods. We don't yet have indexing-based methods implemented.
Right, I should have made that clearer. Just in case this wasn't noticed either, I created #4020, pointing out that the other GPU algorithms crash python on my system.
@thomasaarholt, as @dantegd pointed out, the differences here are strictly due to the underlying algorithms. While it may increase training time, we do support the `ivfpq` algorithm, which should significantly reduce the query time at least.
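For intuition, here is a toy illustration of the inverted-file (IVF) idea that approximate methods like this build on: cluster the points once up front, then search only the few clusters nearest each query. This is my own sketch, not cuML's implementation:

```python
# Toy illustration of the inverted-file (IVF) idea -- NOT cuML's actual
# implementation. Cluster the data, then probe only the nearby clusters.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
points = rng.random((5_000, 2))
n_clusters = 32

# Coarse quantizer: partition the points into clusters ("inverted lists").
kmeans = KMeans(n_clusters=n_clusters, n_init=1, random_state=0).fit(points)
lists = {c: np.where(kmeans.labels_ == c)[0] for c in range(n_clusters)}

def ivf_nearest(query, n_probe=3):
    """Return the index of (approximately) the nearest point to `query`,
    scanning only the n_probe clusters with the closest centroids."""
    centroid_dist = np.linalg.norm(kmeans.cluster_centers_ - query, axis=1)
    probe = np.argsort(centroid_dist)[:n_probe]
    candidates = np.concatenate([lists[c] for c in probe])
    dists = np.linalg.norm(points[candidates] - query, axis=1)
    return candidates[np.argmin(dists)]

idx = ivf_nearest(points[0])  # a stored point's nearest neighbor is itself
```

The trade-off is visible in `n_probe`: fewer probed clusters means less work per query but a higher chance of missing the true nearest neighbor.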
This issue has been labeled `inactive-90d` due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
**What is your question?**

Is the `NearestNeighbors.kneighbors` problem well-suited for GPU? I find it quite slow for big data on the GPU. I ran the following on a single RTX 2080 Ti and a 32-core AMD processor. The `.fit()` method is very fast, but the `.kneighbors()` method is not. See below for some context and benchmarks. Note that in #4020 I show that the other GPU algorithms crash the kernel.

We can use:

to find the neighbors of `new_points` among `points`, if they are both arrays of coordinates (with shape `(N, 2)`).

While creating and fitting the model to the data is very quick with cuML, the `kneighbors` lookup is relatively slow compared with sklearn's CPU implementation. Running a simple benchmark with "medium" and "large" numbers of samples, I see the following results:
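The snippet referenced above ("We can use:") did not survive in this copy of the thread. A hypothetical reconstruction of the described workflow, written against sklearn's API (which cuML's `cuml.neighbors.NearestNeighbors` mirrors), might look like:

```python
# Hypothetical reconstruction of the elided snippet, using sklearn's API.
# cuML's cuml.neighbors.NearestNeighbors exposes the same fit/kneighbors
# interface, so the GPU version reads the same modulo the import.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
points = rng.random((100_000, 2))    # reference coordinates, shape (N, 2)
new_points = rng.random((1_000, 2))  # query coordinates, shape (M, 2)

model = NearestNeighbors(n_neighbors=3)
model.fit(points)  # fast: largely just stores/indexes the data

# For each query point: distances to, and indices of, its 3 nearest
# neighbors among `points`. This is the step reported as slow on GPU.
distances, indices = model.kneighbors(new_points)
```

The exact array sizes and `n_neighbors` value here are my own placeholders; the original benchmark numbers were not preserved in this excerpt.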