Open alekseynp opened 3 years ago
For CPU, we are currently using scipy.cluster.vq
, while you can also make use of scipy.spatial.KDTree
to maintain the information about the kd-tree. Adding kd-tree CUDA support is something that I always wanted to add (especially for knn
and ball query graph constructions), but haven't had yet the resources and knowledge to implement it efficiently by myself. Any pointers/help are highly appreciated.
Also pinging @mrjel here who promised me some time ago to look into this ;)
I eventually found this project: https://github.com/lxxue/FRNN The pre-computed data structure is a voxel grid and the api allows to hold and re-use it next call. The caveat is that the implementation is FRNN not NN. "Fixed Radius Nearest Neighbour".
The project is in a good state in that I was able to pull it straight off the shelf and use it to accelerate my training.
I haven't spoken to the author, but I don't think the project is in a state where it will be well maintained. The code is definitely non-trivial because it is doing an advanced thing, and it would be much more complex to understand and maintain than, say, the existing NN implementation in pytorch_cluster which is very straightforward.
I don't have enough experience with open source software or the pytorch geometric project to judge, but the obvious question from my perspective is: Should torch_cluster have a fixed_radius_nearest
function based on the FRNN implementation?
Thanks for the pointer. Pinging @mrjel here as well as he always shared interest in adding this feature to torch-cluster
:)
This issue had no activity for 6 months. It will be closed in 2 weeks unless there is some new activity. Is this issue already resolved?
In my work I find myself making frequent cuda calls to
torch_cluster.nearest
with the formnearest(different_every_time, same_every_time)
without providing abatch_x
orbatch_y
.different_every_time
is on the order of say(40000, 3)
andsame_every_time
is(2000, 3)
. If this could be accelerated an order of magnitude, that would have significant value to me.Any suggestions? Does anyone else find themselves in a similar situation? Do they have a solution? Would a solution have significant value to the community?
I assume that the strategy would be to pre-compute some kind of tree data structure, and then provide that
nearest_with_tree(different_every_time, precomputed_tree_structure)
On CPU I guess this would be a kd-tree and it would be orders and orders of magnitude faster than computing the 40000 * 2000 pairwise distances. On CUDA, I think the currenttorch_cluster.nearest
is computing all of those distances, but of course the parallelisation and memory access patterns on CUDA might change the game in terms of gains with a tree structure?