tchlux / balltree

A very fast python-wrapped fortran implementation of a `ball tree`, includes fast `sort` and `select` codes for flat arrays as well.
MIT License
5 stars 1 forks source link

How does this compare to scikit-learn BallTree? #2

Closed blaylockbk closed 2 years ago

blaylockbk commented 2 years ago

Just a quick question, and this seemed like the best place to ask it...

How is this different than BallTree in scikit-learn? Is this faster? https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.BallTree.html

tchlux commented 2 years ago

Running the test.py file on my machine gives these results (image). With 1 million points each having dimension 100, this code is nearly 10× faster to construct a tree and more than 50% faster to query exact neighbors.

I don't remember precisely how Vanderplas implemented the scikit ball tree, but I think the splitting metric was different. This code starts with a point on the convex hull, then does a binary split about the median based on 2-norm distance. This code also uses OpenMP to parallelize the tree construction process, where the scikit code is purely serial (last time I checked).

You can play with the size and dimension of the data and the number of nearest neighbors in the test.py file to see if it is faster on your hardware and for your application. Happy to help if you encounter issues!

image
blaylockbk commented 2 years ago

Thanks for the info!