I did some preliminary tests and naive benchmarks (random points) against scikit-learn's BallTree index (with the Haversine metric), and the results are very promising! It might also be worth comparing it against some fast KD-Tree implementations available in Python (with an appropriate coordinate system transformation).
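A minimal sketch of one possible coordinate transformation for the KD-Tree comparison, assuming points on a unit sphere: latitude/longitude are mapped to 3-D Cartesian coordinates, where Euclidean (chord) distance increases monotonically with great-circle distance, so the nearest neighbor is the same under both. The function names below are hypothetical and are not part of pys2index or scikit-learn.

```python
import math


def latlon_to_xyz(lat_deg, lon_deg):
    """Map latitude/longitude in degrees to a point on the unit sphere."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    cos_lat = math.cos(lat)
    return (cos_lat * math.cos(lon), cos_lat * math.sin(lon), math.sin(lat))


def chord_to_angle(chord):
    """Convert a chord length on the unit sphere to a central angle (radians)."""
    # chord = 2 * sin(angle / 2); clamp to guard against rounding above 1.
    return 2.0 * math.asin(min(1.0, chord / 2.0))


def haversine_angle(lat1, lon1, lat2, lon2):
    """Central angle (radians) between two lat/lon points, Haversine formula."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2.0 * math.asin(min(1.0, math.sqrt(a)))


# Two arbitrary example points (lat, lon) in degrees.
a = (48.85, 2.35)
b = (40.71, -74.01)

# Euclidean distance in 3-D, as a KD-Tree would compute it.
chord = math.dist(latlon_to_xyz(*a), latlon_to_xyz(*b))
angle_from_chord = chord_to_angle(chord)
angle_haversine = haversine_angle(*a, *b)
```

Distances returned by a Euclidean KD-Tree built on the transformed points could then be converted back with `chord_to_angle` and multiplied by a sphere radius to compare against Haversine results.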
A nice thing about s2geometry is that we could leverage additional features such as:
- setting a maximum lookup distance to speed up queries (related to #11 and #13).
- range queries, i.e., selecting all points within a latitude and/or longitude range (which would work well with xarray.Dataset.sel(lat=slice(...), lon=slice(...))) ... or actually within a region of any geometry (polygon).
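To make the intended semantics of these two features concrete, here is a brute-force sketch in plain Python. It is only a reference for what the queries should return, not how s2geometry or pys2index implement them; the function names, the `-1` sentinel for "no match", and the example points are all assumptions made for illustration.

```python
import math


def haversine_rad(p, q):
    """Central angle (radians) between two (lat, lon) points given in degrees."""
    p_lat, q_lat = math.radians(p[0]), math.radians(q[0])
    dphi = q_lat - p_lat
    dlam = math.radians(q[1] - p[1])
    a = math.sin(dphi / 2) ** 2 + math.cos(p_lat) * math.cos(q_lat) * math.sin(dlam / 2) ** 2
    return 2.0 * math.asin(min(1.0, math.sqrt(a)))


def nearest_within(points, query, max_distance_rad):
    """Return (index, distance) of the nearest point within max_distance_rad.

    Returns (-1, inf) when no point is close enough (hypothetical convention).
    """
    best_idx, best_dist = -1, math.inf
    for i, p in enumerate(points):
        d = haversine_rad(p, query)
        if d <= max_distance_rad and d < best_dist:
            best_idx, best_dist = i, d
    return best_idx, best_dist


def select_box(points, lat_range, lon_range):
    """Return indices of points inside a lat/lon box (bounds inclusive).

    This simple version does not handle boxes crossing the antimeridian.
    """
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    return [
        i
        for i, (lat, lon) in enumerate(points)
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max
    ]


points = [(0.0, 0.0), (10.0, 10.0), (45.0, 90.0)]

hit = nearest_within(points, (1.0, 1.0), math.radians(5.0))
miss = nearest_within(points, (-60.0, -120.0), math.radians(5.0))
in_box = select_box(points, (5.0, 50.0), (0.0, 100.0))
```

The box selection corresponds to what a `sel(lat=slice(...), lon=slice(...))` call would be expected to return for unstructured points. A real spatial index would answer the same queries without scanning every point.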
Build index benchmark: pys2index.S2PointIndex is 3x faster than sklearn.neighbors.BallTree for 10M points.
Query index benchmark: pys2index.S2PointIndex is 40x faster than sklearn.neighbors.BallTree for 100k points!
I started working on some Python/NumPy wrappers for s2geometry indexes here: https://github.com/benbovy/pys2index
Notebook link: https://gist.github.com/benbovy/1f35c717b44791ca600655081b5b6fc3