xarray-contrib / xoak

xarray extension that provides tree-based indexes used for selecting irregular, n-dimensional data.
https://xoak.readthedocs.io
MIT License
57 stars 4 forks source link

Nearest neighbor lookup using H3 at a 'fixed' resolution #16

Open benbovy opened 4 years ago

benbovy commented 4 years ago

I'm wondering if there could be more efficient alternatives to (K-D / Ball / R) trees for the cases where the (lat, lon) data points to be indexed are not strictly evenly spaced but where the distances between direct neighbors are still pretty much similar for the whole dataset. (Is it the case for NEMO and/or FASEOM2 model grids?)

There are some examples here and here on performing spatial search using the H3 library.

Here, the basic idea would be:

  1. Choose a fixed H3 resolution res
  1. Build the index:
  1. Nearest neighbor query:

Whether the query is efficient or not will depend of res. Ideally, there should be only a handful of candidates in the direct H3 cell vicinity (kRing=1) for each query point.

benbovy commented 4 years ago

Build a hash-table so that we can retrieve the original data points (positional index) from the computed H3 index values. Not sure at all about this part, though. Would this be efficient? The size of the table could be potentially huge. I guess numba's support for dict would be useful here?

Some naive benchmark results using numba's typed Dict within a jitclass:

benbovy commented 4 years ago

One limitation, though: Numba's typed.Dict currently doesn't support mutable lists as values.

willirath commented 4 years ago

Re fixed resolution

Typical NEMO grids vary in resolution. The example data from the global (nominal 0.5deg) climate model looks like this (shown is (e1t**2 + e2t**2)**0.5):

image

That's a factor of 5 in one dim and hence ~25 of the smallest grid cells fitting into the biggest grid cells.

benbovy commented 4 years ago

Oh yes that makes sense.

We could somehow leverage H3 cells at multiple levels (resolutions) for that case...

fhk commented 3 years ago

was there any more research done on h3 indexes?

benbovy commented 3 years ago

Not on my side. There's an example of nearest-neighbor search in https://github.com/joaofig/geo-spoke, although I'm not sure that it is fully vectorized (queries are for one point location at a time). H3's Python bindings still has a limited number of functions that are vectorized (available under the unstable namespace), so I'm afraid it would be hard to come with an efficient and xoak-friendly implementation written in pure-Python.

I've chosen to go with S2Geometry instead (#17) as it as more built-in features like a point index based on a binary tree. I still had to write custom, vectorized Python bindings for it, though.

benbovy commented 2 weeks ago

This issue may be rather addressed in https://github.com/xarray-contrib/xdggs.

fhk commented 2 weeks ago

Wow @benbovy thanks for the follow up!