Open gwerbin opened 2 years ago
This is something we'd like users to be able to do for themselves, especially since different users might want to vectorize functions differently. We haven't released the changes yet, but we did start the process with @kylebarron's work in https://github.com/uber/h3-py/pull/234. Have you tried what you're doing using the latest on the master branch?

Alternatively, you should be able to do this in Cython. Check out the example in the tests:

* https://github.com/uber/h3-py/blob/master/.github/workflows/tests.yml#L49-L53
* https://github.com/uber/h3-py/blob/master/tests/test_cython.py
This looks exactly like what I (and Numba) need to work. I will try it!
Let me know if this helps! I'd also love to have numba compatibility, but won't have a chance to look at that until after the v4.0 work is complete. But also happy to accept a PR if you do get it working in the meantime.
I believe the only Numba requirement is that the compiled module have a C interface, and I believe the presence of the __pyx_capi__ attribute indicates that the C interface has been successfully created.
I will leave this open until I have a chance to test it.
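A quick way to check for that attribute without involving Numba at all might look like the sketch below. It assumes that __pyx_capi__, when present, is a dict keyed by the exported function names; the helper name list_cython_capi is made up for this example, and the module names passed to it are only illustrations.

```python
import importlib


def list_cython_capi(module_name):
    """Return the sorted names a module exports through __pyx_capi__.

    Hypothetical helper for illustration. Returns an empty list when the
    module has no __pyx_capi__ attribute, i.e. when nothing is callable
    from C (and therefore nothing is reachable from Numba).
    """
    module = importlib.import_module(module_name)
    capi = getattr(module, "__pyx_capi__", None)
    if capi is None:
        return []
    return sorted(capi)


# A pure-Python stdlib module has no C-level API table.
no_capi = list_cython_capi("json")
```

Calling list_cython_capi on the compiled h3 submodule you care about (for example "h3._cy.geo", the name used later in this thread) should show whether the function you want is actually exported.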
@ajfriend one quick note about test_cython.py. It seems you're using the Pytest test framework. Instead of simply returning from that test if the Cython extension isn't built, would it make more sense to explicitly mark the test as "skipped"? You can do so from inside the test using pytest.skip: https://docs.pytest.org/en/7.1.x/reference/reference.html#pytest.skip.
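Roughly what I have in mind is sketched below. This is not the actual contents of test_cython.py; the module name hypothetical_cython_ext is a placeholder for whatever extension the real test tries to import.

```python
import importlib

import pytest


def test_cython_extension():
    try:
        # Placeholder name: stands in for the compiled Cython extension.
        ext = importlib.import_module("hypothetical_cython_ext")
    except ImportError:
        # Report the test as skipped instead of silently passing.
        pytest.skip("Cython extension is not built")

    # Only reached when the extension imported successfully.
    assert hasattr(ext, "__pyx_capi__")
```

With this, a missing extension shows up as "skipped" in the test summary rather than as a pass, so it is obvious when the Cython path was never exercised.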
With the 4.0.0b2 release, I was thrilled to find that I can get Numba to call (and parallelize!) latlng_to_cell with the following:

    import ctypes

    import numba
    import numpy as np
    from numba.extending import get_cython_function_address

    # C signature: uint64 latlng_to_cell(double lat, double lng, uint64 res)
    latlng_to_cell_functype = ctypes.CFUNCTYPE(
        ctypes.c_uint64, ctypes.c_double, ctypes.c_double, ctypes.c_uint64
    )
    # Look up the address of the compiled Cython function and wrap it with ctypes.
    latlng_to_cell_addr = get_cython_function_address("h3._cy.geo", "latlng_to_cell")
    latlng_to_cell_cy = latlng_to_cell_functype(latlng_to_cell_addr)

    @numba.jit(nopython=True, nogil=True, parallel=True)
    def parallelized(lat_arr, lon_arr, res):
        N = len(lat_arr)
        out_arr = np.empty(N, dtype=np.uint64)
        # prange lets Numba split the loop iterations across threads.
        for ii in numba.prange(N):
            out_arr[ii] = latlng_to_cell_cy(lat_arr[ii], lon_arr[ii], res)
        return out_arr

This is a huge win for about half of my H3 usage.
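For anyone unfamiliar with the ctypes part of that snippet, here is a stdlib-only sketch of the same "prototype plus raw address" mechanism. The function fake_latlng_to_cell is a made-up stand-in and has nothing to do with the real H3 algorithm; the address comes from a ctypes callback rather than from get_cython_function_address, purely so the example runs without h3 or Numba installed.

```python
import ctypes

# Prototype: return type first, then the argument types.
# Same shape as in the snippet above: uint64 f(double, double, uint64).
prototype = ctypes.CFUNCTYPE(
    ctypes.c_uint64, ctypes.c_double, ctypes.c_double, ctypes.c_uint64
)


def fake_latlng_to_cell(lat, lng, res):
    # Hypothetical stand-in, NOT the H3 algorithm.
    return int(lat) + int(lng) + res


# Keep a reference to the callback so its C entry point stays alive.
callback = prototype(fake_latlng_to_cell)

# A plain integer address, analogous to what get_cython_function_address returns.
address = ctypes.cast(callback, ctypes.c_void_p).value

# Wrapping the address in the prototype gives back something callable.
rewrapped = prototype(address)
result = rewrapped(1.5, 2.5, 9)
```

The prototype is the only thing telling ctypes (and Numba) how to pass arguments, so it has to match the compiled function's real C signature.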
I've been looking for a way to do the same with grid_ring/grid_disk, but it appears that Numba can only work with functions that accept and return straightforward ctypes types, while the Cython implementations of the grid functions return memoryviews (i.e. H3int[:]). It looks like my dream of parallelized H3 computations without compiling anything myself is still out of reach 😅
@e-q I don't know if they cover the functionality you want specifically, but the H3ron Python bindings do offer some vectorized functionality that works with GeoPandas: https://github.com/nmandery/h3ronpy#convert-vectordata-to-h3-indexes
The Numba docs describe how to call Cython functions from Numba JIT-ed functions: https://numba.readthedocs.io/en/stable/extending/high-level.html#importing-cython-functions
I was hoping to try using this with h3._cy.geo_to_h3 in order to turn it into a ufunc using numba.vectorize. I see that there is ongoing discussion of "built-in" vectorization here (e.g. https://github.com/uber/h3-py/issues/237 and https://github.com/uber/h3-py/issues/205), but I wanted to try using Numba as a quick workaround, which would also support automatic parallelization and more of a complete ufunc-like interface.

Here is what I tried, following the example in the Numba docs:
However, I got this error:
I did a bit of searching, and it appears that Cython needs to be coaxed into making this attribute available by compiling the Cython module in a particular way. It also appears that the absence of this attribute indicates that these functions can only be called from Python, but not from C.
I was going to try to recompile H3 with the necessary changes in place, but I saw that this project uses CMake, which I have no experience with.
Is it possible to adjust how H3-Py is compiled in order to make this attribute available, and make its Cython innards callable from other C code? Does anyone know how that might be accomplished?
Being able to easily "upgrade" the scalar Cython functions to Numba-vectorized ufuncs might be a reasonable way to defer or sidestep the questions around built-in vectorization.