uber / h3-py

Python bindings for H3, a hierarchical hexagonal geospatial indexing system
https://uber.github.io/h3-py
Apache License 2.0
829 stars 132 forks

Is it possible to make the Cython functions callable from Cython or Numba? Missing __pyx_capi__ attribute. #273

Open gwerbin opened 2 years ago

gwerbin commented 2 years ago

The Numba docs describe how to call Cython functions from Numba JIT-ed functions: https://numba.readthedocs.io/en/stable/extending/high-level.html#importing-cython-functions

I was hoping to use this with h3._cy.geo_to_h3 to turn it into a ufunc using numba.vectorize. I see that there is ongoing discussion of "built-in" vectorization (e.g. https://github.com/uber/h3-py/issues/237 and https://github.com/uber/h3-py/issues/205), but I wanted to try Numba as a quick workaround, which would also support automatic parallelization and a more complete ufunc-like interface.

Here is what I tried, following the example in the Numba docs:

import ctypes
import numba

_geo_to_h3_cyaddr = numba.extending.get_cython_function_address('h3._cy.geo', 'geo_to_h3')
_geo_to_h3_functype = ctypes.CFUNCTYPE(
    # Return type
    ctypes.c_uint64,
    # Parameter types
    ctypes.c_double,
    ctypes.c_double,
    ctypes.c_uint64,
)
geo_to_h3_cy = _geo_to_h3_functype(_geo_to_h3_cyaddr)

However, I got this error:

Traceback (most recent call last):
  File ".../tmp.py", line 4, in <module>
    _geo_to_h3_cyaddr = numba.extending.get_cython_function_address('h3._cy.geo', 'geo_to_h3')
  File ".../.conda/lib/python3.10/site-packages/numba/core/extending.py", line 464, in get_cython_function_address
    return _import_cython_function(module_name, function_name)
AttributeError: module 'h3._cy.geo' has no attribute '__pyx_capi__'

I did a bit of searching, and it appears that Cython needs to be coaxed into making this attribute available by compiling the Cython module in a particular way. It also appears that the absence of this attribute indicates that these functions can only be called from Python, but not from C.
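A quick way to check whether a given extension module exposes this C-level table is to look for the attribute directly. The sketch below is not part of h3-py or Numba: `describe_cython_capi` is a hypothetical helper name, and it assumes CPython, where `__pyx_capi__` (when present) is a dict mapping exported names to PyCapsule objects whose capsule name is the C signature.

```python
import ctypes
import importlib

# CPython C-API call that returns the name stored in a PyCapsule.
_capsule_get_name = ctypes.pythonapi.PyCapsule_GetName
_capsule_get_name.restype = ctypes.c_char_p
_capsule_get_name.argtypes = [ctypes.py_object]


def describe_cython_capi(module_name):
    """Return {name: C signature} from a module's __pyx_capi__, or None.

    Hypothetical helper for illustration. None means the module exports
    no C-level interface, which is the situation behind the
    AttributeError in the traceback above.
    """
    module = importlib.import_module(module_name)
    capi = getattr(module, "__pyx_capi__", None)
    if capi is None:
        return None
    signatures = {}
    for name, capsule in capi.items():
        raw = _capsule_get_name(capsule)
        signatures[name] = raw.decode() if raw is not None else None
    return signatures


# A module without Cython C exports, such as the standard-library
# math module, yields None.
print(describe_cython_capi("math"))
```

On a build of h3-py that does export its C interface, `describe_cython_capi("h3._cy.geo")` should list the functions Numba can look up, together with the C argument types that a `ctypes.CFUNCTYPE` prototype would need to mirror.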

I was going to try to recompile H3 with the necessary changes in place, but I saw that this project uses CMake, which I have no experience with.

Is it possible to adjust how H3-Py is compiled in order to make this attribute available, and make its Cython innards callable from other C code? Does anyone know how that might be accomplished?

Being able to easily "upgrade" the scalar Cython functions to Numba-vectorized ufuncs might be a reasonable way to defer or sidestep the questions around built-in vectorization.
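The ctypes half of the snippet earlier in this comment, wrapping a raw function address in a CFUNCTYPE prototype, can be tried without h3 or Numba. In this sketch `fake_geo_to_h3` is a made-up stand-in whose return value has nothing to do with real H3 indexing; its address plays the role of the integer that `get_cython_function_address` would return.

```python
import ctypes

# Same prototype as in the snippet above: return type first, then parameters.
geo_functype = ctypes.CFUNCTYPE(
    ctypes.c_uint64,
    ctypes.c_double,
    ctypes.c_double,
    ctypes.c_uint64,
)


@geo_functype
def fake_geo_to_h3(lat, lng, res):
    # Stand-in body: NOT a real H3 index, just something deterministic.
    return int(abs(lat) + abs(lng)) + int(res)


# Raw address of the C-callable thunk, as a plain Python int.
address = ctypes.cast(fake_geo_to_h3, ctypes.c_void_p).value

# Rebuild a callable from nothing but the address and the prototype.
rebuilt = geo_functype(address)
result = rebuilt(37.5, -122.25, 9)
print(result)
```

ctypes does not check the prototype against the function that actually lives at the address, so the argument and return types have to be matched to the C signature by hand.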

ajfriend commented 2 years ago

This is something we'd like users to be able to do for themselves, especially since different users might want to vectorize functions differently. We haven't released the changes yet, but we did start the process with @kylebarron's work in https://github.com/uber/h3-py/pull/234. Have you tried what you're doing using the latest on the master branch?

Alternatively, you should be able to do this in Cython. Check out the example in the tests:

* https://github.com/uber/h3-py/blob/master/.github/workflows/tests.yml#L49-L53

* https://github.com/uber/h3-py/blob/master/tests/test_cython.py

Let me know if this helps! I'd also love to have numba compatibility, but won't have a chance to look at that until after the v4.0 work is complete. But also happy to accept a PR if you do get it working in the meantime.

gwerbin commented 2 years ago

> This is something we'd like to allow users to be able to do for themselves, especially since different users might want to vectorize functions differently. We haven't released the changes yet, but we did start the process with @kylebarron's work in #234. Have you tried what you're doing using the latest on the master branch?
>
> Alternatively, you should be able to do this in Cython. Check out the example in the tests:
>
> * https://github.com/uber/h3-py/blob/master/.github/workflows/tests.yml#L49-L53
>
> * https://github.com/uber/h3-py/blob/master/tests/test_cython.py

This looks exactly like what I (and Numba) need to work. I will try it!

> Let me know if this helps! I'd also love to have numba compatibility, but won't have a chance to look at that until after the v4.0 work is complete. But also happy to accept a PR if you do get it working in the meantime.

I believe Numba's only requirement is that the compiled module expose a C interface, and that the __pyx_capi__ attribute indicates that this interface was successfully created.

I will leave this open until I have a chance to test it.

gwerbin commented 2 years ago

@ajfriend one quick note about test_cython.py. It seems you're using the Pytest test framework. Instead of simply return-ing from that test if the Cython extension isn't built, would it make more sense to explicitly mark the test as "skipped"? You can do so from inside the test using pytest.skip: https://docs.pytest.org/en/7.1.x/reference/reference.html#pytest.skip.
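A minimal sketch of that suggestion follows. The module name `h3_cython_example` is hypothetical and is not taken from the actual test file; the point is only that `pytest.skip` reports the test as skipped instead of letting it pass silently.

```python
import importlib

import pytest


def test_cython_api():
    # Hypothetical name for the optional compiled extension.
    try:
        ext = importlib.import_module("h3_cython_example")
    except ImportError:
        pytest.skip("Cython example extension is not built")

    # Placeholder for the real assertions, which would only run
    # when the extension imported successfully.
    assert ext is not None
```

`pytest.importorskip("h3_cython_example")` is a one-line alternative with the same effect.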

e-q commented 1 year ago

With the 4.0.0b2 release, I was thrilled to find that I can get Numba to call (and parallelize!) latlng_to_cell with the following:

import ctypes

import numba
import numpy as np
from numba.extending import get_cython_function_address

latlng_to_cell_functype = ctypes.CFUNCTYPE(
    ctypes.c_uint64, ctypes.c_double, ctypes.c_double, ctypes.c_uint64
)
latlng_to_cell_addr = get_cython_function_address("h3._cy.geo", "latlng_to_cell")
latlng_to_cell_cy = latlng_to_cell_functype(latlng_to_cell_addr)

@numba.jit(nopython=True, nogil=True, parallel=True)
def parallelized(lat_arr, lon_arr, res):
    N = len(lat_arr)
    out_arr = np.empty(N, dtype="uint64")
    for ii in numba.prange(N):
        out_arr[ii] = latlng_to_cell_cy(lat_arr[ii], lon_arr[ii], res)
    return out_arr

This is a huge win for about half of my H3 usage.

I've been looking for a way to do the same with grid_ring/grid_disk, but it appears that Numba can only work with functions that accept and return straightforward ctypes types, while the Cython implementations of the grid functions return memoryviews (i.e. H3int[:]). It looks like my dream of parallelized H3 computations without compiling anything myself is still out of reach 😅

gwerbin commented 1 year ago

@e-q I don't know if they cover the functionality you want specifically, but the H3ron Python bindings do offer some vectorized functionality that works with GeoPandas: https://github.com/nmandery/h3ronpy#convert-vectordata-to-h3-indexes