hpyproject / hpy

HPy: a better API for Python
https://hpyproject.org
MIT License

Exposing APIs from HPy extensions to be used by other HPy extensions #446

Open steve-s opened 11 months ago

steve-s commented 11 months ago

The motivating example is the NumPy API that is exposed to other Python extensions such that they can work with arrays natively/directly without a round-trip through Python code/abstractions.

How the NumPy API works at the moment:

The very same scheme can work with HPy, but has one drawback: the 3rd party extension gets some HPyContext and passes it to NumPy, which means:

Are those restrictions problematic enough to seek a better solution?

One possibility is to provide some way to "wrap" function pointers with a trampoline that can "transform" the HPyContext to another if necessary. Example in code:

// NumPy:
HPy my_api_function(HPyContext *ctx, HPy h) { ... }
// ...
numpy_api_capsule->my_api_function_pointer = HPy_AsAPI(ctx, &my_api_function);

// 3rd party using the API to call the function:
numpy_api_capsule->my_api_function_pointer(my_hpy_context, my_handle);

// HPy universal implementation of the generated trampoline would be:

HPy_API_token numpy_token; // implementation specific: 
// a pointer to anything the implementation needs, initialized in the HPy_AsAPI call

HPy my_api_function_trampoline(HPyContext *caller_ctx, HPy h) {
    HPyContext *numpyCtx = _HPy_TransformContext(caller_ctx, numpy_token); // part of ABI, not API
    return my_api_function(numpyCtx, h);
}

The question is how to generate the trampoline. We could use macros for that, something like HPy_APIDef(...). As a bonus, we could also generate CPython API trampolines so that the API is usable from non-HPy packages (NumPy would have to expose another capsule with the CPython trampolines for those packages).
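A minimal, self-contained sketch of how such a macro could generate the trampoline. None of this is HPy API: the Sketch* types stand in for HPyContext, HPy and HPy_API_token, sketch_transform_context stands in for the proposed _HPy_TransformContext, and SKETCH_API_DEF is a hypothetical stand-in for the suggested HPy_APIDef.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for HPyContext and HPy so the sketch compiles on its own. */
typedef struct SketchContext { const char *owner; } SketchContext;
typedef struct { long value; } SketchHandle;

/* Hypothetical token: whatever the implementation needs in order to
 * find the exporting module's context. */
typedef struct { SketchContext *exporter_ctx; } SketchApiToken;

/* Hypothetical counterpart of the proposed _HPy_TransformContext. */
SketchContext *sketch_transform_context(SketchContext *caller_ctx,
                                        const SketchApiToken *token)
{
    assert(caller_ctx != NULL);
    /* If no exporter context is registered, keep the caller's context. */
    return token->exporter_ctx != NULL ? token->exporter_ctx : caller_ctx;
}

/* Hypothetical HPy_APIDef-like macro: declares the implementation,
 * defines its token, and defines a trampoline that switches contexts
 * before forwarding the call and returning its result. */
#define SKETCH_API_DEF(name)                                         \
    SketchHandle name##_impl(SketchContext *ctx, SketchHandle h);    \
    SketchApiToken name##_token;                                     \
    SketchHandle name##_trampoline(SketchContext *caller_ctx,        \
                                   SketchHandle h)                   \
    {                                                                \
        SketchContext *ctx =                                         \
            sketch_transform_context(caller_ctx, &name##_token);     \
        return name##_impl(ctx, h);                                  \
    }

SKETCH_API_DEF(my_api_function)

/* Records which context the implementation actually received. */
SketchContext *last_seen_ctx;

SketchHandle my_api_function_impl(SketchContext *ctx, SketchHandle h)
{
    last_seen_ctx = ctx;
    SketchHandle result = { h.value + 1 };
    return result;
}
```

The third-party caller would only ever see my_api_function_trampoline, so the context switch stays an implementation detail of the exporting side.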

fangerer commented 11 months ago

Are those restrictions problematic enough to seek a better solution?

IMO, we definitely need some interception. I can add the following point:

It may be the case that it is fine to pass the HPyContext to the next module but I think we shouldn't assume that in general.

One possibility is to provide some way to "wrap" function pointers with a trampoline that can "transform" the HPyContext to another if necessary.

Sounds good to me. I'm just not so sure about this:

numpy_api_capsule->my_api_function_pointer = HPy_AsAPI(ctx, &my_api_function);

Would HPy_AsAPI return the function pointer of the trampoline (i.e. my_api_function_trampoline in the above example)? If so, a macro like the suggested HPy_APIDef would certainly generate some kind of definition (just like HPyDef_METH or similar) and we would pass the definition to HPy_AsAPI.

steve-s commented 11 months ago

Would HPy_AsAPI return the function pointer of the trampoline (i.e. my_api_function_trampoline in the above example)? If so, a macro like the suggested HPy_APIDef would certainly generate some kind of definition (just like HPyDef_METH or similar) and we would pass the definition to HPy_AsAPI.

Good point. Yes, we should probably do exactly the same thing as with HPyDef_METH -- the macro would generate a struct and one would pass that to HPy_AsAPI, or maybe HPy_GetAPI.
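A rough sketch of that shape, under stated assumptions: SketchApiDef is a hypothetical definition struct of the kind a macro could generate, and sketch_get_api is a hypothetical stand-in for HPy_AsAPI/HPy_GetAPI that hands out the trampoline rather than the raw implementation. The Sketch* types are placeholders, not HPy types.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for HPyContext and HPy so the sketch compiles on its own. */
typedef struct SketchContext { int id; } SketchContext;
typedef long SketchHandle;
typedef SketchHandle (*SketchApiFunc)(SketchContext *, SketchHandle);

/* Hypothetical definition struct, analogous in spirit to what
 * HPyDef_METH generates for methods. */
typedef struct {
    const char *name;
    SketchApiFunc impl;        /* the exporter's real function */
    SketchApiFunc trampoline;  /* what other extensions get to call */
} SketchApiDef;

SketchHandle double_it_impl(SketchContext *ctx, SketchHandle h)
{
    (void)ctx;
    return h * 2;
}

SketchHandle double_it_trampoline(SketchContext *caller_ctx, SketchHandle h)
{
    /* A real trampoline would transform caller_ctx here if necessary. */
    return double_it_impl(caller_ctx, h);
}

/* Written by hand here; a macro would generate this definition. */
const SketchApiDef double_it_def = {
    "double_it", double_it_impl, double_it_trampoline
};

/* Hypothetical HPy_AsAPI/HPy_GetAPI: takes the definition and returns
 * the pointer to store in the exported capsule. */
SketchApiFunc sketch_get_api(SketchContext *ctx, const SketchApiDef *def)
{
    assert(ctx != NULL);
    return def != NULL ? def->trampoline : NULL;
}
```

Passing a definition instead of a bare function pointer gives the implementation one place to find both the trampoline and any per-function metadata it needs.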

steve-s commented 11 months ago

Packages from top4000 with string "PyArrayObject" in their sources:

asammdf astropy Bottleneck cvxpy dedupe ecos fastcluster GDAL matplotlib numba numexpr numpy opencv osqp pandas pyerfa python scipy scs shap Theano

Do we know of any other package that exposes a C API? I looked at pandas, and they don't have one. What is NumPy's take on its C API: should people ideally be using the memory view and other generic means rather than NumPy's C API? If that were the case, we could also say that exposing one's own C API is something that should not be done and hence is not supported in HPy.

fangerer commented 11 months ago

What is NumPy's take on its C API: should people ideally be using the memory view and other generic means rather than NumPy's C API?

I would assume that since there is the array API and NumPy implements it (https://numpy.org/doc/stable/reference/c-api/array.html), NumPy's take is not necessarily to use memory view. But I don't know.

steve-s commented 11 months ago

Isn't that API on the Python level?

mattip commented 11 months ago

It would be nice if people used the dlpack interface, which provides a standard way to interact with array-like objects. But thinking about this more deeply, it seems that if the HPy port of NumPy must export some kind of C-API, it would still have to be able to export exactly the CPython PyArrayObject. Refactoring code like this from matplotlib to avoid the NumPy C-API (with PyArrayObject) is not going to be easy: it would require replacing their numpy::array_view C++ class with something else, or at least rethinking all the incref/decref in that class.

So if we are confined to use PyArrayObject, can we export that from an HPy port of NumPy without using legacy mode?

mattip commented 11 months ago

Note all the dlpack interface requires is capsule support, which HPy has.

TeamSpen210 commented 11 months ago

Cython does also contain a system for exposing your types/functions as API, via automatic capsule use. But it also has internal shared code capabilities. If you import multiple Cython modules (transpiled with the same version), they'll share the implementation of the custom function type, things like that.

steve-s commented 3 weeks ago

Related discussion: https://discuss.python.org/t/changing-the-pycapsule-api-to-better-support-versions/54860