hpyproject / hpy

HPy: a better API for Python
https://hpyproject.org
MIT License

Compatibility with non-C languages #15

Open filmor opened 4 years ago

filmor commented 4 years ago

It would be great if the API could (at least as an alternative) stick to "normal" functions, i.e. no vararg, no macros and no global variables.

In particular, there should at least be an alternative to HPyArg_Parse with a signature like

int HPyArg_Parse2(HPyContext ctx, HPy *args, HPy_ssize_t nargs,
             const char *fmt, void** targets);

Also, there could be a set of functions to get the constants:

HPy HPyConst_None();

Lastly, there doesn't seem to be a way to define modules and methods "programmatically" right now. Even if this is less efficient, it would nevertheless help a lot.

The reason I'm asking for these things is my involvement with pythonnet, which currently has to translate the C macros manually to C#, with the obvious possibility of breakage down the line and an annoying version-dependence. Also, .NET's P/Invoke mechanism (like, I guess, many FFI implementations) only really supports access to functions, and only to those without varargs. Varargs have the additional annoying property that they are ABI-dependent, so we'd have to model that per platform as well.

vstinner commented 4 years ago

Hi,

On Wed, 22 Jan 2020 at 10:06, Benedikt Reinartz notifications@github.com wrote:

It would be great if the API could (at least as an alternative) stick to "normal" functions, i.e. no vararg, no macros and no global variables.

FYI, in the Python C API we have started to avoid adding new macros, and sometimes even provide a new function where there was only a macro. For example, I changed my PEP 587 (PyConfig) implementation to replace macros with functions only.

Global variables are another story. CPython inherits 30 years of history, and moving away from globals will take a few years. We (CPython) are aware of that and there is a group of people working on it. See for example my recent article: https://vstinner.github.io/cpython-pass-tstate.html

The idea is to move away from "implicit state" to "explicitly passing a state" (in practice, the Python thread state: "tstate").

In particular, there should at least be an alternative to HPyArg_Parse with a signature like

int HPyArg_Parse2(HPyContext ctx, HPy *args, HPy_ssize_t nargs, const char *fmt, void** targets);

How do you use this API? Can you give an example?

Also, there could be a set of functions to get the constants:

HPy HPyConst_None();

FYI, I'm also considering adding functions to Python to get "singletons": None, True, False for example. I'm not sure if they should be called "GetNone" or just "None".

I suggest: HPy_GetNone().

Currently in Python, these singletons are shared by all interpreters which is a performance bottleneck. Py_INCREF(Py_None) requires locking. Whereas if we have one None per interpreter, you can do "none = Py_GetNone()" in parallel: no lock should be needed (I mean, two interpreters can run in parallel, but I don't think that we can run two threads of the same interpreter in parallel, at least, not now).

Well, that's the long term plan for CPython subinterpreters. But it's good to prepare the API for such use case ;-)

Lastly, there doesn't seem to be a way to define modules and methods "programmatically" right now. Even if this is less efficient, it would nevertheless help a lot.

Python provides "struct PyModuleDef" and PyModule_Create(): PEP 384.

Would you mind elaborating on what you would need?

Victor -- Night gathers, and now my watch begins. It shall not end until my death.

arigo commented 4 years ago

Hi,

On Wed, 22 Jan 2020 at 11:47, Victor Stinner notifications@github.com wrote:

On Wed, 22 Jan 2020 at 10:06, Benedikt Reinartz notifications@github.com wrote:

It would be great if the API could (at least as an alternative) stick to "normal" functions, i.e. no vararg, no macros and no global variables. (...) Also, there could be a set of functions to get the constants:

HPy HPyConst_None();

FYI, I'm also considering adding functions to Python to get "singletons": None, True, False for example. I'm not sure if they should be called "GetNone" or just "None".

I suggest: HPy_GetNone().

Did you look at https://github.com/pyhandle/hpy/blob/master/hpy-api/hpy_devel/include/universal/autogen_trampolines.h and https://github.com/pyhandle/hpy/blob/master/hpy-api/hpy_devel/include/universal/autogen_ctx.h ? It might be a better idea to make concrete proposals rather than discussing ideas all over again.

In the core API (autogen_ctx.h) there are no vararg, macros or global variables. For regular C code, there is then a thin auto-generated layer (autogen_trampolines.h).

See you soon,

Armin.

antocuni commented 4 years ago

To elaborate on Armin's answer: currently HPy is designed around two concepts:

However, it is true that the signature of ctx_Arg_Parse could be improved: it currently takes an argument of type va_list, which I fear is compiler-specific. Is it even possible to have a "standard" signature like that?
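For context, a variadic front end layered over a va_list core is the same split the C standard library uses for printf and vprintf. The sketch below shows that layering with standard calls only; core_format and front_format are made-up names for illustration, not HPy functions. The representation of va_list is defined by each platform's C ABI, which is what makes it awkward to describe to an FFI layer.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Core function: takes an already-started va_list, like vsnprintf does.
   (Hypothetical name; stands in for something like ctx_Arg_Parse.) */
static int core_format(char *buf, size_t size, const char *fmt, va_list ap)
{
    assert(buf != NULL && fmt != NULL);
    return vsnprintf(buf, size, fmt, ap);
}

/* Variadic convenience wrapper: collects "..." into a va_list and
   forwards it to the core. This is the part an FFI cannot easily call. */
static int front_format(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = core_format(buf, size, fmt, ap);
    va_end(ap);
    return n;
}
```

C callers use front_format directly; a caller that cannot build a va_list has no portable way to reach core_format, which is the motivation for an array-based alternative.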

filmor commented 4 years ago

It's been a while, but let me answer/comment:

FYI, in the Python C API we have started to avoid adding new macros, and sometimes even provide a new function where there was only a macro. For example, I changed my PEP 587 (PyConfig) implementation to replace macros with functions only.

This is all fine and good, I was just suggesting doing that for this new API as well. I don't think I saw any macros, I just wanted to make it clear that while all of these (macros, varargs, global variables) are kind of okay for a C API, they are pretty much useless for FFI. The same actually holds for suggestions to access the context struct directly (like ctx->h_None). This is of course possible via some sort of pointer magic, but it is hugely more work to get right than just keeping a list of functions and (opaque) types around, since alignment etc. would have to be handled on the caller's side.

Regarding usage of a hypothetical HPyArg_Parse2:

int HPyArg_Parse2(HPyContext ctx, HPy *args, HPy_ssize_t nargs,
         const char *fmt, void** targets);
 // ...
void *targets[] = { NULL, NULL };
HPyArg_Parse2(ctx, args, 2, "ll", targets);
void* a = targets[0];
void* b = targets[1];

These are mechanisms that are readily available in an FFI situation, unless I'm missing something important :)

* a universal ABI: this is what the various implementations need to care about: so far, we tried to keep things as straightforward as possible, which means: no varargs, no global variables, no linking to external symbols. All the state is passed around in the `ctx`. So, for `pythonnet` you should look at `autogen_ctx.h`, as Armin pointed out.

The only things in there that I would really /need/ as far as I can see are the h_* members (like h_None etc.) for which the library could just make a function available, like the mentioned HPy_GetNone(ctx).

antocuni commented 4 years ago

I agree that HPy should be usable also without macros/varargs, etc.

The same actually holds for suggestions to access the context struct directly (like ctx->h_None).

Note that in the current design functions such as HPy_GetNone are not exported as symbols. Instead, they are translated (using a macro) into things such as ctx->ctx_GetNone(...).

So, I don't see the difference between accessing ctx->h_None and calling ctx->ctx_GetNone(). In both cases, you need to know the offset inside the struct.

Regarding usage of a hypothetical HPyArg_Parse2:

int HPyArg_Parse2(HPyContext ctx, HPy *args, HPy_ssize_t nargs,
         const char *fmt, void** targets);
 // ...
void *targets[] = { NULL, NULL };
HPyArg_Parse2(ctx, args, 2, "ll", targets);
void* a = targets[0];
void* b = targets[1];

I don't understand how it is supposed to work. If you pass "ll", you probably want to put the result into two local variables long a, b. Also, what does targets contain after the call? It can't be the addresses of the result, because nobody allocated it (unless you are proposing for HPyArg_Parse2 to allocate memory for each output argument).

It might be easier to discuss these topics on IRC though: feel free to join #hpy on freenode if you have questions :)

filmor commented 4 years ago

So, I don't see the difference between accessing ctx->h_None and calling ctx->ctx_GetNone(). In both cases, you need to know the offset inside the struct.

Yeah, this was confusing. What I meant (and didn't write, because I was staring at that header) was a global symbol HPy_GetNone(ctx) (as you suggested before).

I don't understand how it is supposed to work. If you pass "ll", you probably want to put the result into two local variables long a, b.

Ah, I misunderstood how the function currently works. Then it would be

long a, b;
void *targets[] = {&a, &b};

and it would write to a and b through the passed pointers.
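A minimal sketch of that calling convention, assuming made-up stand-ins: FakeHandle and fake_arg_parse2 do not exist in HPy, and the two format characters handled here are only illustrative. The point is that every output goes through a caller-supplied pointer in a plain array, so no varargs are involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an HPy handle: it simply carries a long. */
typedef struct { long value; } FakeHandle;

/* Hypothetical non-variadic parser: one output pointer per format
   character. Returns 1 on success and 0 on a malformed call. */
static int fake_arg_parse2(const FakeHandle *args, size_t nargs,
                           const char *fmt, void **targets)
{
    size_t i;

    assert(args != NULL && fmt != NULL && targets != NULL);
    if (strlen(fmt) != nargs)
        return 0;                        /* argument count mismatch */

    for (i = 0; i < nargs; i++) {
        switch (fmt[i]) {
        case 'l':                        /* caller supplied a long* */
            *(long *)targets[i] = args[i].value;
            break;
        case 'i':                        /* caller supplied an int* */
            *(int *)targets[i] = (int)args[i].value;
            break;
        default:
            return 0;                    /* unknown format character */
        }
    }
    return 1;
}
```

The caller owns the storage (long a, b;) and passes {&a, &b}, so the parser never allocates anything.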

antocuni commented 4 years ago

Yeah, this was confusing. What I meant (and didn't write, because I was staring at that header) was a global symbol HPy_GetNone(ctx) (as you suggested before).

I think you misunderstand how the HPy universal mode is designed: there are NO global exported symbols at all. For example, look at the implementation of HPy_Dup:

static inline HPy HPy_Dup(HPyContext ctx, HPy h) {
    return ctx->ctx_Dup(ctx, h);
}

HPy_Dup is just a small trampoline which jumps to ctx->ctx_Dup, which is a function pointer provided by the interpreter which contains the actual logic. The offset of ctx_Dup inside HPyContext is fixed and is part of the ABI. So from this point of view, calling a hypothetical HPy_GetNone means calling ctx->GetNone, which is as hard (or as easy) as getting ctx->h_None.
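A self-contained sketch of this trampoline pattern, using a made-up MiniContext rather than the real HPyContext (all Mini* names and the field layout are assumptions for illustration only):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a context struct; not the real HPyContext. */
typedef struct MiniContext MiniContext;
typedef struct { long _i; } MiniHandle;

struct MiniContext {
    int version;                                       /* plain data slot */
    MiniHandle h_None;                                 /* constant stored as data */
    MiniHandle (*ctx_Dup)(MiniContext *, MiniHandle);  /* function slot */
};

/* What the "interpreter" side would supply for the function slot. */
static MiniHandle impl_dup(MiniContext *ctx, MiniHandle h)
{
    (void)ctx;
    return h;          /* toy behaviour: handles are plain values */
}

/* Trampoline: no exported symbol, only an indirect call through ctx. */
static inline MiniHandle Mini_Dup(MiniContext *ctx, MiniHandle h)
{
    assert(ctx != NULL && ctx->ctx_Dup != NULL);
    return ctx->ctx_Dup(ctx, h);
}
```

Reading ctx->h_None and calling ctx->ctx_Dup both start from the same step: locating a field at a known position inside the struct.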

Sorry if this does not answer your question, but it is entirely possible that I don't understand the question/problem at all :). Going back to your first message:

The reason I'm asking for these things is my involvement with pythonnet, which currently has to translate the C macros manually to C#, with the obvious possibility of breakage down the line and an annoying version-dependence.

What are the C macros that pythonnet needs to translate? Also, what does it do exactly? Does it produce a .so file which is loaded by Python, or does it embed libpython and make calls at runtime? Could you provide a concrete (but hopefully simplified) example which shows why the current HPy design would be problematic?

filmor commented 4 years ago

HPy_Dup is just a small trampoline which jumps to ctx->ctx_Dup, which is a function pointer provided by the interpreter which contains the actual logic. The offset of ctx_Dup inside HPyContext is fixed and is part of the ABI. So from this point of view, calling a hypothetical HPy_GetNone means calling ctx->GetNone, which is as hard (or as easy) as getting ctx->h_None.

I indeed didn't understand that the offsets are part of the ABI. They are still compiler-dependent, aren't they? It's just a lot simpler in FFI to get function calls working correctly and fast than struct access + interpreting the function pointer + calling it with the right calling convention, at least that has been my experience so far with .NET's P/Invoke.
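To make those three steps concrete, here is a rough sketch of what an FFI caller ends up doing by hand: take the raw context pointer, add a known offset, reinterpret the bytes there as a function pointer, and call it. DemoContext, demo_fn and call_slot are invented for this example and do not reflect HPy's actual layout; the offset comes from offsetof, whose value follows the platform's C ABI rules for struct layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical context with one data slot and one function slot. */
typedef struct DemoContext DemoContext;
typedef long (*demo_fn)(DemoContext *, long);

struct DemoContext {
    int version;
    demo_fn ctx_Twice;
};

/* Toy implementation that the "interpreter" would install. */
static long impl_twice(DemoContext *ctx, long x)
{
    (void)ctx;
    return 2 * x;
}

/* What a foreign caller does manually: base pointer + offset gives the
   slot, the slot's bytes are read as a function pointer, then called. */
static long call_slot(void *ctx, size_t offset, long arg)
{
    demo_fn fn;

    assert(ctx != NULL);
    memcpy(&fn, (char *)ctx + offset, sizeof fn);
    return fn((DemoContext *)ctx, arg);
}
```

A C compiler derives the offset and the calling convention from the header; a foreign runtime has to be told both, which is the extra work being described.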

What are the C macros that pythonnet needs to translate? Also, what does it do exactly? Does it produce a .so file which is loaded by Python, or does it embed libpython and make calls at runtime?

Both, actually. We allow embedding a .NET runtime in Python as well as running Python from within .NET. FFI-wise it's mostly the same.

Examples of macros we have to "copy" are:

It's very well possible that this is all solved in newer versions of CPython, but that is also not what I want to discuss here :)

hodgestar commented 3 years ago

@filmor How do you generate your FFI calls when calling a function? Do you start from a C header? Or some other way?

It might be easiest to discuss this on a dev call rather than very slowly here. They happen on the first Thursday of each month at roughly 9am GMT. The invite is usually posted to the mailing list a day or so before the call.

filmor commented 3 years ago

I'll see whether I can join next month; I won't be able to today. We use the .NET equivalent of dlsym to get the function pointers and manually hard-code the signatures. We could theoretically generate these from (simple) headers with a bit of work, which is what we currently do for some more advanced features to get the slot offsets within PyObject, but since ABI compatibility is an explicit goal of hpy, we should be able to get away with the manual translation.

hodgestar commented 3 years ago

@filmor Next month would be great. Perhaps one could even attempt to autogenerate what you need by using pieces of HPy's autogeneration framework (which we use to generate a lot of the boilerplate HPy code from public_api.h). If you think this would be useful, shout and I can try to give pointers. Unfortunately, our autogeneration code could definitely do with some tidying up (it grew organically).