antocuni opened this issue 3 years ago
@antocuni for NumPy I would still like to do this based on FASTCALL, using a set of two macros (one just to create static storage, the second to add it to the argument list, so you could do it without macros):
```c
NPY_PREPARE_ARGPARSER;
if (npy_parse_arguments("reduce", args, len_args, kwnames,
        "array", NULL, &op,
        "|axis", NULL, &axes_in,
        "|dtype", NULL, &otype_obj,
        "|out", NULL, &out_obj,
        "|keepdims", NULL, &keepdims_obj,
        "|initial", NULL, &initial,
        "|where", NULL, &wheremask_obj,
        NULL, NULL, NULL) < 0) {
    goto fail;
}
```
The middle `NULL` in each row above is for converters; that snippet just deferred the conversion to later.
Basically it is stripped down to not support all the weird "intents", like `"s"` (although that could be added to some degree). I have all the code that does this, although I am not sure how much it would need to change for HPy, nor am I sure what you would think about such static storage. But I still like the approach and how the code ends up looking, and static storage is necessary for speed (mostly for interning kwargs), unless you want to go down the Clinic path of generating header files?
Sorry if this is completely off, this is the current version (but I have a slightly updated one): https://github.com/numpy/numpy/pull/15269
EDIT: Sorry, I was too excited :(, I guess what you are doing/aiming for is probably different...
@seberg yes, I think we are talking about different things: this proposal aims at bypassing argument parsing completely in certain cases. I didn't look at your PR in detail, but it looks to me like it's a way to speed up argument parsing, isn't it?
Anyway, if you have ideas/suggestions for how to design the HPy API in a way which helps the NumPy use case, I'd be happy to hear them. Maybe in a new issue though, to avoid cluttering this one with off-topic discussion.
In the meantime, I came up with a new idea for the argument-clinic option which looks better to me. Something along these lines:
```c
#include "foo.clinic.h" // generated by hpy-clinic

HPyDef_METH(my_add, "my_add", my_add_impl, HPy_CLINIC);
HPy_CLINIC(my_add, return: double, a: long, b: long)
static double my_add_impl(HPyContext ctx, long a, long b)
{
    return (double)(a+b);
}

// we can also express more advanced stuff, like CPython's argument clinic
HPyDef_METH(my_foo, "my_foo", my_foo_impl, HPy_CLINIC);
HPy_CLINIC(my_foo, return: void, obj: str(accept={str, NoneType}))
static void my_foo_impl(HPyContext ctx, const char* s)
{
    // ...
}
```
In this idea, `HPy_CLINIC` is a macro which expands to nothing, but hpy-clinic knows how to parse its content and generates `foo.clinic.h`, which contains all the necessary stuff, including a function declaration for `my_add_impl`, so that the C compiler can emit a very clear message if the C signature and the Clinic signature don't match. Moreover, it would be easy to integrate the step which generates `foo.clinic.h` with `setup.py`, so it would be almost transparent to the user.
After an in-person discussion at the sprint, we came up with a more concrete proposal which tries to meet the needs of various stakeholders. Let's try to list the goals that we want to achieve.
From the point of view of extension authors:

1. we want to write a function with a C signature and automatically generate the argument parsing code;
2. ideally, we would like to handle all possible needs, e.g. default arguments, kw-only arguments, arbitrarily complex converters: this is more or less what CPython's Argument Clinic does, if I understand correctly.

From the point of view of the final user:

3. we would like the signature to be inspectable, so that we can know the argument names, their types and a human-readable explanation of them;
4. maybe we also want to generate typing information?

From the point of view of JIT compilers:

5. we want to be able to access the underlying C function pointer, inspect the signature and insert a direct C call from the generated code, bypassing argument boxing/unboxing. To be doable in practice, the calling convention cannot be too complex: i.e. it is fine to say "this parameter is an object", but it is less fine to say "this is an object which should be converted by calling this arbitrary code". Note that there is an intrinsic tradeoff with point (2).
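The JIT goal can be sketched in plain C. This is a minimal illustration, not actual HPy API: every name here (`callable_t`, `SIG_D_LL`, `call_d_ll`) is invented for the example. It shows the core idea: a callable carries its raw C function pointer plus an integer signature id, so generated code that has proven the signature can call the C function directly, skipping boxing/unboxing.

```c
#include <assert.h>
#include <stdint.h>

#define SIG_D_LL 1  /* stands for: double _(long, long) */

/* Hypothetical callable descriptor: a generic boxed entry point plus the
   raw C implementation and an encoded C-level signature. */
typedef struct {
    void    *generic;   /* the usual boxed entry point (unused here) */
    void    *cfunc;     /* raw C implementation */
    int64_t  sig;       /* encoded C-level signature */
} callable_t;

static double my_add_impl(long a, long b) { return (double)(a + b); }

static callable_t my_add = { 0, (void *)my_add_impl, SIG_D_LL };

/* What JIT-generated code could do once it knows the signature: a single
   integer compare, then a direct C call. */
static double call_d_ll(callable_t *c, long a, long b)
{
    assert(c->sig == SIG_D_LL);
    return ((double (*)(long, long))c->cfunc)(a, b);
}
```

With this layout, `call_d_ll(&my_add, 4, 5)` returns `9.0` without any Python-level argument handling.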
From the point of view of Cython (maybe):
The solution proposed in the comment above doesn't really work:

```c
HPy_CLINIC(my_add, return: double, a: long, b: long)
```

because the mini-language we can use inside this macro is very limited, and it would make it very hard to support things like default arguments and/or kw-only arguments.
Moreover, goals 2 and 5 are in conflict: in order to be usable by JITs, we need relatively simple semantics. The proposed solution is the following:
```c
HPyDef_METH(my_func, "my_func", my_func_impl, HPyFunc_CUSTOM);
/* [HPyFunc_CUSTOM]
my_func
    a: long
    b: long
    parent: object
    name: str = "hello"
    *
    someflag: bool = true
return: void
*/
static void my_func_impl(HPyContext ctx, long a, long b, HPy parent, HPy name, bool someflag)
{
    ...
}
```
1. Python objects become parameters of type `HPy`. Any further parsing must be written manually by the extension author inside `my_func_impl`: e.g. type checks, converters and complex default values.
2. A good first approximation is that the only allowed default values for `object` are `HPy_NULL`, `ctx->h_None`, literal strings and literal numbers, i.e. the kind of objects which can be expressed in a reasonable way using C syntax.
3. `str` is common and useful enough to warrant an exception to rule (2).
Extension authors will have to run a tool (e.g. `python -m hpy.tools.clinic`) which parses the `[HPyFunc_CUSTOM]` declarations and emits a C file to be `#include`d, exactly as CPython's argument clinic does. We will support two workflows:

- automatically run the tool at build time (using some build system hook);
- manually run the tool and commit the generated C file.

The generated C file contains:

- a forward declaration of `my_func_impl`, so that the compiler can emit an error if `my_func_impl` does not match what is described in the clinic string;
- an `HPyDef_METH(my_func, ...)` which uses a standard calling convention (e.g. `HPyFunc_KEYWORDS`), parses the arguments and calls `my_func_impl`.

The resulting `HPyDef` will also store a pointer to `my_func_impl`, plus some machine-readable description of the signature, so that JITs and other tools can use it.

A nice property of this solution is that Python implementations are not required to add special support for it: if they want, they can special-case `HPyFunc_CUSTOM`, but they can also just ignore it and use the "standard" `HPyDef_METH` which is automatically generated.
This is an attempt to start a discussion on how to implement function calls and function definitions. In the following, "functions" and "methods" are used interchangeably.
First, we need to identify who the interested players and use cases are:

- Authors of C extensions who write C functions. Let's call them "C-writers".
- Authors of C extensions who call Python functions. Let's call them "C-callers".
- Generators of C extensions. Let's call them "cython", but it applies also to others.
- Python implementations with a JIT. Let's call them "pypy", but it applies also to others.
The basic idea is that we want to allow C-writers to write functions with a C-level signature, and generate the logic to do argument parsing automatically. E.g.:
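The code example that originally followed here did not survive; as a rough stand-in, this is a hedged sketch based on the `my_add` example used later in the thread. The macro is a placeholder that expands to nothing; in the real proposal it would expand to (or be replaced by generated) argument parsing code exposing the function to Python.

```c
#include <assert.h>

/* Illustrative only: a no-op placeholder so the sketch is self-contained.
   The real thing would generate the Python-level wrapper here. */
#define AUTOMAGICALLY_GENERATE_PYTHON_FUNCTION(name, c_signature)

/* The C-writer writes only this, with a plain C-level signature... */
static double my_add_impl(long a, long b)
{
    return (double)(a + b);
}

/* ...and asks for the argument parsing logic to be generated. The
   "d ll" notation (double _(long, long)) is discussed below. */
AUTOMAGICALLY_GENERATE_PYTHON_FUNCTION(my_add, "d ll")
```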
The precise details of how to implement `AUTOMAGICALLY_GENERATE_PYTHON_FUNCTION` are the scope of this discussion.

Goals & constraints
1. Writing functions in this way is optional. It will always be possible to write functions with the usual calling conventions such as `HPyFunc_VARARGS` and to do the argument parsing manually.
2. There must be a way to quickly check whether a Python callable supports a given C-level signature and to get the underlying function pointer. "pypy" can use this to generate code which completely bypasses the Python argument parsing logic, and "cython" could use it to emit a fast path in case it statically knows the C types of the arguments.
3. Ideally, this should be integrated with the C API to call functions. E.g., if a C-caller calls `HPy_Call(my_add, "ll", 4, 5)`, an implementation should be able to bypass argument parsing and call `my_add_impl` directly.
4. Bonus point if we find a way to implement goal 3 also on CPython.
5. It should be possible to do things manually: i.e., a C-writer could write their own argument parsing code and still be able to declare a C-level signature. In that case, they need to ensure that their own argument parsing code does the "correct thing". In particular, for a call from Python like `my_add(4, 5)`, the implementation should be free to decide whether to call the generic version or the C-specialized overload.

To be discussed: we need to decide whether we want to support "overloads" or not. I.e., a given Python function could in principle support many different C-level signatures. I think that all the following options are reasonable:

- Support at most one C-level signature. Functions which can't be encoded this way need to be written in the "old style" and parse arguments manually.
- Support at most N C-level signatures, where N is a small number like 4. If you require more than N, you have to write parsing manually.
- Support a potentially unlimited number of C-level signatures.
Argument Clinic vs C macros
CPython already has something similar: it is called Argument Clinic and AFAIK it's used only internally. You have to write special comments in the C code to specify the signature of your function, then you run a Python script which edits the C files and adds the relevant autogenerated code. In the following, when I talk about "Argument Clinic" I don't necessarily mean the very same `clinic.py` as CPython's: we will probably have to write our own version, but the concept is the same.

On the other hand, HPy has so far relied on macros to generate pieces of code. Consider the following example: here, `HPyDef_METH` is a macro which, among other things, generates a small trampoline to convert from the CPython calling convention to the HPy calling convention; it generates something similar to this (in the CPython-ABI case). So one option is to extend this functionality to generate also the argument parsing logic (more on that later).
Both options have pros and cons, IMHO:
- Clinic PRO: it is already known by CPython devs and it seems to work well.
- Clinic PRO (very futuristic): it will be easier for CPython itself to migrate to HPy.
- Clinic CON: C-writers might dislike the fact that an external script modifies their C sources, potentially cluttering them with lines and lines of obscure code. It might be possible to put all the generated code into a separate file, though.
- Macros PRO: they just work with any C compiler. The C code which you have to write is probably more compact and/or nicer to read.
- Macros CON: we will probably be more limited in the complexity of the logic we can generate (maybe that's a PRO :)).
- Macros CON: compiler errors are potentially more obscure, although so far in HPy we have managed to get very good compiler errors in response to common mistakes, at least with gcc.
- Macros CON: we need to put an upper bound on the number of arguments, because we need to autogenerate a file which contains the macros for all possible C signatures. We also need to check whether this impacts compilation time negatively.
How to encode C signatures
There are at least two ways to encode/specify C signatures:

1. Use a C string: this is more or less equivalent to what you can pass to `HPyArg_Parse`, although we need to extend the notation to specify the return type. E.g., `"d ll"` could mean `double _(long, long)`.
2. Extend the enum `HPyFunc_Signature` to support many more signatures. So, in addition to `HPyFunc_VARARGS`, `HPyFunc_NOARGS` etc., you would have e.g. `HPyFunc_d_ll`, which corresponds to `double _(long, long)`. In this scenario each signature is represented by a single `int64_t` value, and there are at least a couple of variations for how to encode it:
   - If we decide that we are happy to support only `long`, `double`, `HPy` and `void`, we can encode a single type in 2 bits. So we can specify signatures of up to 31 arguments (2 bits are reserved for the return type), maybe a bit fewer if we want to save some bits to encode other interesting features (e.g. whether the function supports varargs or keywords).
   - Use 8 bits for each param: with this we can support many more types, but we are limited to signatures of up to ~7 arguments, maybe 6 if we want to reserve some bits for other features. 6-7 arguments are enough to cover the vast majority of functions, though: if a function wants more args, it has to do the argument parsing by itself.
Pros/cons of each approach:

- C string PRO: easy to understand, very flexible.
- C string CON: it's impossible to do any compile-time type check.
- C string CON: I think it's impossible to implement with macros. The current approach of using `HPyFunc_*` works because we can write "specialized" versions of `HPyFunc_TRAMPOLINE` for each possible value of `HPyFunc_Signature`, but I have no clue how to do it with a generic C string using macros. So, if we choose this, we are automatically choosing argument clinic.
- HPyFunc PRO: checking whether a callable supports a given signature is very quick, since you just compare two ints. Doing the same check with strings is probably slower, because you need a `strcmp`.
- HPyFunc PRO: works with the macros approach.
- HPyFunc PRO: we can probably write something which does compile-time checks of the argument types.
- HPyFunc CON: much less flexible than C strings. The C syntax for calling is probably also less nice, e.g. `HPy_Call(HPyFunc_d_ll, 4, 5)` vs `HPy_Call("ll", 4, 5)`.
- HPyFunc CON: if we use the macros approach, we probably need to generate a huge header with all the macro definitions, which might impact the build time.
We can also adopt a hybrid approach: the user-facing API takes and receives C strings for signatures, but internally we represent them as an encoded `int64_t`. This should make runtime signature checks faster (but it's probably a good idea to do some benchmarks).

Return types
Another open question is what to do with return types. Consider the example above in which I have the function `my_add`, whose signature is `"d ll"` (i.e., `double _(long, long)`): in this case, the return type is `HPy`. But what if I want to call the C function directly and get a `double` back, without having to box it? Cython surely needs an API to do that. So maybe something like this:

Suggestions for a better name instead of `HPyFunc_Try` are welcome.

Runtime signature checks
Why do we need fast runtime signature checks? I can think of at least two use cases:

1. `HPy_Call`: you can add a fast path: if the callable supports the given signature, you call it directly, bypassing the boxing/unboxing. But it is unclear whether this is doable, since `HPy_Call` knows only the types of the arguments, not the type of the result.
2. `HPyFunc_Try`: see above, this is needed by Cython.

Note that this doesn't apply to "pypy": assuming that the callee is known, the JIT can do the signature check at compile time, so it doesn't have to be particularly efficient.
Relationship with the current `HPyFunc_*`

Currently the enum `HPyFunc_Signature` defines ~30 signatures which are used by methods and slots. We need to understand whether they represent the same thing as the C-level function signatures or whether they are completely different beasts. E.g., `HPyFunc_O` is basically equivalent to `"O O"`, `HPyFunc_BINARYFUNC` to `"O OO"`, `HPyFunc_INQUIRY` to `"i O"`, etc.

Proposal #1: Argument clinic and C strings
Let's try to turn this into something more concrete. At the moment, I am not happy with either of these, though. The following is a sketch proposal which uses the "Argument Clinic" and "C string" approaches described above:

What I don't like too much about this approach is that it's completely different from the `HPyDef_METH` that you use for non-argument-clinic methods. Maybe something like this, in which we put the generated code BEFORE the call to `HPyDef_METH_CLINIC`? But note that in this way you lose the names of the arguments:

Proposal #2: `HPyFunc_*` and macros
The following integrates very well with the existing API, with all the pros&cons described in the sections above.
EDIT: s/4 bits/2 bits in the "How to encode C signatures" section