antocuni opened this issue 3 years ago
@antocuni for NumPy I would still like to do this based on FASTCALL, using a set of two macros (one just to create static storage, the second to add it to the argument list, so you could do it without macros):
```c
NPY_PREPARE_ARGPARSER;
if (npy_parse_arguments("reduce", args, len_args, kwnames,
        "array", NULL, &op,
        "|axis", NULL, &axes_in,
        "|dtype", NULL, &otype_obj,
        "|out", NULL, &out_obj,
        "|keepdims", NULL, &keepdims_obj,
        "|initial", NULL, &initial,
        "|where", NULL, &wheremask_obj,
        NULL, NULL, NULL) < 0) {
    goto fail;
}
```
The middle `NULL` in each row above is for converters; that snippet just deferred the conversion to later.
Basically it is stripped down to not support all the weird "intents", like `"s"` (although that could be added to some degree). I have all the code that does this, although I am not sure how much it would need to change for HPy, nor am I sure what you would think about such static storage. But I still like the approach and how the code ends up looking, and static storage is necessary for speed (mostly for interning kwargs), unless you want to go down the Clinic path of generating header files?
Sorry if this is completely off, this is the current version (but I have a slightly updated one): https://github.com/numpy/numpy/pull/15269
EDIT: Sorry, I was too excited :(, I guess what you are doing/aiming for is probably different...
@seberg yes, I think we are talking about different things: this proposal aims at bypassing argument parsing completely in certain cases. I didn't look at your PR in detail, but it looks to me like it's a way to speed up argument parsing, isn't it?
Anyway, if you have ideas/suggestions for how to design the HPy API in a way which helps the NumPy use case, I'd be happy to hear them. Maybe in a new issue though, to avoid cluttering this one with off-topic discussion.
In the meantime, I came up with a new idea for the argument-clinic option which looks better to me. Something along these lines:
```c
#include "foo.clinic.h" // generated by hpy-clinic

HPyDef_METH(my_add, "my_add", my_add_impl, HPy_CLINIC);
HPy_CLINIC(my_add, return: double, a: long, b: long)
static double my_add_impl(HPyContext ctx, long a, long b)
{
    return (double)(a+b);
}

// we can also express more advanced stuff, like CPython's argument clinic
HPyDef_METH(my_foo, "my_foo", my_foo_impl, HPy_CLINIC);
HPy_CLINIC(my_foo, return: void, obj: str(accept={str, NoneType}))
static void my_foo_impl(HPyContext ctx, const char* s)
{
    // ...
}
```
In this idea, `HPy_CLINIC` is a macro which expands to nothing, but hpy-clinic knows how to parse its content and generates `foo.clinic.h`, which contains all the necessary stuff, including a function declaration for `my_add_impl`, so that the C compiler can emit a very clear message if the C signature and the Clinic signature don't match. Moreover, it would be easy to integrate the step which generates `foo.clinic.h` with `setup.py`, so it would be almost transparent to the user.
After an in-person discussion at the sprint, we came up with a more concrete proposal which tries to meet the needs of various stakeholders. Let's try to list the goals that we want to achieve.
From the point of view of extension authors:

1. we want to write a function with a C signature and automatically generate the argument parsing code;
2. ideally, we would like to handle all possible needs, e.g. default arguments, kw-only arguments, arbitrarily complex converters: this is more or less what CPython's Argument Clinic does, if I understand correctly.

From the point of view of the final user:

3. we would like the signature to be inspectable, so that we can know the argument names, their types and a human-readable explanation of them;
4. maybe we also want to generate typing information?

From the point of view of JIT compilers:

5. we want to be able to access the underlying C function pointer, inspect the signature and insert a direct C call from the generated code, bypassing argument boxing/unboxing. To be doable in practice, the calling convention cannot be too complex: i.e. it is fine to say "this parameter is an object", but it is less fine to say "this is an object which should be converted by calling this arbitrary code". Note that there is an intrinsic tradeoff with point (2).
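The JIT goal can be sketched in plain C. This is a minimal illustration, not actual HPy API: every name here (`callable_t`, `SIG_D_LL`, `call_d_ll`) is invented for the example. It shows the core idea: a callable carries its raw C function pointer plus an integer signature id, so generated code that has proven the signature can call the C function directly, skipping boxing/unboxing.

```c
#include <assert.h>
#include <stdint.h>

#define SIG_D_LL 1  /* stands for: double _(long, long) */

/* Hypothetical callable descriptor: a generic boxed entry point plus the
   raw C implementation and an encoded C-level signature. */
typedef struct {
    void    *generic;   /* the usual boxed entry point (unused here) */
    void    *cfunc;     /* raw C implementation */
    int64_t  sig;       /* encoded C-level signature */
} callable_t;

static double my_add_impl(long a, long b) { return (double)(a + b); }

static callable_t my_add = { 0, (void *)my_add_impl, SIG_D_LL };

/* What JIT-generated code could do once it knows the signature: a single
   integer compare, then a direct C call. */
static double call_d_ll(callable_t *c, long a, long b)
{
    assert(c->sig == SIG_D_LL);
    return ((double (*)(long, long))c->cfunc)(a, b);
}
```

With this layout, `call_d_ll(&my_add, 4, 5)` returns `9.0` without any Python-level argument handling.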
From the point of view of Cython (maybe):
The solution proposed in the comment above doesn't really work:

```c
HPy_CLINIC(my_add, return: double, a: long, b: long)
```

because the mini-language we can use inside this macro is very limited, and it would make it very hard to support things like default arguments and/or kw-only arguments.
Moreover, goals 2 and 5 are in conflict: in order to be usable by JITs, we need relatively simple semantics. The proposed solution is the following:
```c
HPyDef_METH(my_func, "my_func", my_func_impl, HPyFunc_CUSTOM);
/* [HPyFunc_CUSTOM]
my_func
    a: long
    b: long
    parent: object
    name: str = "hello"
    *
    someflag: bool = true
return: void
*/
static void my_func_impl(HPyContext ctx, long a, long b, HPy parent, HPy name, bool someflag)
{
    ...
}
```
1. Python objects become parameters of type `HPy`. Any further parsing must be written manually by the extension author inside `my_func_impl`: e.g. type checks, converters and complex default values.
2. A good first approximation is that the only allowed default values for `object` are `HPy_NULL`, `ctx->h_None`, literal strings and literal numbers, i.e. the kind of objects which can be expressed in a reasonable way using C syntax.
3. `str` is common and useful enough to warrant an exception to rule (2).
Extension authors will have to run a tool (e.g. `python -m hpy.tools.clinic`) which parses the `[HPyFunc_CUSTOM]` declarations and emits a C file to be `#include`d, exactly as CPython's argument clinic does. We will support two workflows:

- automatically run the tool at build time (using some build system hook);
- manually run the tool and commit the generated C file.

The generated C file contains:

- a forward declaration of `my_func_impl`, so that the compiler can emit an error if `my_func_impl` does not match what is described in the clinic string;
- an `HPyDef_METH(my_func, ...)` which uses a standard calling convention (e.g. `HPyFunc_KEYWORDS`), parses the arguments and calls `my_func_impl`.

The resulting `HPyDef` will also store a pointer to `my_func_impl`, plus some machine-readable description of the signature, so that JITs and other tools can use it.

A nice property of this solution is that Python implementations are not required to add special support for it: if they want, they can special-case `HPyFunc_CUSTOM`, but they can also just ignore it and use the "standard" `HPyDef_METH` which is automatically generated.
This is an attempt to start a discussion on how to implement function calls and function definitions. In the following, "functions" and "methods" are used interchangeably.
First, we need to identify who the interested players and use cases are:

- Authors of C extensions who write C functions. Let's call them "C-writers".
- Authors of C extensions who call Python functions. Let's call them "C-callers".
- Generators of C extensions. Let's call them "cython", but it applies also to others.
- Python implementations with a JIT. Let's call them "pypy", but it applies also to others.
The basic idea is that we want to allow C-writers to write functions with a C-level signature, and generate the logic to do argument parsing automatically. E.g.:
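The code example that originally followed here did not survive; as a rough stand-in, this is a hedged sketch based on the `my_add` example used later in the thread. The macro is a placeholder that expands to nothing; in the real proposal it would expand to (or be replaced by generated) argument parsing code exposing the function to Python.

```c
#include <assert.h>

/* Illustrative only: a no-op placeholder so the sketch is self-contained.
   The real thing would generate the Python-level wrapper here. */
#define AUTOMAGICALLY_GENERATE_PYTHON_FUNCTION(name, c_signature)

/* The C-writer writes only this, with a plain C-level signature... */
static double my_add_impl(long a, long b)
{
    return (double)(a + b);
}

/* ...and asks for the argument parsing logic to be generated. The
   "d ll" notation (double _(long, long)) is discussed below. */
AUTOMAGICALLY_GENERATE_PYTHON_FUNCTION(my_add, "d ll")
```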
The precise details of how to implement `AUTOMAGICALLY_GENERATE_PYTHON_FUNCTION` are the scope of this discussion.

Goals & constraints
1. Writing functions in this way is optional. It will always be possible to write functions with the usual calling conventions such as `HPyFunc_VARARGS` and to do the argument parsing manually.
2. There must be a way to quickly check whether a Python callable supports a given C-level signature and to get the underlying function pointer. "pypy" can use this to generate code which completely bypasses the Python argument parsing logic, and "cython" could use it to emit a fast path in case it statically knows the C types of the arguments.
3. Ideally, this should be integrated with the C API to call functions. E.g., if a C-caller calls `HPy_Call(my_add, "ll", 4, 5)`, an implementation should be able to bypass argument parsing and call `my_add_impl` directly.
4. Bonus point if we find a way to implement goal 3 also on CPython.
5. It should be possible to do things manually: i.e., a C-writer could write their own argument parsing code and still be able to declare a C-level signature. In that case, they need to ensure that their own argument parsing code does the "correct thing". In particular, for a call from Python like `my_add(4, 5)`, the implementation should be free to decide whether to call the generic version or the C-specialized overload.

To be discussed: we need to decide whether we want to support "overloads" or not. I.e., a given Python function could in principle support many different C-level signatures. I think that all the following options are reasonable:

- Support at most one C-level signature. Functions which can't be encoded this way need to be written in the "old style" and parse arguments manually.
- Support at most N C-level signatures, where N is a small number like 4. If you require more than N, you have to write parsing manually.
- Support a potentially unlimited number of C-level signatures.
Argument Clinic vs C macros
CPython already has something similar: it is called Argument Clinic and AFAIK it's used only internally. You have to write special comments in the C code to specify the signature of your function, then you run a Python script which edits the C files and adds the relevant autogenerated code. In the following, when I talk about "Argument Clinic" I don't necessarily mean the very same `clinic.py` as CPython's: we will probably have to write our own version, but the concept is the same.

On the other hand, HPy has so far relied on macros to generate pieces of code. Consider the following example: here, `HPyDef_METH` is a macro which, among other things, generates a small trampoline to convert from the CPython calling convention to the HPy calling convention; it generates something similar to this (in the CPython-ABI case). So one option is to extend this functionality to generate also the argument parsing logic (more on that later).
Both options have pros and cons, IMHO:
- Clinic PRO: it is already known by CPython devs and it seems to work well.
- Clinic PRO (very futuristic): it will be easier for CPython itself to migrate to HPy.
- Clinic CON: C-writers might dislike the fact that an external script modifies their C sources, potentially cluttering them with lines and lines of obscure code. It might be possible to put all the generated code into a separate file, though.
- Macros PRO: they just work with any C compiler. The C code which you have to write is probably more compact and/or nicer to read.
- Macros CON: we will probably be more limited in the complexity of the logic we can generate (maybe that's a PRO :)).
- Macros CON: compiler errors are potentially more obscure, although so far in HPy we have managed to get very good compiler errors in response to common mistakes, at least with gcc.
- Macros CON: we need to put an upper bound on the number of arguments, because we need to autogenerate a file which contains the macros for all possible C signatures. We also need to check whether this impacts compilation time negatively.
How to encode C signatures
There are at least two ways to encode/specify C signatures:

1. Use a C string: this is more or less equivalent to what you can pass to `HPyArg_Parse`, although we need to extend the notation to specify the return type. E.g., `"d ll"` could mean `double _(long, long)`.
2. Extend the enum `HPyFunc_Signature` to support many more signatures. So, in addition to `HPyFunc_VARARGS`, `HPyFunc_NOARGS` etc., you would have e.g. `HPyFunc_d_ll`, which corresponds to `double _(long, long)`. In this scenario each signature is represented by a single `int64_t` value, and there are at least a couple of variations for how to encode it:
   - If we decide that we are happy to support only `long`, `double`, `HPy` and `void`, we can encode a single type in 2 bits. So we can specify signatures of up to 31 arguments (2 bits are reserved for the return type), maybe a bit fewer if we want to save some bits to encode other interesting features (e.g. whether the function supports varargs or keywords).
   - Use 8 bits for each param: with this we can support many more types, but we are limited to signatures of up to ~7 arguments, maybe 6 if we want to reserve some bits for other features. 6-7 arguments are enough to cover the vast majority of functions, though: if a function wants more args, it has to do the argument parsing by itself.
Pros/cons of each approach:

- C string PRO: easy to understand, very flexible.
- C string CON: it's impossible to do any compile-time type check.
- C string CON: I think it's impossible to implement with macros. The current approach of using `HPyFunc_*` works because we can write "specialized" versions of `HPyFunc_TRAMPOLINE` for each possible value of `HPyFunc_Signature`, but I have no clue how to do it with a generic C string using macros. So, if we choose this, we are automatically choosing argument clinic.
- HPyFunc PRO: checking whether a callable supports a given signature is very quick, since you just compare two ints. Doing the same check with strings is probably slower, because you need a `strcmp`.
- HPyFunc PRO: works with the macros approach.
- HPyFunc PRO: we can probably write something which does compile-time checks of the argument types.
- HPyFunc CON: much less flexible than C strings. The C syntax for calling is probably also less nice, e.g. `HPy_Call(HPyFunc_d_ll, 4, 5)` vs `HPy_Call("ll", 4, 5)`.
- HPyFunc CON: if we use the macros approach, we probably need to generate a huge header with all the macro definitions, which might impact the build time.
We can also adopt a hybrid approach: the user-facing API takes and receives C strings for signatures, but internally we represent them as an encoded `int64_t`. This should make runtime signature checks faster (but it's probably a good idea to do some benchmarks).

Return types
Another open question is what to do with return types. Consider the example above in which I have the function `my_add`, whose signature is `"d ll"` (i.e., `double _(long, long)`): in this case, the return type is `HPy`. But what if I want to call the C function directly and get a `double` back, without having to box it? Cython surely needs an API to do that. So maybe something like this:

Suggestions for a better name instead of `HPyFunc_Try` are welcome.

Runtime signature checks
Why do we need fast runtime signature checks? I can think of at least two use cases:

1. `HPy_Call`: you can add a fast path: if the callable supports the given signature, you call it directly, bypassing the boxing/unboxing. But it is unclear whether this is doable, since `HPy_Call` knows only the types of the arguments, not the type of the result.
2. `HPyFunc_Try`: see above, this is needed by Cython.

Note that this doesn't apply to "pypy": assuming that the callee is known, the JIT can do the signature check at compile time, so it doesn't have to be particularly efficient.
Relationship with the current `HPyFunc_*`

Currently the enum `HPyFunc_Signature` defines ~30 signatures which are used by methods and slots. We need to understand whether they represent the same thing as the C-level function signatures or whether they are completely different beasts. E.g., `HPyFunc_O` is basically equivalent to `"O O"`, `HPyFunc_BINARYFUNC` to `"O OO"`, `HPyFunc_INQUIRY` to `"i O"`, etc.

Proposal #1: Argument clinic and C strings
Let's try to turn this into something more concrete. At the moment, I am not happy with either of these, though. The following is a sketch proposal which uses the "Argument Clinic" and "C string" approaches described above:

What I don't like too much about this approach is that it's completely different from the `HPyDef_METH` that you use for non-argument-clinic methods. Maybe something like this, in which we put the generated code BEFORE the call to `HPyDef_METH_CLINIC`? But note that in this way you lose the names of the arguments:

Proposal #2: `HPyFunc_*` and macros
The following integrates very well with the existing API, with all the pros&cons described in the sections above.
EDIT: s/4 bits/2 bits in the "How to encode C signatures" section