hpyproject / hpy

HPy: a better API for Python
https://hpyproject.org
MIT License

Simpler & more typesafe API for type-specific calls #13

Open antocuni opened 4 years ago

antocuni commented 4 years ago

The current C API provides many duplicated functions whose only difference is whether they type-check their argument, e.g. PyTuple_GET_SIZE vs PyTuple_Size, PyTuple_GET_ITEM vs PyTuple_GetItem, etc. The following idea tries to:

  1. reduce the number of API functions
  2. make it possible to call functions with or without type checking
  3. use the C typesystem to catch some errors at compile time

The idea is to have special C types of handles which represent specialized objects and/or protocols, e.g. HPyTuple. You can cast an HPy to HPyTuple with or without a typecheck. Functions like HPyTuple_Size take an HPyTuple argument, so it is impossible to call them with a plain HPy. For example:

typedef struct { long _i; } HPy;
typedef struct { long _i; } HPyTuple;

#define HPyTuple_CAST(x) ((HPyTuple){x._i})

void print_tuple(HPyContext ctx, HPy obj)
{
    // no typecheck, we assume that the user knows that obj is a tuple
    //HPyTuple t = HPyTuple_CAST(obj);

    // typecheck. "obj" is automatically closed
    HPyTuple t = HPyTuple_Check(ctx, obj); // or maybe HPyTuple_TryCast?
    if (HPy_IsNull(t))
        return;

    // the HPyTuple_* functions DO NOT do any form of typechecking
    long n = HPyTuple_Size(ctx, t);
    for (long i = 0; i < n; i++) {
        HPy item = HPyTuple_GetItem(ctx, t, i);
        HPy_Print(ctx, item);
        HPy_Close(ctx, item);
    }
    HPy_Close(ctx, t);
}
timfel commented 4 years ago

I'm personally not a fan of casting in this manner. The reason is at least in part related to how we execute this on Graal, via LLVM bitcode. The problem is that the bitcode doesn't keep any information about the cast around, so we just get an "access this handle at offset such and such" and then have to figure out what it was cast to, based on the actual type that we know is behind the handle and the offset. I would definitely be in favour of the HPyTuple_Check function for casting, since that communicates to us what type of handle is expected. It also gives us an opportunity to copy things to native at that point, when it makes sense, rather than having to copy more eagerly because someone might do a C-level cast.

timfel commented 4 years ago

That being said, I guess we could implement HPyTuple_CAST via a function call rather than a macro in our API headers, and then we'd get the call. That's still much better for us than a straight C-level typecast. The disadvantage (for us) would be that we'd have to ship our own headers for this.

arigo commented 4 years ago

@antocuni Maybe. For now I'll just point out that having "obj" automatically closed both makes the API more irregular and is wrong in your example, because a function should not close its arguments.

arigo commented 4 years ago

Also, I think it's similar to what we discussed with custom object protocols. Here is a version that looks more like https://github.com/pyhandle/hpy/issues/9. Note that the advantage is that it doesn't need custom functions or macros to access the size and the elements of the tuple.


    void print_tuple(HPyContext ctx, HPy obj)
    {
        // returns a small struct.
        // ``t.items`` is a C array of HPyFields, valid as long as the "obj" handle is
        HPy_Tuple t = HPyTuple_AsTuple(ctx, obj);

        for (int i = 0; i < t.size; i++) {
            HPy item = HPy_Load(t.items[i]);
            HPy_Print(ctx, item);
            HPy_Close(item);
        }
    }
antocuni commented 4 years ago

@timfel

The problem is that the bitcode doesn't keep any information about the cast around. So we just get an "access this handle at offset such and such"

Note that in the current situation, what you get are calls like HPyTuple_GET_SIZE(HPyContext ctx, HPy o): so, you don't have any type information anyway. Basically, with this proposal you would get the same bitcode as now, the only difference is that the C compiler does some more typechecking.

The disadvantage (for us) would be that we'd have to have our own headers for this.

A note on the direction HPy is going right now: at the moment we have two sets of headers, one which is specific to CPython and one which is "universal". The idea is that a module compiled with the universal headers can be imported by any Python implementation (including CPython, although this will be slower; that implementation is cpython-universal). Nothing prevents a Python implementation from providing its own specific headers, of course, and/or from deciding that it will NOT support universal modules.

arigo commented 4 years ago

Nothing prevents python implementation to provide their own specific headers, of course and/or to decide that it will NOT support universal modules.

I think you mean "decide that it will ONLY support universal modules".

antocuni commented 4 years ago

@arigo

For now I'll just point out that "obj" is automatically closed both makes the API more irregular, and is wrong in your example, because a function should not close its arguments.

True :(. My intent was to make the two cases in which you use HPyTuple_CAST or HPyTuple_Check more similar. I suppose we could return a new handle even from HPyTuple_CAST, although this would cause a small performance hit on CPython compared to what you have now.

Also, I think it's similar to what we discussed with custom object protocols.

yes, it's also similar to what is written here: https://github.com/pyhandle/hpy/blob/master/docs/api.md#specialized-protocols

Here is a version that looks more like #9. Note that the advantage is that it doesn't need custom functions or macros to access the size and the elements of the tuple.

    void print_tuple(HPyContext ctx, HPy obj)
    {
        // returns a small struct.
        // ``t.items`` is a C array of HPyFields, valid as long as the "obj" handle is
        HPy_Tuple t = HPyTuple_AsTuple(ctx, obj);

        for (int i = 0; i < t.size; i++) {
            HPy item = HPy_Load(t.items[i]);
            HPy_Print(ctx, item);
            HPy_Close(item);
        }
    }

maybe, but this is hard/slowish to implement on PyPy: you would need to allocate a non-movable array and fill it with handles. Or maybe pin the existing W_TupleObject.wrappeditems, but then how would you implement it for things like W_SpecialisedTupleObject?

antocuni commented 4 years ago

Nothing prevents python implementation to provide their own specific headers, of course and/or to decide that it will NOT support universal modules.

I think you mean "decide that it will ONLY support universal modules".

No, I meant that if for some reason graalpython can't load universal modules (e.g. because it can't follow the cast, as @timfel said), it could conceivably decide to drop support for universal and implement only graalpython-specific headers.

Although it is much better to try to design the API so that universal modules are truly universal, of course :)

timfel commented 4 years ago

@antocuni

Note that in the current situation, what you get are calls like HPyTuple_GET_SIZE(HPyContext ctx, HPy o): so, you don't have any type information anyway. Basically, with this proposal you would get the same bitcode as now, the only difference is that the C compiler does some more typechecking.

My point was more that with HPyTuple_Check, we have a place (a proper API call) where the implementation can attach that type information to the object behind the returned handle. With a C-level cast, we do not. So such a call is just better for us.

timfel commented 4 years ago

I am definitely for having the API enforce type safety via a cast; I would just like that cast to be inside an API function call (so an extern function in the "universal" header), and only maybe, for the CPython headers, use a #define that does a C-level cast.

antocuni commented 4 years ago

My point was more that with HPyTuple_Check, we have a place (a proper API call) where the implementation can attach that type information to the object behind the returned handle. With a C-level cast, we do not. So such a call is just better for us.

I think I am missing a point here. The tuple object will be constructed by either calling HPyTuple_New or by creating the tuple inside python code: so these are the obvious places where to attach information to the object. Unless you are talking about attaching information to the handle itself: I can't see a reason to do that, but maybe you have a good use case for it that I can't think of.

I am definitely for having the API enforce type safety via a cast, I would just like that cast to be inside an API function call (so an extern function in the "universal" header), and only maybe for the CPython headers use a #define that uses a C-level cast

I see your point. I think this is a design goal that we should try to clarify soon: do we want the universal API to be completely opaque, or are we fine doing some work directly in the header with macros or inline functions? I see pros and cons for both. The biggest con is efficiency, of course: e.g. currently HPy_IsNull() is implemented as a macro, and it's the preferred way to check whether an API call has returned an error; having to call a function for it sounds excessively expensive (although I am ready to be proved wrong by benchmarks :))

timfel commented 4 years ago

The tuple object will be constructed by either calling HPyTuple_New or by creating the tuple inside python code: so these are the obvious places where to attach information to the object.

What if someone creates a native subclass of tuple? These will have to be allocated using malloc, so that's just some native allocation. Will we force them to do an upcall where we can attach the type information?

antocuni commented 4 years ago

What if someone creates a native subclass of tuple? These will have to be allocated using malloc, so that's just some native allocation. Will we force them to do an upcall where we can attach the type information?

To instantiate an object given a type, you'll have to call HPy_New or something like that. Of course you will be able to use a custom allocator if you want, but the allocator will be called by the HPy runtime. You can't just malloc a piece of memory and call it an HPy object, because the underlying memory layout is completely opaque, so you would have no idea where to store e.g. the equivalent of ob_type.

timfel commented 4 years ago

@antocuni alright, you convinced me :-) I am mostly concerned with macros that access struct members, which I would like to avoid, because in our case that often means a wrapper or exposing that member on the Python object.

antocuni commented 4 years ago

I am mostly concerned with macros that access struct members, which I would like to avoid, because in our case that often means a wrapper or exposing that member on the Python object.

Yes, sure, I completely agree. The idea of HPy is that all the runtime data structures will be opaque, so no macros and no direct access to fields. Even the HPy type is opaque: it is big enough to contain either a pointer or an index into an array, but it's up to the implementation to decide which to use.