hpyproject / hpy

HPy: a better API for Python
https://hpyproject.org
MIT License
1.1k stars 52 forks source link

Kill ctx->h_None & co. #226

Open antocuni opened 3 years ago

antocuni commented 3 years ago

Recently we discussed a potential refactoring of HPyContext, and one of the problems was what to do with the h_* fields.

I think we can/should just kill them and have normal functions instead. Some points:

  1. The only reasonable operation to do with them is calling HPy_Dup.
  2. It's an additional thing to learn, and people need to remember whether they need to call dup or not (they do!)
  3. It's harder to implement because we need logic to initialize the ctx upon the first access, to check whether it's already initialized, etc.

Let's consider some alternatives:

HPy h = HPy_Dup(ctx, ctx->h_None); // current situation
HPy h = HPy_None(ctx);
HPy h = HPy_GetNone(ctx);

Personally, I prefer HPy_None() than HPy_GetNone(). The only drawback I can think it's that it might be a bit confusing for people used to the old API, where Py_None is a variable and not a function. However, once you are in the righr "HPy mindset" you know that you have to pass the ctx everywhere, so I think it's pretty natural to turn it into a function.

hodgestar commented 3 years ago

Some thoughts / questions:

antocuni commented 3 years ago
  • What about HPyDup_None(ctx)?

I thought of it as well, but I don't like it too much. From a high-level point of view, what it returns is a handle to None, to a duplicate of another handle (although the distinction is very blur, I admit). So, following this reasoning HPy_GetNone() looks more direct to me.

But I still prefer HPy_None(), even though I can't explain exactly why :)

Another possibility is to introduce another namespace instead of just using HPy_. E.g. HPySingletons_None(), HPySinglestons_True() etc., apart the fact that HPySingletons is way too long to type. Any idea for a shorter name?

  • Does the ctx get a new function pointer to replace each h_ handle pointer? If not, what happens instead?

yes. Basically, we would autogenerate functions like this, and they would be treated like all the other functions:

HPy ctx_None(HPyContext *ctx) {
    Py_INCREF(Py_None);
    return _py2h(Py_None);
}
antocuni commented 3 years ago
  • The only reasonable operation to do with them is calling HPy_Dup.

I correct myself here. There is at least another reasonable thing to do with h_None & co.:

if (HPy_Is(ctx, h, ctx->h_None))
    ...

This idiom becomes unnecessarily complicated with the HPy_None() approach, because you need to close the returned handle:

HPy h_None = HPy_None(ctx);
if (HPy_Is(ctx, h, h_None))
    ...
HPy_Close(h_None);

Possible solutions:

  1. introduce HPy_IsNone(), HPy_IsTrue() and HPy_IsFalse(), as shortcuts for very common operations
  2. design a more general solution like HPyNone_Get()/HPyNone_Is(), HPyTrue_Get()/HPyTrue_Is(), etc.

Can we think of other reasonably common operations which are easy to do with the current ctx->h_None approach but would be harder with the HPy_None(ctx) approach?

mwalkiewicz commented 3 years ago

Won't OPs point no. 2 apply to HPy_(Get)None as well? Name of the function does not clearly state if the handle is some kind of a singleton None object (that should or should not be HPy_Duped separately) or another handle. (It was not obvious to me that acquired handle should be closed, but I'm just looking at this project since a few days ago and maybe it's just my lack of HPy intuition). On the other hand HPyDup_None explicitly announces that callers are acquiring a duplicated handle and are responsible for closing it.

antocuni commented 3 years ago

Won't OPs point no. 2 apply to HPy_(Get)None as well? Name of the function does not clearly state if the handle is some kind of a singleton None object (that should or should not be HPy_Duped separately) or another handle. (It was not obvious to me that acquired handle should be closed,

Returning a handle which doesn't need to be closed is equivalent to returning a borrowed reference in CPython, and we are explicitly trying to avoid that. As a general rule, all the handles which are returned by a function need to be explicitly closed.

The fact that you didn't find this rule anywhere it's our fault: we do have some API design principle and guidelines, but we never wrote them nicely in the docs. But e.g., you can find this precise statement in this note: https://github.com/hpyproject/hpy/blob/7d457b4d6fa21a33c67b0aaa3bfc5582784fb051/docs/leysin-2020-design-decisions.md#attribute-and-item-access

hodgestar commented 3 years ago

Additional idea from #250:

We could define a list of constants via an enum or defines and then have one function HPy_GetType(ctx, HPY_BOOL_TYPE) that returns a new handle to the specified type.

And perhaps a separate enum for constants.

I'm not sure I particularly like these solutions, but they are ideas that might eventually mutate into something useful.

steve-s commented 3 years ago

Isn't enums vs lots of generated methods orthogonal to the usability issue with HPy_Is(ctx, h, ctx->h_None) vs. return HPy_Dup(ctx, ctx->h_None)?

My 2c is that ctx->h_None looks really error prone. I suspect that return ctx->h_None would be one of the most common errors if we keep it like that. On the other hand:

Can we think of other reasonably common operations which are easy to do with the current ctx->h_None approach but would be harder with the HPy_None(ctx) approach?

Anything where you'd just immediately forward the handle to some other API function, like

HPyList_Append(ctx, mylist, ctx->h_True)

making this into

HPy py_true = HPy_DupTrue(ctx);
HPyList_Append(ctx, mylist, py_true);
HPy_Close(ctx, true);

This can be annoying, but it kind of feels like the proper way. ctx->h_True feels bit weird, unlike everything else it's not a function, so it doesn't give you a fresh handle. This all looks like difficult decision between more error prone, less consistent, but easier to use API, or consistent, but sometimes cumbersome to use API.

One more example re consistency:

HPy get_me_value_helper(HPyContext* ctx) {
   return ctx->h_True;
}

// ...
HPyList_Append(ctx, mylist, get_me_value_helper());

after some refactoring, someone may want to put 42 into the list, so they change the get_me_value_helper...

HPy get_me_value_helper(HPyContext* ctx) {
   return HPyLong_FromLong(42);
}

// ...
HPyList_Append(ctx, mylist, get_me_value_helper());
// boom leaked handle!
antocuni commented 2 years ago

We could define a list of constants via an enum or defines and then have one function HPy_GetType(ctx, HPY_BOOL_TYPE) that returns a new handle to the specified type.

And perhaps a separate enum for constants.

I'm not sure I particularly like these solutions, but they are ideas that might eventually mutate into something useful.

I don't like this idea at all. It's much easier and more consistent to use e.g. HPy_GetBoolType(), HPy_GetNone(), etc. From the user point of view, both techniques are equally inconvenient compared to ctx->h_None, but something like HPy_GetType() adds a level of indirection which feels unnecessary. E.g., you would need to check for errors in case you pass a wrong and/or non-existent enum value.

The only "advantage" of HPy_GetType is to keep the total number of API functions smaller, but I don't think it's enough to counterweight the cognitive burden of introducing yet another concept.

antocuni commented 2 years ago

Isn't enums vs lots of generated methods orthogonal to the usability issue with HPy_Is(ctx, h, ctx->h_None) vs. return HPy_Dup(ctx, ctx->h_None)?

yes, I agree.

My 2c is that ctx->h_None looks really error prone. I suspect that return ctx->h_None would be one of the most common errors if we keep it like that. On the other hand:

that's a very good point, but note that CPython has the same problem: you cannot return Py_None, you need to incref it. To mitigate the problem we could introduce HPy_RETURN_NONE, which would be similar to Py_RETURN_NONE of course.

Can we think of other reasonably common operations which are easy to do with the current ctx->h_None approach but would be harder with the HPy_None(ctx) approach?

Anything where you'd just immediately forward the handle to some other API function, like

HPyList_Append(ctx, mylist, ctx->h_True)

[cut] This can be annoying, but it kind of feels like the proper way. ctx->h_True feels bit weird, unlike everything else it's not a function, so it doesn't give you a fresh handle. This all looks like difficult decision between more error prone, less consistent, but easier to use API, or consistent, but sometimes cumbersome to use API.

Good point. I think it will be interesting to look at top4000 and see how often this pattern is used in practice. I also agree that the "proper way" looks harder to use, so it's a difficult design choice. Also, we must keep in mind that one of the HPy goals is to make it easy to port existing extensions, so from this POV, the ctx->h_None approach wins.

One more example re consistency:

HPy get_me_value_helper(HPyContext* ctx) {
   return ctx->h_True;
}

// ...
HPyList_Append(ctx, mylist, get_me_value_helper());

after some refactoring, someone may want to put 42 into the list, so they change the get_me_value_helper...

HPy get_me_value_helper(HPyContext* ctx) {
   return HPyLong_FromLong(42);
}

// ...
HPyList_Append(ctx, mylist, get_me_value_helper());
// boom leaked handle!

also a good point. Thankfully, the debug mode will catch this kind of errors easily, at least.