Closed DuToitSpies closed 7 months ago
The error in tests / Check autogen
does not seem to be related to the changes in that PR.
I'll take a look ...
I am a bit troubled by the many shape objects. They are used to access low-level, implementation specific APIs for the different objects, right? Isn't there a way to tell cython "hey, don't do that"? The type objects are opaque on PyPy, so from my very small world this is unneeded complication.
I am a bit troubled by the many shape objects. They are used to access low-level, implementation specific APIs for the different objects, right?
The shapes provide a means to access the internal struct of built-in objects after the PyObject header part. It mimics what happens for user defined subclasses.
It's needed for cases where an extension subclasses built-in types, I think. Someone else please correct me if needed.
@mattip has a point here though -- to what extent are these additional pieces of data really a CPython implementation detail? One could make a good argument that they are entirely an implementation detail.
The question then is whether we can provide a better porting path for existing libraries, or whether implementations just have to mimic this and possibly be slow?
Isn't there a way to tell cython "hey, don't do that"?
Yes there is but that is not unrelated to the shapes. What Du Toit needs for his Cython/HPy work is essentially HPyContext.h_DictType
. Further, he just added the additional built-in shape because dict
is a built-in type.
The type objects are opaque on PyPy, so from my very small world this is unneeded complication.
I can understand this. The built-in shapes are not intuitive. I would like to raise your attention to #169 (and the implementing PR #335). Since the issue and PR are a lot of reading, I try to summarize here:
PDict
which is equivalent to PyDictObject
).dict
and have the HPyTypeSpec's basicsize
set. In CPython, this would allocate an object of size sizeof(PyDictObject) + basicsize
. In order to return the right pointer for CustomType_AsStruct
, we need to know the base type. We could store this information at the type but this means, each time you call CustomType_AsStruct
, we would have two additional indirections (read the type + read HPyType_Extra_t
). So, one of our design decisions was, this should be constant. And to do so, we came up with the built-in shapes because they are, IMO (and Antonio agreed back then), the generalization of that. Every runtime is free to ignore the built-in shapes and store the additional info at the type.One could make a good argument that they are entirely an implementation detail.
Yes, you can. It could be considered as an implementation detail. But this is similar to, e.g., HPy_GetItem_s
. Why do we have it. Wouldn't it be better to just have HPy_GetItem
and if the key happens to be created from a C string, then this is an impl detail, right?
I see the built-in shapes as a tradeoff between performance and usability.
Also, I think the built-in shape didn't make things worse. Please remind that we previously had the legacy flags and the shapes are just a generalization of that.
@fangerer Maybe I misunderstood how the shapes are used then. Do we just need their size? And do we only really need them on CPython?
And do we only really need them on CPython?
We don't strictly need them at all. As I wrote, we could store the information in HPyType_Extra_t
at the type. I think that can best be understood by looking at the debug mode version of ctx_AsStruct_*
: https://github.com/hpyproject/hpy/blob/bdf7e7ff99194b9ec5d15334303420c0b496f143/hpy/debug/src/debug_ctx.c#L345
The built-in types are just an additional information that can help interpreters.
Do we just need their size?
No, the size would not help alternative implementations. E.g. in GraalPy, we are not allocating objects by size (as CPython does), we are allocating them using a Java class. But it helps GraalPy because we could store the user data pointer in a well-known location of the appropriate Java class.
Even if this PR raised some discussions about the built-in shapes, I will merge it because this PR is just adding one (missing) shape. So, if built-in shapes are the problem, not merging this PR would solve it. @hodgestar @mattip Any concerns merging this PR?
@fangerer Very happy for this to be merged. Let's add the shape discussion for the agenda of the next dev call.
Adds h_DictType to HPy, as this is needed for the Cython port.