python / cpython

The Python programming language
https://www.python.org

Annotation-based syntax for ctypes structs #104533

Open orent opened 1 year ago

orent commented 1 year ago

Turn this:

class S(ctypes.Structure):
    _fields_ = [('a', ctypes.c_int), ('b', ctypes.c_char_p)]

Into this:

class S(ctypes.Structure):
    a : ctypes.c_int
    b : ctypes.c_char_p

See discussion on https://discuss.python.org/t/annotation-based-sugar-for-ctypes/26579

Working on implementation
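A minimal sketch of the idea, not the implementation being worked on: a hypothetical class decorator (the name `fields_from_annotations` is an assumption made for illustration) can copy the annotations into `_fields_` once the class body has run. It relies only on the fact that ctypes allows `_fields_` to be assigned after the class is created.

```python
import ctypes


def fields_from_annotations(cls):
    """Hypothetical helper: build _fields_ from the class annotations."""
    # Annotations keep their definition order, which becomes the field order.
    cls._fields_ = list(cls.__annotations__.items())
    return cls


@fields_from_annotations
class S(ctypes.Structure):
    a: ctypes.c_int
    b: ctypes.c_char_p


s = S(1, b"hello")
```

A decorator is used here only because it needs no change to ctypes itself; the proposal would presumably do the equivalent inside the `Structure` metaclass.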

zwergziege commented 1 year ago

I would welcome this very much. It would also be very nice to have annotations for incomplete types, if possible. A natural next step would be using generics for pointers, though I don't know whether generics first have to stabilize with regard to how generic parameters are accessed at runtime (see https://github.com/python/typing/issues/629). If ctypes didn't insist on capitalized types, we would get code like this:

class Incomplete(Struct):
  some_data : c_int
  children : Pointer['Incomplete']

which looks very clean and natural imho.
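For comparison, this is roughly how a self-referential (incomplete) type has to be written with ctypes today: the class is created empty and `_fields_` is assigned afterwards, once the class name exists. The field names are taken from the example above; the rest is an illustrative sketch.

```python
import ctypes


class Incomplete(ctypes.Structure):
    pass  # fields are attached below, once the name Incomplete is bound


Incomplete._fields_ = [
    ('some_data', ctypes.c_int),
    ('children', ctypes.POINTER(Incomplete)),
]

child = Incomplete(2)
node = Incomplete(1)
node.children = ctypes.pointer(child)  # child must stay alive while referenced
```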

orent commented 1 year ago

You are welcome to open a separate issue. Your proposal is orthogonal to this one.

junkmd commented 1 year ago

I would like to point out the problems with this approach.

The problem is that type checkers interpret the return type of fields solely as c_foo.

As shown below, even if c_int is specified for a field, the type returned at runtime is int.

>>> import ctypes
>>>      
>>> class Foo(ctypes.Structure):
...     pass
... 
>>> Foo._fields_ = [('x', ctypes.c_int)]
>>> Foo.x
<Field type=c_long, ofs=0, size=4>
>>> type(Foo.x) 
<class '_ctypes.CField'>
>>> foo = Foo()
>>> foo
<__main__.Foo object at 0x0000023CDB80B8C0>
>>> foo.x
0
>>> foo.x = 3
>>> foo.x
3
>>> foo.x = ctypes.c_int(2)
>>> foo.x
2

For field annotations like the one below to convey the correct type information, type checkers would need to special-case them.

class Foo(Structure):
    x: c_int

This effort requires reaching out not only to the cpython community but also to the broader community of type checker developers.
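The mismatch can be reproduced in a short script. This is only an illustration: the annotation on `x` is added by hand next to `_fields_` to stand in for the proposed syntax, and ctypes ignores it at runtime.

```python
import ctypes


class Foo(ctypes.Structure):
    x: ctypes.c_int                    # what a type checker would read
    _fields_ = [('x', ctypes.c_int)]   # what ctypes actually uses today


foo = Foo()
foo.x = ctypes.c_int(2)                # assignment accepts a c_int instance...

declared = Foo.__annotations__['x']    # the annotated type: c_int
runtime = type(foo.x)                  # ...but reading the field yields int
```

A checker that takes the annotation at face value would treat `foo.x` as a `c_int`, while every read produces a plain `int`.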

DrInfiniteExplorer commented 7 months ago

I made a dumb wrapper for ctypes to support this sort of thing at DrInfiniteExplorer/dtypes a few years back after I got tired of writing hundreds of definitions with _fields_ for a few days straight. I didn't work extensively with proper typecheckers at the time as my main project was kind of rushed, but I did hack in a simple way to forward-declare structs to make pointers, as well as simple this-type-pointers. I've recently gotten a bit more active with my projects and might spend more time on this.

One thing that has been nagging me is that (as I've learned) tuples aren't valid types for annotations, so I'm thinking that bitfields could be declared with something like bitfield : Annotated[ctypes.c_uint8, Bitfield(2)] and have structify turn that into a proper _fields_ tuple. What are your thoughts and plans for this kind of thing? I think I'll continue making small improvements to dtypes, but if the official ctypes gets cooler then I'm down with that, and either way we might all benefit from discussing and sharing ideas.
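A rough sketch of how the `Annotated` idea could work, under stated assumptions: `Bitfield` and `structify` are named in the comment above, but the implementations here are invented for illustration and are not taken from dtypes. The decorator reads the `Annotated` metadata and emits the three-element `_fields_` tuples that ctypes uses for bit fields.

```python
import ctypes
from dataclasses import dataclass
from typing import Annotated, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class Bitfield:
    """Hypothetical marker carrying a bit width."""
    width: int


def structify(cls):
    """Hypothetical decorator: turn annotations into ctypes _fields_."""
    fields = []
    # include_extras=True keeps the Annotated metadata instead of stripping it.
    for name, hint in get_type_hints(cls, include_extras=True).items():
        if get_origin(hint) is Annotated:
            ctype, *metadata = get_args(hint)
            widths = [m.width for m in metadata if isinstance(m, Bitfield)]
            if widths:
                fields.append((name, ctype, widths[0]))  # bit-field tuple
            else:
                fields.append((name, ctype))
        else:
            fields.append((name, hint))
    cls._fields_ = fields
    return cls


@structify
class Flags(ctypes.Structure):
    mode: Annotated[ctypes.c_uint8, Bitfield(2)]
    level: Annotated[ctypes.c_uint8, Bitfield(6)]
    count: ctypes.c_uint16


flags = Flags(mode=3, level=63, count=1000)
```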

picnixz commented 2 months ago

I had something similar because I'm too lazy to declare structs and unions using the regular syntax. Instead, I have something like this:

@cschema
class MyStruct(cStruct):
    i: ctypes.c_int

    class a(cStruct):
        x: ctypes.c_longlong
        y: ctypes.c_longlong

which makes it equivalent to

class MyInnerStruct(ctypes.Structure):
    _fields_ = [('x', ctypes.c_longlong), ('y', ctypes.c_longlong)]

class MyStruct(ctypes.Structure):
    _fields_ = [('i', ctypes.c_int), ('a', MyInnerStruct)]

With this approach, I can have nested classes and unions. Also, I used cStruct and cUnion as new classes in order to add support for metaclass keyword arguments (which I cannot do on the native ctypes.Structure and ctypes.Union). For instance:

@cschema
class MyStruct(cStruct):
    class _(cStruct, anon=True):
        x: ctypes.c_longlong
        y: ctypes.c_longlong

becomes equivalent to

class MyInnerStruct(ctypes.Structure):
    _fields_ = [('x', ctypes.c_longlong), ('y', ctypes.c_longlong)]

class MyStruct(ctypes.Structure):
    _anonymous_ = ['_']
    _fields_ = [('_', MyInnerStruct)]
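As a quick illustration of what `_anonymous_` provides in plain ctypes (independent of the wrapper classes described here), the members of the anonymous inner struct become reachable directly on the outer one. Note that `_anonymous_` has to be defined before `_fields_`.

```python
import ctypes


class MyInnerStruct(ctypes.Structure):
    _fields_ = [('x', ctypes.c_longlong), ('y', ctypes.c_longlong)]


class MyStruct(ctypes.Structure):
    _anonymous_ = ['_']                # must come before _fields_
    _fields_ = [('_', MyInnerStruct)]


s = MyStruct()
s.x = 5      # promoted from the anonymous member
s._.y = 7    # the member is still reachable by its own name
```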

Similarly, I can have something like:

class MyStruct(cStruct, pack=32):
    pass

class MyUnion(cUnion, pack=32):
    pass

instead of

class MyStruct(ctypes.Structure):
    _pack_ = 32

class MyUnion(ctypes.Union):
    _pack_ = 32

Note that this last construction is only useful when using nested unions. I've added other features such as:

@cschema
class MyClass(cStruct):
    field: cArray[ctypes.c_int, 32]

to be equivalent to

class MyClass(ctypes.Structure):
    _fields_ = [('field', ctypes.c_int * 32)]

Again, cArray is a custom class with a special __class_getitem__ implementation. Similarly, I have something like:

@cschema
class MyClass(cStruct):
    field: cPointer[ctypes.c_int]

instead of

class MyClass(ctypes.Structure):
    _fields_ = [('field', ctypes.POINTER(ctypes.c_int))]
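One way such subscriptable helpers could be written, as a guess at the mechanism rather than the actual code: the classes below reuse the names `cArray` and `cPointer` from the examples above, but their bodies are assumptions. Each `__class_getitem__` simply returns the corresponding real ctypes type.

```python
import ctypes


class cArray:
    """Hypothetical: cArray[ctype, n] evaluates to ctype * n."""

    def __class_getitem__(cls, params):
        ctype, length = params
        return ctype * length


class cPointer:
    """Hypothetical: cPointer[ctype] evaluates to ctypes.POINTER(ctype)."""

    def __class_getitem__(cls, ctype):
        return ctypes.POINTER(ctype)


class MyClass(ctypes.Structure):
    _fields_ = [
        ('field', cArray[ctypes.c_int, 32]),
        ('ptr', cPointer[ctypes.c_int]),
    ]


obj = MyClass()
```

Because the subscript evaluates to an ordinary ctypes type, the result can be used in `_fields_` directly; a static type checker, however, would not understand it without extra help.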

For type-checkers, I implemented my own mypy plugin since this construction is quite hacky. Note that I needed this @cschema decorator to be able to process the class body like for dataclasses and the special cStruct & co classes to support metaclass keyword arguments.

If someone is interested, I could try to make this implementation more elegant and perhaps close to how dataclasses are used.
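For readers who want to experiment, the class-body processing described in this comment might be sketched as follows. This is a simplified guess, not picnixz's code: `cschema_sketch` is a hypothetical name, it works on plain `ctypes.Structure` and `ctypes.Union` instead of `cStruct` and `cUnion`, it ignores metaclass keyword arguments, and it assumes annotated fields come before nested classes.

```python
import ctypes


def cschema_sketch(cls):
    """Hypothetical decorator: build _fields_ from annotations and nested classes."""
    # Annotated names become ordinary fields, in definition order.
    fields = list(getattr(cls, '__annotations__', {}).items())
    # Nested Structure/Union classes become fields named after the class.
    for name, value in list(vars(cls).items()):
        if isinstance(value, type) and issubclass(
            value, (ctypes.Structure, ctypes.Union)
        ):
            fields.append((name, cschema_sketch(value)))  # process inner first
    cls._fields_ = fields
    return cls


@cschema_sketch
class MyStruct(ctypes.Structure):
    i: ctypes.c_int

    class a(ctypes.Structure):
        x: ctypes.c_longlong
        y: ctypes.c_longlong


s = MyStruct()
s.i = 1
s.a.x = 2
s.a.y = 3
```

Assigning `_fields_` replaces the nested class attribute `a` with a field descriptor, so `s.a` is the embedded struct rather than the class.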