wlav / cppyy

Is it possible to define python classes and use them as C++ structs? #122

Closed jak6jak closed 1 year ago

jak6jak commented 1 year ago

So I'm creating a binding for an entity component system called flecs. Components in flecs are simple structs that contain only variables. For example, if I were to use flecs in C++, it would look like this:

struct Position {
    double x, y;
};

struct Walking { };

int main(int, char *[]) {
    flecs::world ecs;

    // Create an entity with name Bob
    auto bob = ecs.entity("Bob")
        // The set operation finds or creates a component, and sets it.
        // Components are automatically registered with the world.
        .set<Position>({10, 20}) 
        // The add operation adds a component without setting a value. This is
        // useful for tags, or when adding a component with its default value.
        .add<Walking>();

    const Position* ptr = bob.get<Position>();
    std::cout << "{" << ptr->x << ", " << ptr->y << "}" << "\n";
}

I would like the people who use my binding to be able to create classes (possibly @dataclasses) in Python and then call the set/add C++ functions to add them to the ECS. My first thought for implementing this was to use cppdef and somehow transform the Python class into a C++ struct. That sounds potentially difficult. Does cppyy provide any methods that might make this easier?

wlav commented 1 year ago

That's a bit of an open-ended question. :) The short answer is that cppyy does not provide anything directly helpful (except for derived classes, see below), and that fleshing this out would be a bit of a project.

First off, realize that a C struct and a Python object with data members are nothing alike: the memory layout is completely different. Python objects are also usually opaque, and can move memory around. E.g. there's nothing in a Python int that you could point to as being a C++ int (separate from such issues as the latter not being a portable type). Even if you could, you would not want that, b/c Python considers ints to be immutable. The ctypes module does allow you to create structs from Python (see the ctypes manual), but its interface is clunky exactly b/c it needs you to be precise. Then on the C side, if you want to access data members of these structs in the JIT, they have to be declared, otherwise the compiler can't calculate offsets (you can't, portably, do this yourself b/c of potential padding).
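To make the layout point concrete, here is a minimal sketch using only the standard ctypes module (no cppyy involved). The struct names `Position` and `Tagged` are just illustrations; the offsets and sizes printed depend on the platform ABI, which is exactly why they should be computed rather than assumed.

```python
import ctypes

# A ctypes struct needs every field spelled out with an exact C type.
class Position(ctypes.Structure):
    _fields_ = [("x", ctypes.c_double), ("y", ctypes.c_double)]

# Mixing field sizes typically makes the compiler (and ctypes) insert padding.
class Tagged(ctypes.Structure):
    _fields_ = [("tag", ctypes.c_char), ("value", ctypes.c_double)]

p = Position(10.0, 20.0)
p.x += 1.0  # updates the C-level storage owned by the struct instance

# Offsets and sizes are taken from the declaration, not guessed.
print("Position:", Position.x.offset, Position.y.offset, ctypes.sizeof(Position))
print("Tagged:  ", Tagged.tag.offset, Tagged.value.offset, ctypes.sizeof(Tagged))
```

On common platforms, `Tagged.value.offset` is larger than the single byte that `tag` occupies; that gap is the padding referred to above.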

The first question then is what you want to do with them on the C++ side. If, as in the example above, you want to access the data members in JITed or compiled code, you are not going to get away from declaring the struct to the JIT. OTOH, if instances are only stored and retrieved to/from C++ (including being used in templated get/set methods), with all object manipulation in Python, all you need is a base class for these structs. By deriving from a C++ base class in Python, unique dispatcher classes are created on the fly, which is why they work well with templates. However, these do not expose the data members and the base class must have a virtual destructor for automatic memory management to work, so these won't be PODs and won't play nice with C (as opposed to C++) code.

The next question, if these classes are to be declared (most likely), is what types to support. Yes, you can use an annotator to inspect a Python class, but you are going to have to restrict the Python class to do the mapping to something reasonable. E.g. self.x = 1 seems to suggest that x is an int, but is it really? Could have been a float, too, or change to a float later if multiplied by pi in some Python code. Additionally, an int is an ambiguous type. So, most likely the interface is going to require clarity in Python. Again, see the ctypes manual above. Then, if that code is clear and unambiguous, generating a C struct declaration for consumption by cppdef() is actually rather trivial.
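As a rough sketch of what "clarity in Python" could look like, the snippet below builds the struct declaration from a dataclass whose fields are annotated with ctypes types. Everything here is an assumption for illustration: the `CPP_TYPE_NAMES` table, the `struct_declaration` helper, and the choice of ctypes types as annotations are not part of cppyy. The code only produces a string; handing that string to cppdef() would be a separate step.

```python
import ctypes
from dataclasses import dataclass, fields

# Hypothetical, deliberately small mapping from exact ctypes types to C++ names.
CPP_TYPE_NAMES = {
    ctypes.c_double: "double",
    ctypes.c_float: "float",
    ctypes.c_int32: "int32_t",   # would need <cstdint> on the C++ side
    ctypes.c_uint8: "uint8_t",   # would need <cstdint> on the C++ side
}

def struct_declaration(cls):
    """Return a C++ struct declaration for a dataclass with ctypes annotations."""
    lines = [f"struct {cls.__name__} {{"]
    for f in fields(cls):
        cpp_type = CPP_TYPE_NAMES.get(f.type)
        if cpp_type is None:
            # Ambiguous annotations such as plain `int` are rejected outright.
            raise TypeError(f"{cls.__name__}.{f.name}: unsupported type {f.type!r}")
        lines.append(f"    {cpp_type} {f.name};")
    lines.append("};")
    return "\n".join(lines)

@dataclass
class Position:
    x: ctypes.c_double
    y: ctypes.c_double

print(struct_declaration(Position))
```

A field annotated with plain `int` raises `TypeError` here, which reflects the ambiguity mentioned above: the user has to pick a concrete width instead of leaving it to guesswork.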

However, that's not the end of it: if this is done by annotating a Python class and having custom variables with the necessary type information, then that annotated class is not the cppyy proxy class. I.e., there will have to be some custom management if besides read access you also need write access to data members. (Of course, the annotator can deal with that, by not annotating the Python class, but only using it as a vessel for the needed info, and returning the proxy instead.)

The easiest then seems to be a factory function (yes, this can be the annotator) that takes in the precise description using some convention, then returns the proxy instead of the original Python class (if using an annotator). Since the proxy is a normal Python class, you can manipulate it further, e.g. by adding methods.

Anyway, here's some example code along the lines I'm thinking that could work:

import cppyy
import io

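def structify(cls):
    """Declare a C++ struct from cls._fields and return its cppyy proxy."""
    name = cls.__name__

    # build the C++ declaration from the {member name: C++ type} mapping
    code = io.StringIO()
    code.write("struct ")
    code.write(name)
    code.write(" {\n")
    for dmn, dmt in cls._fields.items():
        code.write(f"{dmt} {dmn};\n")
    code.write(" };\n")

    # hand the declaration to the JIT
    cppyy.cppdef(code.getvalue())
    code.close()

    # copy the Python-side methods onto the proxy class
    proxy = getattr(cppyy.gbl, name)
    for n, m in cls.__dict__.items():
        if callable(m):
            setattr(proxy, n, m)

    # the decorated name now refers to the proxy, not the original class
    return proxy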

@structify
class MyData:
    _fields = {'x' : 'double', 'y' : 'double'}

    def norm(self):
        return (self.x**2 + self.y**2)**0.5

m = MyData(1, 2)
print(m.norm())

wlav commented 1 year ago

Closing as presumed clarified, as there has been no further response for 2 months. Feel free to reopen or start a new issue if that is not the case.