wlav / CPyCppyy

Other
23 stars 20 forks source link

make __doc__ of class members writable #1

Closed N-Coder closed 2 years ago

N-Coder commented 3 years ago

See my initial issue here. Regarding your comment:

I was kindof hoping to find a solution that doesn’t require adding another pointer CPPOverload::MethodInfo_t. Not sure whether that’s possible, though, but the whole structure could be reduced: there need not be a separate dispatch map as a loop over methods would be just as fast and fName is hardly used (only in error reporting and str), so could be on-demand instead.

...I took the easy path and did exactly that to CPPDataMember (where I could add the member field directly to the class), TemplateProxy, and CPPOverload (where I added the member field to the persistent CPPOverload::MethodInfo_t / TemplateInfo object and adapted the __doc__ getter of / added a setter to the enclosing object). By default, the getters return the auto-generated docstring, unless the user overwrote it with a custom value. I guess you need to store the string value somewhere, unless you want to give the objects their whole, customizable __dict__ with tp_dictoffset, but you'd also need to store that somewhere. I'm unsure whether something similar needs to be done to CustomInstanceMethod, TypedefPointerToClass, CPPInstance, and CPPExcInstance, as I didn't have a class at hand that wraps those. At least the current changes in this PR seem to cover my use-case from the Bitbucket issue.

Regarding the hook for CPPScope.__getattr__ discussed here, I guess I would also add a PyCallable member to CPPScope that gets called here in the native implementation of meta_getattro to provide a replacement value if no other alternative could be found.

wlav commented 3 years ago

Yes, storage somewhere is needed. The only alternative that I can think of, is to use an "extended" info, same as done for CPPInstance. Problem is how to update the outstanding pointers, otherwise both versions (with and without) would be around for a while. One solution would be to capture other little-used or on-demand constructable pieces (e.g. fName and fDescription) and put them together with fDoc in an on-demand extension. Note that if only one is ever used, the pointer to the extension can be the pointer to the current use. Only if both need constructing is an extension needed.

CPPInstance actually has a __dict__ and CPPExcInstance would be ephemeral enough not to need __doc__. In both cases, the better place would be in CPPScope. CustomInstanceMethod would likewise have the problem of being ephemeral (CPPOverload and TemplateProxy differ here: their innards are not only shared for reduced memory usage, but also for settings to survive if assigned on a temporary, e.g. bound, copy). TypedefPointerToClass is fine to grow a __dict__: it's an uncommon beast.

N-Coder commented 3 years ago

My use-case usually doesn't involve memory pressure, so I'd be happy with the simple solution, but I guess a more involved implementation can always be added later on. :sweat_smile:

Do you have an example where TypedefPointerToClass occurs? Then I could add and test that.

N-Coder commented 3 years ago

I just created a small test case and found some oddities (scroll up in the gist to see the output):

Should I try to fix those points, too?

wlav commented 3 years ago

Do you have an example where TypedefPointerToClass occurs? Then I could add and test that.

Here's an example (it's a common one for opaque handles):

import cppyy

cppyy.cppdef("""\
namespace doctest {

struct A{};

struct B {
    typedef A* A_t;
}; }""")

ns = cppyy.gbl.doctest

a_ptr = ns.B.A_t(0x2a)

assert cppyy.addressof(a_ptr) == 42

And yes, TemplateProxy is a mess, but that's b/c it can grow (as more methods are instantiated) and it also represents non-templated overloads, specializations, etc. None of these need to have the same signature. And yes __repr__ calls tpp_doc. Same reason: not much better to print in repr() as there isn't really a reconstructable state.

N-Coder commented 3 years ago

Okay, I implemented that and updated the testing example. There are two things I'd still like to see fixed, afterwards this PR should be good to go. Have a look at the example __dict__:

<class cppyy.gbl.Test at 0x55b79cb89e70> {'__module__': '__main__',
    '__dict__': <attribute '__dict__' of 'Test' objects>,
    '__weakref__': <attribute '__weakref__' of 'Test' objects>,
    '__doc__': None,
    'templ': cppyy template proxy (internal),
    '__assign__': <cppyy.CPPOverload object at 0x7f7673be1580>,
    '__init__': <cppyy.CPPOverload object at 0x7f7673bda840>,
    'method': <cppyy.CPPOverload object at 0x7f7673bda8c0>,
    'overload': <cppyy.CPPOverload object at 0x7f7673bdd800>,
    'field': <cppyy.CPPDataMember object at 0x7f7673bdd830>}

Here, all reprs say <cppyy.Something object at 0x123456789abcdef>, except for TemplateProxy, which prints its __doc__. I'd unify this so that TemplateProxy uses the same repr as all the other objects.

As a second point, field is listed in the __dict__ output, but calling cppyy.gbl.Test.field will throw a not-found exception. To access the attribute, you need to use cppyy.gbl.Test.__dict__["field"], which is weird. I'm not quite sure how to fix this, but it's probably somewhere in meta_getattro.

N-Coder commented 2 years ago

The TemplateProxy.__doc__ issue should be fixed now. I couldn't find an easy fix for making the descriptor Test.field accessible, the full error for that is:

<class cppyy.gbl.Test at 0x55e9b851c020> has no attribute 'field'. Full details:
  attribute access requires an instance
  'Test::field' is not a known C++ class
  'field' is not a known C++ template
  'field' is not a known C++ enum

The reason for this is CPPDataMember::GetAddress returning a nullpointer when no object is given, the stack trace in this case is:

CPyCppyy::CPPDataMember::GetAddress CPPDataMember.cxx:336
CPyCppyy::pp_get CPPDataMember.cxx:51
type_getattro typeobject.c:3380
CPyCppyy::meta_getattro CPPScope.cxx:309
cfunction_vectorcall_FASTCALL methodobject.c:430

Unfortunately, I don't understand the logic here. @wlav could you change it so that the following code works? Otherwise the PR should be ready.

import cppyy
cppyy.cppdef("""
class Test {
public:
    std::vector<int> field;
};
""")
assert cppyy.gbl.Test.field is cppyy.gbl.Test.__dict__["field"]
wlav commented 2 years ago

Unfortunately, I don't understand the logic here. @wlav could you change it so that the following code works?

Yes, I actually ran into the same issue when trying to add a property to fields to set the encoding of char* and std::string fields. Other than removing that check, I saw no solution. The check is useful in principle in that it prevents lookups of instance data members on the class (there is a test for that), but I note that there is no such equivalent check of looking up static data members on the metaclass.

Since the other code isn't remotely finished (lots of other changes to converters, which need to know the encoding in the end), I recommend to simply remove the check for now to make progress. I'll figure out my local changes and the changes here later.

N-Coder commented 2 years ago

I guess removing the check won't fix things easily, as the following code would then have troubles with the object instance being null. For me, if you plan on cleaning that up some time later that's just as well, as the workaround via the __dict__ works in the meantime and I just wanted to make sure that the behaviour is consistent in the end (i.e. whenever you get to merge your converter changes).

From my side, these changes should then be good to go now. But you might want to give them another closer look as I'm not 100% confident I did the CPython refcounting right.

Another thing we might want to think about next to the __doc__ stuff are __annotations__ used for type checking. It should be easy to know the types coming from C++ and this would be a nice mechanism to expose them to type checkers and IDEs. But maybe I should open another issue for this...

wlav commented 2 years ago

From my side, these changes should then be good to go now. But you might want to give them another closer look as I'm not 100% confident I did the CPython refcounting right.

will do as the week opens up later, now that the big conference is done.

As for __annotations__ ... I tried, but couldn't figure out a good way to handle overloads. There are two issues that I could not find a solution for:

  1. Prevent presenting the combinations of arguments as acceptable signatures (e.g. double, double and int, int does not mean that double, int is allowed)
  2. Overloads with a different number of arguments (e.g. double v.s. double, double; really a variation of 1., but actually worse if an IDE would insist on an extra parameter).

Any idea how to tackle these?

As-is, the only place where __annotations__ is used, is in template resolution. If a templated function takes a templated function (std::function or pointer) as argument, e.g. like so:

template<typename R, typename... U, typename... A>
R callT(R (*f)(U...), A&&... args) {
    return f(args...);
}

and the actual argument given is a Python function, e.g. like so:

def ann_f2(arg1: 'int', arg2: 'int') -> 'int':
    return 3*arg1*arg2
cppyy.gbl.callT(ann_f2, 42, 37)

then the types from the __annotations__ are used to instantiate the template. (Although it would be theoretically possible to figure out from the function body that the arg pack A is equal to U, there is truly no way to derive the return type R otherwise.)

N-Coder commented 2 years ago

Python unfortunately has no built-in support for function overloading. There is only a @singledispatch decorator that differentiates based on the type of the first argument, but this decorator doesn't provide any __annotations__ or other runtime typing information about the overloaded implementations. The @overload decorator can unfortunately only be used in type stub files to tell type-checkers using non-__annotations__ based analysis (i.e. directly parsing the stub file, not via live runtime object inspection) that a function accepts different pairs of argument types. I also couldn't find any other example where this kind of information was exposed during runtime via __annotations__. So I see two (mutually non-exclusive) ways of going forward:

But maybe this is something for another issue and we should try to get the __doc__ changes in first.

wlav commented 2 years ago

Yes, I'm thinking the same: the only problem that my pop up is confusion, so perhaps limiting "best effort" to cases that are clear maybe enough.

I'm going to merge this PR later today and fix up the remaining issue (data members) directly in the code. I find that easier than discussing on a PR. :) I'm really hoping to get my Windows VMWare up and running again so that i can cut a release (last one was from July ...).