wlav / cppyy

Other
384 stars 38 forks source link

Statefulness of converters is not considered over multiple function calls #229

Open guitargeek opened 2 months ago

guitargeek commented 2 months ago

Reproducer:

import cppyy

s1 = cppyy.gbl.std.string(b"s1")
s2 = cppyy.gbl.std.string(b"s2")

v1 = cppyy.gbl.std.string_view(s1)
print(v1)

cppyy.gbl.std.string_view(s2)
print(v1)

Output:

s1
s2

The problem is not the string converter in particular, but the converter logic in general.

Even though the converters proclaim that they have state with HasState(), cppyy doesn't respect that over multiple function calls. The string converter used for the first std::string_view is exactly the same object as for the second string view.

@wlav do you have any implementation ideas to fix this? PyROOT is completely broken if this doesn't work.

My first idea would be to hide the statefulness of the converters by having them manage resources themselves via static data members. That should be no problem because of the GIL.

We could also think about moving the state from the converter to the instance maybe? It's a bit weird anyway that some objects are defined by the state of their converters, even after the conversion has already happened, as in the reproducer.

See this PR, where I suggested a unit test with the code that originally hinted to the problem: https://github.com/wlav/cppyy/pull/228

wlav commented 2 months ago

The problem is not the string converter in particular, but the converter logic in general.

No, this one is particular to this problem: the first PR copied the std::string argument passed in into the fStringBuffer. It should copy.

But otherwise yes, the logic is that the argument buffer doesn't live longer than the function call itself, which is reasonable: a buffer is normally only used if a Python object needs conversion (that's what was done incorrectly here: a C++ object is being preserved). There's no reasonable expectation if the C++ function keeps pointers to a product that is, in effect, a temporary variable as it must be if there is no C++ representation.

wlav commented 2 months ago

Fixed in repo by not using the buffer technique, but the (slightly more convoluted) alternative we discussed at the time of putting a life line on the Python str object.