Closed rwgk closed 10 months ago
Logging a very surprising observation:
python3 -m pytest -n 2 test_python_multiple_inheritance.py
.- assert d.get_base_value() == 11
+ assert d.get_base_value() == 12
assert d.get_base_value_from_drvd() == 12
d.reset_base_value(20)
assert d.get_base_value() == 20
- assert d.get_base_value_from_drvd() == 12
+ assert d.get_base_value_from_drvd() == 20
d.reset_base_value_from_drvd(30)
- assert d.get_base_value() == 20
+ assert d.get_base_value() == 30
pytest -n 2
:
( cd /usr/local/google/home/rwgk/forked/pybind11/tests && PYTHONPATH=/usr/local/google/home/rwgk/forked/build_gcc/lib /usr/bin/python3 -m pytest -n 2 test_python_multiple_inheritance.py )
=========================================================== test session starts ============================================================
platform linux -- Python 3.11.4, pytest-7.2.1, pluggy-1.0.0+repack
C++ Info: 12.2.0 C++17 __pybind11_internals_v4_gcc_libstdcpp_cxxabi1017__ PYBIND11_SIMPLE_GIL_MANAGEMENT=False
rootdir: /usr/local/google/home/rwgk/forked/pybind11/tests, configfile: pytest.ini
plugins: xdist-3.3.1
2 workers [2 items]
.. [100%]
============================================================ 2 passed in 0.32s =============================================================
pytest -n 1
:
( cd /usr/local/google/home/rwgk/forked/pybind11/tests && PYTHONPATH=/usr/local/google/home/rwgk/forked/build_gcc/lib /usr/bin/python3 -m pytest -n 1 test_python_multiple_inheritance.py )
=========================================================== test session starts ============================================================
platform linux -- Python 3.11.4, pytest-7.2.1, pluggy-1.0.0+repack
C++ Info: 12.2.0 C++17 __pybind11_internals_v4_gcc_libstdcpp_cxxabi1017__ PYBIND11_SIMPLE_GIL_MANAGEMENT=False
rootdir: /usr/local/google/home/rwgk/forked/pybind11/tests, configfile: pytest.ini
plugins: xdist-3.3.1
1 worker [2 items]
.F [100%]
================================================================= FAILURES =================================================================
______________________________________________________________ test_PPCCInit _______________________________________________________________
[gw0] linux -- Python 3.11.4 /usr/bin/python3
def test_PPCCInit():
d = PPCCInit(11)
assert d.get_drvd_value() == 36
d.reset_drvd_value(55)
assert d.get_drvd_value() == 55
# CppBase is initialized and used when CppBase methods are called, but
# CppDrvd is used when CppDrvd methods are called.
> assert d.get_base_value() == 12
E assert 11 == 12
E + where 11 = <bound method PyCapsule.get_base_value of <test_python_multiple_inheritance.PPCCInit object at 0x7f3cfad79490>>()
E + where <bound method PyCapsule.get_base_value of <test_python_multiple_inheritance.PPCCInit object at 0x7f3cfad79490>> = <test_python_multiple_inheritance.PPCCInit object at 0x7f3cfad79490>.get_base_value
d = <test_python_multiple_inheritance.PPCCInit object at 0x7f3cfad79490>
test_python_multiple_inheritance.py:32: AssertionError
========================================================= short test summary info ==========================================================
FAILED test_python_multiple_inheritance.py::test_PPCCInit - assert 11 == 12
======================================================= 1 failed, 1 passed in 0.34s ========================================================
ERROR: completed_process.returncode=1
Follow-on to the previous comment:
The test also passes with pytest -n 1
(or just pytest
) if I move def test_PC():
after def test_PPCCInit():
.
The source of the order dependence is the algorithm here:
With Python code in this order
PC(0)
PPCC(0)
the result of the algorithm is:
registered_types_py = {PC: {CppBase}, PPCC: {CppBase, CppDrvd}}
With Python code in this order
PPCC(0)
PC(0)
the result of the algorithm is:
registered_types_py = {PPCC: {CppDrvd, CppBase}, PC: {CppBase}}
For completeness, code to reproduce: d37b5409d4e5b835620ccbb321a4e1ba89af315c
Note that the type of registered_types_py
is std::unordered_map<PyTypeObject *, std::vector<type_info *>>
:
I.e. the order of the keys is undetermined, which is totally fine, it doesn't matter.
However, the order of the bases for a given key is completely deterministic in both cases above.
This is a subtle but very serious flaw:
Assume some complex user code currently works as expected. It happens to call PPCC(i)
first, PC(j)
second. Then one day someone makes a harmless looking addition somewhere so that the order becomes PC(k)
, PPCC(i)
, PC(j)
. All the sudden the existing code behaves differently. This will be extremely difficult to troubleshoot.
Tracking result of a simple experiment with the diff below.
Google global testing did not find any use cases involving manually written pybind11 extensions tripping over the added exceptions.
Internal test ID: OCL:554689491:BASE:555172881:1691595490960:83406d90
Note that this testing did NOT use PyCLIF-pybind11 (auto-generated pybind11 extensions), which brings in at least one use case, which led to this PR. The number of auto-generated pybind11 extensions is roughly 5x the number of manually written extensions.
--- a/include/pybind11/detail/type_caster_base.h
+++ b/include/pybind11/detail/type_caster_base.h
@@ -109,14 +109,22 @@ inline void all_type_info_add_base_most_
type_info *existing_base = *it;
if (PyType_IsSubtype(addl_base->type, existing_base->type) != 0) {
bases.insert(it, addl_base);
+ throw std::runtime_error("all_type_info_add_base_most_derived_first:BANDAID");
return;
}
}
bases.push_back(addl_base);
+ for (std::size_t i = 0; i < bases.size(); i++) {
+ for (std::size_t j = i + 1; j < bases.size(); j++) {
+ if (PyType_IsSubtype(bases[i]->type, bases[j]->type) != 0) {
+ throw std::runtime_error("all_type_info_add_base_most_derived_first:LUCKY");
+ }
+ }
+ }
}
// Populates a just-created cache entry.
-PYBIND11_NOINLINE void all_type_info_populate(PyTypeObject *t, std::vector<type_info *> &bases) {
+PYBIND11_NOINLINE void all_type_info_populate(PyTypeObject *t, std::vector<type_info *> &bases) noexcept {
assert(bases.empty());
std::vector<PyTypeObject *> check;
for (handle parent : reinterpret_borrow<tuple>(t->tp_bases)) {
Hi @jagerman, is there a chance that you could help review this PR? — @wjakob said this is code that you rewrote at some point.
Interesting diamand case.
struct CppBase {int base;};
struct CppDrvd1 : CppBase {};
struct CppDrvd2 : CppBase {};
class PCB(m.CppBase):
pass
class PC1(m.CppDrvd1):
pass
class PC2(m.CppDrvd2):
pass
class PCD(PC1, PC2): # consider banning this.
pass
PC2().set_base
py::class<CppBase>().method('set_base')
py::class<CppDrvd1, CppBase>().method('get_base')
py::class<CppDrvd2, CppBase>().method('get_base')
@rainwoodman FYI
After our meeting reviewing this PR I decided it's worth the effort catching all ill-behaved situations (not just fix the particular situation I stumbled over).
0a9599f775bfd3ca196c5e23a3fcf2890cbf6e82 captures some of them, but not all.
To get all, we need to bring in the C++ bases as well. I'll try to work on that asap.
Intermediate results: I ran Google global testing with 0a9599f775bfd3ca196c5e23a3fcf2890cbf6e82:
just changing pybind11: the "diverging" error message did not appear anywhere.
PyCLIF-pybind11: again, he "diverging" error message did not appear anywhere.
So plugging this hole could turn out to be relatively high effort for very little practical gain, but we can only be sure after putting in the effort. :-/
For my own reference:
Global testing just changing pybind11: OCL:579584090:BASE:579672844:1699221331111:c625089a
Global testing changing PyCLIF-pybind11: OCL:579673181:BASE:579726990:1699249978254:6ab60a42
Regarding 5f5fd6a68e3cff1726628f6dea8e1c0754636a23:
This bug goes pretty deep: registered_types_py
does not enumerate all C++ bases for pybind11-wrapped types. Therefore I had to go into type_caster_generic::load_impl<>
to detect divergent bases.
The current state of this PR is definitely not meant to be for merging. It's only meant to enable global testing, to see if there are hidden problems in the wild. For that I have to move this work to the pywrapcc branch/repo, because I have to apply a similar patch in smart_holder_type_casters.h.
I have to apply a similar patch in smart_holder_type_casters.h.
That's under https://github.com/google/pywrapcc/pull/30077 now.
Unit testing with all sanitizers in the Google toolchain passes.
Global testing initiated. Results will probably only be available very late today.
Note: the divergence check in type_caster_generic::load_impl<>
only catches problems when a Python object is passed for a C++ base type in a function call. It would be much better to catch all divergence issues in all_type_info_populate()
, but currently I don't see a way to achieve that easily.
Global testing initiated. Results will probably only be available very late today.
No failures.
It would be much better to catch all divergence issues in
all_type_info_populate()
, but currently I don't see a way to achieve that easily.
That's the only situation we're still not sure about.
Conclusion from a text chat with @rainwoodman:
I will revert the last 3 commits here (not counting merge commits), to get back to the state reviewed in a meeting with @rainwoodman last week (Oct 31), then merge.
But also: Create a new PR bringing back the 3 commits (0a9599f775bfd3ca196c5e23a3fcf2890cbf6e82, 5f5fd6a68e3cff1726628f6dea8e1c0754636a23, df27188dc6d6cf333145755543f56b2f6657aa5e), and do follow-on work separately, later.
Rationale: I think it's fair to defer work on catching "diverging" (best term I could think of) inheritance situations because I could not find any problems in the wild (Google global testing). The new PR will serve as documentation and reminder.
I agree with your assessment: we can revisit the divergence case when there is evidence that someone actually wants to use it.
The follow-on PR is #4928.
Description
The equivalent of this PR was merged as https://github.com/google/pywrapcc/pull/30056 on September 12, 2023.
COPY of the google/pywrapcc#30056 PR description (for easy reference):
The core of the added test is this:
Without this PR, the runtime behavior depends on the order in which
PC
orPPCC
are first instantiated, which can be highly surprising and potentially very difficult to debug. See https://github.com/pybind/pybind11/pull/4762#issuecomment-1654558387 for a full analysis.However, use cases appear to be surprisingly rare (https://github.com/pybind/pybind11/pull/4762#issuecomment-1671731319). The only known use cases are through PyCLIF.
More-or-less as a side-effect, this PR also:
__init__
unnecessarilty.reinterpret_cast
here: https://github.com/pybind/pybind11/blob/8c7b8dd0ae74b36b7d42f77b0dd4096ebb7f4ab1/include/pybind11/detail/class.h#L193Original PR description:
The core of the test is this:
pybind11 forces adding an
__init__
override (which happens to not disturb PyCLIF or Boost.Python).A wrapped
PPPC
instance evidently has two independent C++ objects (CppBase
andCppDerived
) with pybind11, while it has only one with PyCLIF and Boost.Python (CppDerived
).The nanobind metaclass refuses to create the
PPCC
subtype.Suggested changelog entry: