pybind / pybind11

Seamless operability between C++11 and Python
https://pybind11.readthedocs.io/
Other
15.57k stars 2.09k forks source link

[smart_holder] Unique ptr deleter roundtrip tests and fix #4921

Closed iwanders closed 10 months ago

iwanders commented 11 months ago

Description

@rwgk , as promised; followup to #4850, this PR ensures custom deleters are propagated correctly and keep their state.

I only saw the note about conventional commits when I filed this PR, so their descriptions aren't that good, the work is mostly split out though in individual commits;

Unit tests make check all pass, I ran clang format 13 on the files I touched, but didn't see a make format or anything along those lines, so it wouldn't surprise me if some linters still fail.

Happy to iterate, but feel free to push commits to this branch as you see fit. Filing as draft first before involving other code owners.

Somewhat related observation

One thing that stood out to me was this line; https://github.com/pybind/pybind11/blob/edfaaed87ea3a869eaa1947f3d40cb62958295ca/include/pybind11/detail/smart_holder_poc.h#L106

I feel that shouldn't be doing a delete, but instead do:

   std::default_delete<T>{}(raw_ptr);

Because the trait system that std::default_delete provides can be considered an extension point of the standard library, and calling delete would bypass any specializations that are provided.

Suggested changelog entry:

Custom deleters in unique_ptrs are stored and extracted correctly from the smart holder.
rwgk commented 11 months ago

@iwanders wrote (in the PR description):


One thing that stood out to me was this line; https://github.com/pybind/pybind11/blob/edfaaed87ea3a869eaa1947f3d40cb62958295ca/include/pybind11/detail/smart_holder_poc.h#L106

I feel that shouldn't be doing a delete, but instead do:

   std::default_delete<T>{}(raw_ptr);

I agree. I didn't think deeply at all about that line.

Do you want to fix that in a separate small PR on the side?

iwanders commented 11 months ago

Oh, oops, I forgot std::make_unique is c++14 and not 11 :grimacing: let me fix that.

Edit, should be fixed with 25c2e40289776494a93906d52e845a206eed292a

iwanders commented 11 months ago

Previous CI failed on c++20, the explicit string constructor is a requirement if we remove the default constructor, otherwise we can't instantiate the unique pointer in the unit test. Quick minimal example with compile error.

And force pushed again to fix clang tidy with f2f87f0cc143be54060ae670c3ef2e073e621191, :crossed_fingers: should be in the clear now...

rwgk commented 11 months ago

The latest commits all look good to me, thanks!

I'm not surprised by the test failures, most likely they're very easy to fix, like this:

https://github.com/pybind/pybind11/blob/e02fe001cdb9bc252efdfe8f08486de09e369737/tests/test_class_sh_basic.py#L113

rwgk commented 11 months ago

I'll run some internal testing with this PR as is. My suggestions don't change anything substantial. Thanks!

Unit testing with all sanitizers in the Google toolchain passes. (as expected)

Global testing just changing pybind11 passes. (as expected; there are only a handful of smart_holder uses this way)

None of the remaining PyCLIF-pybind11 global test failures is fixed by this PR. (I had some wishful thinking some may be fixed.) — This was a "rerun" of global testing, after adding in this PR.

Fresh global testing — with this PR included from the start — with PyCLIF-pybind11 exercises the smart_holder functionality in vastly more situations, but it is very expensive. I'll do that next weekend at the latest. (It might need a little bit of trickery to determine for certain if this PR fixed latent bugs.)

iwanders commented 11 months ago

I'm not surprised by the test failures, most likely they're very easy to fix, like this:

Awesome, thanks for that suggestion, I was surprised tbh, I would've expected the amount of move constructors to be equal on between the compilers, but guess there's still some variation in how they implement the language. Pipeline is running with that suggested fix now :crossed_fingers:

None of the remaining PyCLIF-pybind11 global test failures is fixed by this PR. (I had some wishful thinking some may be fixed.) — This was a "rerun" of global testing, after adding in this PR.

Heh, well, I don't think too many people use a unique pointer with a custom deleter, so not too surprised. Regardless, it's a good improvement and I had fun doing some C++ for a good cause :)

iwanders commented 11 months ago

This looks like a flakey test, at least, we didn't see this error before; https://github.com/pybind/pybind11/actions/runs/6791647051/job/18463525701?pr=4921

         FAILED test_iostream.py::test_flush - AssertionError: assert '### TestFact...(not flushed)' == '(not flushed)'
           + ### TestFactory2 @ 0x227bbf9d0a0 destroyed
             (not flushed)
         = 1 failed, 760 passed, 50 skipped, 26 xfailed, 7 xpassed in 200.07s (0:03:20) =

Going to incorporate the clang tidy noexcept suggestion, but thought I'd point it out in case it doesn't occur again.

rwgk commented 11 months ago

That's a very common flake.

On Tue, Nov 7, 2023 at 16:07 Ivor Wanders @.***> wrote:

This looks like a flakey test, at least, we didn't see this error before; https://github.com/pybind/pybind11/actions/runs/6791647051/job/18463525701?pr=4921

     FAILED test_iostream.py::test_flush - AssertionError: assert '### TestFact...(not flushed)' == '(not flushed)'
       + ### TestFactory2 @ 0x227bbf9d0a0 destroyed
         (not flushed)
     = 1 failed, 760 passed, 50 skipped, 26 xfailed, 7 xpassed in 200.07s (0:03:20) =

Going to incorporate the clang tidy noexcept suggestion, but thought I'd point it out in case it doesn't occur again.

— Reply to this email directly, view it on GitHub https://github.com/pybind/pybind11/pull/4921#issuecomment-1800604712, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAFUZAG56L3YVYRZNOC6XP3YDLEMXAVCNFSM6AAAAAA66OFMGKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMBQGYYDINZRGI . You are receiving this because you were mentioned.Message ID: @.***>

rwgk commented 11 months ago

Looks like it's working now (121 successful already, no failures).

Could you do me a small favor and click update branch here, or git merge smart_holder, git push interactively? (If I click the button here it'll mark me a collaborator just for that. I think it's better if it's clear that it's all your work.)

iwanders commented 11 months ago

Done, I've also removed the 'draft' label, good to go in from my side, but if you want to wait until your big weekend run of tests that's fine by me of course, no rush.

rwgk commented 11 months ago

Done, I've also removed the 'draft' label, good to go in from my side, but if you want to wait until your big weekend run of tests that's fine by me of course, no rush.

Thanks for the great work!

I'll wait for the GHA to finish, then merge. For the weekend run I'd want to include this PR from the start anyway. Big maybe: if I see a reason to back it out for a rerun (to see if that resolves any failures), I can do that easily with a local revert. In the worst case (very much unexpected) we can revert here, or fix forward.

iwanders commented 10 months ago

Please do let me know if any of the test runs uncover any issues, happy to help tackle them if related to this piece of work.

rwgk commented 10 months ago

For the weekend run I'd want to include this PR from the start anyway.

Done. No surprises.

Big maybe: if I see a reason to back it out for a rerun (to see if that resolves any failures), I can do that easily with a local revert.

I didn't really see a reason, but for completeness, and because it was easy, I tested with the local revert anyway. Very clearly: this PR did not break anything.

In summary (see https://github.com/pybind/pybind11/pull/4921#issuecomment-1799536480): This PR did not fix anything in the wild, but also did not break anything.

I'm very thankful though that this bug was fixed, so that we no longer have it hanging over our heads as an uncertainty.

iwanders commented 10 months ago

I didn't really see a reason, but for completeness, and because it was easy, I tested with the local revert anyway. Very clearly: this PR did not break anything.

Thanks for letting me know, glad we didn't discover any new bugs.

I'm very thankful though that this bug was fixed, so that we no longer have it hanging over our heads as an uncertainty.

Agreed, this is one of those 'would lead to an annoying bug in production' type of things.