pybind / pybind11

Seamless operability between C++11 and Python
https://pybind11.readthedocs.io/
Other
15.82k stars 2.12k forks source link

Address regression introduced in #5381 #5396

Closed francesco-ballarin closed 1 month ago

francesco-ballarin commented 1 month ago

Description

This PR fixes a regression introduced in #5381 that shows up (at least) when the pybind11 wrapped type is a pointer to a forward declared class. In that case checking for std::is_base_of<> as in #5381 results in a compilation error because the forward declaration doesn't have enough information to determine the inheritance diagram.

  1. Originally reported at https://github.com/pybind/pybind11/pull/5381#issuecomment-2394997306.
  2. Technical discussion at https://github.com/pybind/pybind11/pull/5381#issuecomment-2395135244 and https://github.com/pybind/pybind11/pull/5381#issuecomment-2395138685
  3. Further technical discussion in the comments below.

Suggested changelog entry:

A regression introduced by #5381 was fixed.
francesco-ballarin commented 1 month ago

@rwgk let's move the rest of the discussion here.

I implemented the patch we discussed in https://github.com/pybind/pybind11/pull/5381#issuecomment-2395135244 and https://github.com/pybind/pybind11/pull/5381#issuecomment-2395138685 but that still does not fix the regression on my actual use case, hence this PR is still marked as draft.

I think that both of us were expecting any_of<std::is_same<...>, std::is_base_of<...>> to first check if std::is_same<...> was true and, if it was, don't bother checking std::is_base_of<...>. Unfortunately it seems that each of the any_of<...> arguments is evaluated first, and then the or operation happens. This means that even with this PR my actual use case results in a compilation error. Is there anything similar to any_of that stops on the first evaluation to true?

I'll post a link to this PR in the downstream project, but I can't see a way for us downstream to work around the compilation error if the regression isn't sorted. The forward declared variable is a pointer to a class which is part of the private interface of a third-party library, over which we have no control.

rwgk commented 1 month ago

The forward declared variable is a pointer to a class which is part of the private interface of a third-party library

Sigh again. I've stumbled over that pattern several times before.

Fundamentalistically speaking, if it's meant to be private, it shouldn't be exposed to Python.

More pragmatically speaking: This really is in the gray area of what you can expect a bindings system to support. It just so happened that it used to work. Corner case.

To restore supporting this corner case, I think you need to experiment. I'd bet you can make it work again, but not sure, and not sure how.

I'd start with std::is_same<...>::value || std::is_base_of<...>::value as the basic idea.

It looks a little tricky to fit that into the existing machinery.

Alternatively, I'd look into changing the bindings code, to avoid exposing the private C++ type to Python.

francesco-ballarin commented 1 month ago

Fundamentalistically speaking, if it's meant to be private, it shouldn't be exposed to Python.

Not really. The class itself is private, but the pointer to the class is public. It will be exposed to Python by someone else, in a way we have no control over.

Hence

Alternatively, I'd look into changing the bindings code, to avoid exposing the private C++ type to Python

is a no go.

I'd start with std::is_same<...>::value || std::is_base_of<...>::value as the basic idea.

Will try.

gentlegiantJGC commented 1 month ago

@francesco-ballarin can you help me understand the problem better. The example below works so I must be misunderstanding something but it is a minimal starting point.

#include <iostream>

// Start pybind11 intrinsic_type
/// Helper template to strip away type modifiers
template <typename T>
struct intrinsic_type {
    using type = T;
};
template <typename T>
struct intrinsic_type<const T> {
    using type = typename intrinsic_type<T>::type;
};
template <typename T>
struct intrinsic_type<T *> {
    using type = typename intrinsic_type<T>::type;
};
template <typename T>
struct intrinsic_type<T &> {
    using type = typename intrinsic_type<T>::type;
};
template <typename T>
struct intrinsic_type<T &&> {
    using type = typename intrinsic_type<T>::type;
};
template <typename T, size_t N>
struct intrinsic_type<const T[N]> {
    using type = typename intrinsic_type<T>::type;
};
template <typename T, size_t N>
struct intrinsic_type<T[N]> {
    using type = typename intrinsic_type<T>::type;
};
template <typename T>
using intrinsic_t = typename intrinsic_type<T>::type;
// End pybind11 intrinsic_type

class Test1;
typedef struct Test1* Test2;

template <typename Arg>
using f1 = std::is_same<Test1, intrinsic_t<Arg>>;

template <typename Arg>
using f2 = std::is_base_of<Test1, intrinsic_t<Arg>>;

int main() {
    std::cout << f1<Test2>::value << std::endl;
    std::cout << f2<Test2>::value << std::endl;

    return 0;
}
francesco-ballarin commented 1 month ago

Thanks @gentlegiantJGC for looking into this. I'd say that the following reproduces the issue starting from your minimal example. Note how I had to introduce the class args.

#include <iostream>

// Start pybind11 intrinsic_type
/// Helper template to strip away type modifiers
template <typename T>
struct intrinsic_type {
    using type = T;
};
template <typename T>
struct intrinsic_type<const T> {
    using type = typename intrinsic_type<T>::type;
};
template <typename T>
struct intrinsic_type<T *> {
    using type = typename intrinsic_type<T>::type;
};
template <typename T>
struct intrinsic_type<T &> {
    using type = typename intrinsic_type<T>::type;
};
template <typename T>
struct intrinsic_type<T &&> {
    using type = typename intrinsic_type<T>::type;
};
template <typename T, size_t N>
struct intrinsic_type<const T[N]> {
    using type = typename intrinsic_type<T>::type;
};
template <typename T, size_t N>
struct intrinsic_type<T[N]> {
    using type = typename intrinsic_type<T>::type;
};
template <typename T>
using intrinsic_t = typename intrinsic_type<T>::type;
// End pybind11 intrinsic_type

class Test1;
typedef struct Test1* Test2;

class args {};

template <typename Arg>
using f1 = std::is_same<args, intrinsic_t<Arg>>;

template <typename Arg>
using f2 = std::is_base_of<args, intrinsic_t<Arg>>;

int main() {
    std::cout << f1<Test2>::value << std::endl;
    std::cout << f2<Test2>::value << std::endl;

    return 0;
}

fails with

$ g++ -std=c++20 main.cpp -o main
In file included from /usr/include/c++/14/bits/move.h:37,
                 from /usr/include/c++/14/bits/exception_ptr.h:41,
                 from /usr/include/c++/14/exception:166,
                 from /usr/include/c++/14/ios:41,
                 from /usr/include/c++/14/ostream:40,
                 from /usr/include/c++/14/iostream:41,
                 from main.cpp:1:
/usr/include/c++/14/type_traits: In instantiation of ‘struct std::is_base_of<args, Test1>’:
main.cpp:50:27:   required from here
   50 |     std::cout << f2<Test2>::value << std::endl;
      |                           ^~
/usr/include/c++/14/type_traits:1493:30: error: invalid use of incomplete type ‘struct Test1’
 1493 |     : public __bool_constant<__is_base_of(_Base, _Derived)>
      |                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
main.cpp:37:7: note: forward declaration of ‘struct Test1’
   37 | class Test1;
      |       ^~~~~
main.cpp: In function ‘int main()’:
main.cpp:50:29: error: ‘value’ is not a member of ‘f2<Test1*>’ {aka ‘std::is_base_of<args, Test1>’}
   50 |     std::cout << f2<Test2>::value << std::endl;
      |                             ^~~~~
gentlegiantJGC commented 1 month ago

As I think you discovered in the is_base_of docs it says "Derived should be a complete type; otherwise the behavior is undefined." I think that is the issue we are hitting.

From a quick google there are workarounds to tell if an argument is complete but they aren't very nice.

gentlegiantJGC commented 1 month ago

I have a solution. Perhaps you can improve upon it

#include <iostream>
#include <concepts>

template <typename, typename = void>
struct is_complete : std::false_type {};

template <typename T>
struct is_complete<T, decltype(void(sizeof(T)))> : std::true_type {};

class args {};

class ForwardClass;
class Class{};
class Args : public args {};

template<typename Base, typename Derived>
constexpr bool safe_is_base_of(){
    if constexpr (is_complete<Derived>::value){
        return std::is_base_of<Base, Derived>::value;
    } else {
        return false;
    }
}

template <typename Arg>
using argument_is_args = std::conditional_t<
    safe_is_base_of<args, Arg>(),
    std::true_type,
    std::false_type>;

int main() {
    std::cout << argument_is_args<ForwardClass>() << std::endl;
    std::cout << argument_is_args<Class>() << std::endl;
    std::cout << argument_is_args<Args>() << std::endl;

    return 0;
}

I initially tried return is_complete<Derived>::value && std::is_base_of<Base, Derived>::value; but it seems to expand both templates even if the first is false. Putting it inside if constexpr stops it expanding the is_base_of.

Edit: is_complete was based on this https://devblogs.microsoft.com/oldnewthing/20190710-00/?p=102678

francesco-ballarin commented 1 month ago

Looks great :heart: . I've invited you to my fork, so that you can push to the branch associated with this PR if you wish. Once we iron out a fix I'll try it on the actual end user library.

gentlegiantJGC commented 1 month ago

Erm okay. Apparently if constexpr requires C++ 17.

gentlegiantJGC commented 1 month ago

I think that change might work. I don't understand what decltype(void(sizeof(Derived))) does other than disallowing evaluation of the second template struct. If it is a forward declaration it should use the first case and if it is a complete class it should use the second. We will have to let the tests run to see.

francesco-ballarin commented 1 month ago

Thanks @gentlegiantJGC , I can now confirm that the change in this branch fixes the downstream compilation issue in the end user library.

@rwgk this is now ready for your review.

gentlegiantJGC commented 1 month ago

@francesco-ballarin do you want to write some test cases for this to make sure it doesn't get broken again in the future?

rwgk commented 1 month ago

@francesco-ballarin do you want to write some test cases for this to make sure it doesn't get broken again in the future?

That would be best, and I now believe it could be pretty simple, basically just a unit test (static_assert()) for is_same_or_base_of, in tests/test_kwargs_and_defaults.cpp. All we want to establish is that it compiles for an incomplete Derived. There doesn't need to be a runtime test.

gentlegiantJGC commented 1 month ago

Should there also be a test constructing a pybind class around a pointer to an incomplete type? That way it would catch the issue if it cropped up somewhere else.

rwgk commented 1 month ago

Should there also be a test constructing a pybind class around a pointer to an incomplete type? That way it would catch the issue if it cropped up somewhere else.

I'd say: Great idea! But up to you.

If it works without any further production code changes: fine to include in this PR.

Otherwise I think it'll be better to merge this PR with a minimal new test, and start a new PR for what you have in mind.

francesco-ballarin commented 1 month ago

@gentlegiantJGC I agree it would be best to have a test that catches this, but I won't be capable of preparing one myself ;)

gentlegiantJGC commented 1 month ago

Looks like most of the tests are passing. I don't understand the tests that are failing. Are these due to the py::class_ wrapping a pointer? What can I do to resolve them?

rwgk commented 1 month ago

Looks like most of the tests are passing. I don't understand the tests that are failing. Are these due to the py::class_ wrapping a pointer? What can I do to resolve them?

My first reaction: py::class_-wrapping a pointer-to-a-class type does not make sense. Let's not try to support this.

I'm actually amazed that all tests pass, except clang-tidy.

In theory we could suppress the clang-tidy warnings, but I'd rather not, unless we can build a convincing case that py::class_-wrapping a pointer-to-a-class type actually is actually good and useful.

But let's take a step back:

I added two commits:

I'll merge this PR when I see that all GitHub Actions jobs pass. That will resolve the regression as soon as possible.

rwgk commented 1 month ago

Continuing the discussion here for now, but in a separate comment:

py::class_-wrapping a pointer-to-a-class type does not make sense.

It does happen to work for passing around pointers to incomplete (wished-for "private") classes.

But I'd argue it's a hack. I wouldn't want to take away the candy, but I also wouldn't want to disable clang-tidy warnings to support a hack.

I'd say what we really want is a clean feature to support passing around pointers to incomplete (from the perspective of pybind11) types.

I think it's feasible, and probably very little extra code, but it needs some experimentation.

Which means:

rwgk commented 1 month ago

I edited the PR description, mainly to restore the template for the description (so that I'm more sure the automatic harvesting will work) and to add a minimal Suggested changelog entry. That way we won't forget. The changelog entries are automatically harvested and then manually curated. In this case we'll combine the information from PRs 5381 and 5396.

rwgk commented 1 month ago

See #5409, which (currently) sketches out a way to cleanly support passing around pointers to incomplete (from the perspective of pybind11) types.