pybind / pybind11

Seamless operability between C++11 and Python
https://pybind11.readthedocs.io/
Other
15.11k stars 2.05k forks source link

Experimenting: `Annotated[Any, pybind11.CppType("cpp_namespace::UserType")]` #4888

Open rwgk opened 8 months ago

rwgk commented 8 months ago

Description

For context see https://github.com/pybind/pybind11/pull/4876#issuecomment-1768750707 and surrounding comments.

Note: PR #5073 was split out from this PR and released with pybind11 v2.12.0

TODO:

Suggested changelog entry:

rwgk commented 8 months ago

@sizmailov The idea here is to not knowingly generate docstrings that stubgen cannot possibly process correctly.

stubgen 1.5.1 does not process the Annotated[Any, "cpp_namespace::UserType"] as desired.

What do you think, is this a useful direction? Could stubgen be updated to process Annotated as desired?

rwgk commented 8 months ago

Some new information:

I ran global testing with this PR. Through that I discovered 3 .pyi files that look better, for example (manipulated because it's closed source):

-    def foo(self, *args, **kwargs) -> Any: ...
+    def foo(self, value) -> None: ...

That makes me think stubgen already has some functionality for handling Annotated[Any, "cpp_namespace::UserType"], but it's not applied recursively.

sizmailov commented 8 months ago

What do you think, is this a useful direction? Could stubgen be updated to process Annotated as desired?

What would be the desired output? The following should be easily achievable:

def values(self) -> Iterator[Annotated[Any, "cpp_namespace::UserType"]]: ...

As a user of a python extension I would prefer to have this:

def values(self) -> Iterator[py_module.UserType]: ...

It's possible but requires information on mapping C++ types to Python, which is unavailable for stubgen. The only place that has this mapping is the binding code (e.g. pybind11_protobuf). Not sure how this can be extracted and passed to stubgen...

If I would go with Annotated I'd put some "label" around type string to leave no room for mistake (for 3rd-party tools that make use of annotations), e.g.:

def values(self) -> Iterator[Annotated[Any, Pybind11FromCppType("cpp_namespace::UserType")]]: ...
rwgk commented 8 months ago

def values(self) -> Iterator[Annotated[Any, Pybind11FromCppType("cpp_namespace::UserType")]]: ...

I'll try that, thanks!

What I'm aiming for here is a last-ditch effort to produce a docstring that does not trip up stubgen. The current behavior is to simply let it fall on the floor.

As a user of a python extension I would prefer to have this: def values(self) -> Iterator[py_module.UserType]: ...

That only works for py::class_- or py::enum_-wrapped types. The implementation is here (right above the code this PR is changing):

There is no existing mechanism for replacing the % placeholder (originating from here) at runtime with a fully-qualified Python name. Ideas that come to mind: maybe another registry; maybe a magic classmethod. But even if implemented, that can fail, too, so a robust fallback like what this PR is about will always have a value.

rwgk commented 8 months ago

I decided to go with

def values(self) -> Iterator[Annotated[Any, CppTypePybind11("cpp_namespace::UserType")]]: ...

For the 5 .pyi files that I hit on via global testing before, inserting CppTypePybind11() does not make a difference. I like the extra clarity though, therefore I initiated another global testing run with this updated PR.

rwgk commented 8 months ago

Another data point, result from globally testing this PR @ commit 272152e68b062749e892dbb362ced079064fd666, which inserts CppTypePybind11() like this:

Annotated[Any, CppTypePybind11("cpp_namespace::UserType")]

There is only 1 new test failure, in tensorflow, a comparison with a golden file. That should be easy to deal with.

rwgk commented 8 months ago

@sizmailov do have connections to the mypy stubgen owners?

I feel it would be ideal to get their input before we make a move here. Concrete questions:

It seems like mypy does handle outer-level Annotated, but not nested ones, e.g.

Iterator[str, Annotated[Any, CppTypePybind11("cpp_namespace::UserType")]]
rwgk commented 8 months ago

@sizmailov do have connections to the mypy stubgen owners?

I went ahead and created https://github.com/python/mypy/issues/16306

sizmailov commented 8 months ago

type, exception, memoryview and bytearray also miss the handle_type_name.

Not sure about dtype, module_, generic_type and exception, since they are not in pytypes.h.

The rest seems to be properly handled.

grep -Poh 'class \w+ : public object ' include/pybind11/*.h ``` class dtype : public object class module_ : public object class generic_type : public object class exception : public object class iterator : public object class type : public object class iterable : public object class str : public object class bytes : public object class bytearray : public object class none : public object class ellipsis : public object class bool_ : public object class int_ : public object class float_ : public object class weakref : public object class slice : public object class capsule : public object class tuple : public object class dict : public object class sequence : public object class list : public object class anyset : public object class function : public object class staticmethod : public object class buffer : public object class memoryview : public object ```
rwgk commented 8 months ago

type, ~exception,~ memoryview and bytearray also miss the handle_type_name.

Done: 3c209443552e0be509592f2bab6f6221353e2e5d

Not sure about dtype,

Done: 6a3a954a6b85d2f5dd770f7ab08d12bf1052a1b5

module_,

Done: 74817b7f9f384cd5fd15d5765d3c6594adac2e0a

since they are not in pytypes.h.

There is already a precedent for that, the existing handle_type_name<> specializations in numpy.h:

generic_type and

That's what the % placeholder was originally invented for (I guess), and is handled in the code I pointed out before:

exception,

My current conclusion for this: 169b0e57ebda14c47f159823ea57c1971f713121

That class is a real oddball. The first thing that stands out, the type template parameter is not used. It's not really a type at all, but more of a factory for producing types derived from Exception. There is no constructor that takes a py::handle and checks if the object is actually derived from Exception. Therefore it cannot be an argument of a wrapped function:

m.def("pass_exception_void", [](const py::exception<void>&) {}); // Does not compile.  
pytypes.h:459:36: error: could not convert ‘{h, pybind11::object::borrowed_t()}’ from ‘<brace-enclosed initializer list>’ to ‘pybind11::exception<void>’
  459 |     return {h, object::borrowed_t{}};

It can only appear as a return type, but without subversive steps only to return a newly created type, which fails at runtime. See added test.

I believe it's good that Annotated[...] appears in the error. I wouldn't want to suggest this is the right way to go, by working to produce something different.

rwgk commented 8 months ago

@sizmailov Unless you have more suggestions, I'll stop working on this now, waiting for feedback from the stubgen side.

I think this PR only makes sense if there are cooperating changes in stubgen.

rwgk commented 6 months ago

@sizmailov I'm interested in your opinion regarding d14d91e02afba4046c9a899ad47a20220ac1f1a5, could you please take a look?

The main idea is to remove the handle_type_name default implementation, to enforce that all types are explicitly specialized, to guard against oversights.

That generates the question: What do we actually want in the added specializations? — I'm not sure, any advice is appreciated.

rwgk commented 6 months ago

Logging the results of a simple experiment: Google global testing with the current version of this PR:

(Internal test ID: OCL:593010609:BASE:593028799:1703228284011:fdcd5c9a)

22 build failures (22 unique)

$ grep 'cast.h:1208:32' * | cut -d"'" -f2 | sort | uniq -c
      6 pybind11::detail::handle_type_name<jax::(anonymous namespace)::PjitFunction::pyobject>
      6 pybind11::detail::handle_type_name<jax::PmapFunction::pyobject>
      6 pybind11::detail::handle_type_name<pybind11::enum_<xla::OpSharding_ShardGroupType>>
      6 pybind11::detail::handle_type_name<pybind11::enum_<xla::OpSharding_Type>>
     45 pybind11::detail::handle_type_name<xla::PyArray>

It would probably take a few hours of effort to get those fixed (by adding handle_type_name specializations).

It might be too disruptive to release pybind11 without the handle_type_name default implementation, at least without a transition plan (e.g. first a release with opt-in, later a release changing to opt-out).