pybind / pybind11

Seamless operability between C++11 and Python
https://pybind11.readthedocs.io/
Other
15.1k stars 2.05k forks source link

bugfix: ensure different KeysView/ValuesView type names for int16_t/int32_t or float/double. Fix #4529 #4972

Closed hczhai closed 5 months ago

hczhai commented 6 months ago

Description

This PR fixes issue #4529. The problem was introduced in PR #4353.

The problem is the following:

// leads to ImportError: generic_type: cannot initialize type "KeysView[int]": an object with that name is already defined
py::bind_map<map<int, int>>(m, "map_int_int");
py::bind_map<map<long long int, int>>(m, "map_long_int");

Here we want type(map_int_int.keys()) to be KeysView[int] and type(map_long_int.keys()) to be KeysView[long]. But in after PR #4353 they are all assigned the same type name KeysView[int] which is caused by the usage of detail::make_caster<KeyType>::name for determining the type name. It maps all C++ integer types (int16_t, int32_t, ...) to the same python type name int and C++ float/double types to the same name float, etc.

The solution is to always use the C++ type name in KeyView[...] and ValuesView[...].

Note that before PR #4353, the type names were KeysView[map_int_int] and KeysView[map_long_int] for the above example (which is okay).

Suggested changelog entry:

Fix type name clash in ``KeysView`` and ``ValuesView`` when using multiple ``bind_map`` with different integer key types
rwgk commented 6 months ago

@aimir could you help reviewing this PR?

Here we want type(map_int_int.keys()) to be KeysView[int] and type(map_long_int.keys()) to be KeysView[long]. But in after PR #4353 they are all assigned the same type name KeysView[int] which is caused by the usage of detail::make_caster<KeyType>::name for determining the type name. It maps all C++ integer types (int16_t, int32_t, ...) to the same python type name int and C++ float/double types to the same name float, etc.

I'm super sleep-deprived at the moment ... but anyway: it sounds like the right thing to do? Instead of less of it (this PR), could we alternatively do more of it?

Concretely, could we work on this condition instead?

    // Wrap KeysView[KeyType] if it wasn't already wrapped
    if (!detail::get_type_info(typeid(KeysView))) {

Please don't hold back letting me know if you think this line of thinking is completely off the tracks. I'll look again after catching some sleep.

hczhai commented 6 months ago

Hi @rwgk thanks for your quick feedback.

Concretely, could we work on this condition instead?

This line seems to be not related to the string name for the python binding.

This particular line is also mentioned in issue #4529, and after mentioning this line the issue author also wrote "but in this case two different typeids have the same string name in pybind11". To make it clearer, the problem is only related to the "same string name" of the two different types, which is, a naming problem. The behavior that there are "two different typeids" is correct. So issue #4529 cannot be fixed by changing this if condition.

Instead of less of it (this PR), could we alternatively do more of it?

Let me provide some more rationale for why I have to deleted the lines with detail::make_caster for inferring the type name (which were introduced in PR #4353). The deleted lines and added lines are all about the proper type names for the KeysView and ValuesView containers. Since these are containers, the names are inferred from the type names of their elements, which are stored in string variables key_type_name and mapped_type_name.

Now let's say we keep the lines from PR #4353, which basically says key_type_name = detail::make_caster<KeyType>::name.text. Note that detail::make_caster is about "converting a C++ type into a Python type", so the "name" obtained this way is the type name of the Python object after the conversion. Now consider the special case when this is not working: Since both C++ int and C++ long int types will be cast into same Python type int, we get the key_type_name = "int" for both KeyType = int and KeyType = long int. Then later in https://github.com/pybind/pybind11/pull/4972/files#diff-f022f150c6954fd97430527c1f9baeebd154648534b15b7bd747a408f5f1b9cdR726-R727, two different "KeysView" will be given the same name KeysView[int], and we get the error

ImportError: generic_type: cannot initialize type "KeysView[int]": an object with that name is already defined

To solve this problem, we should name the container for int as (something like) "KeysView[int]" and the container for long int as "KeysView[long]", this is essentially the same idea as we should do something like

py::bind_vector<vector<int32_t>>(m, "vector_int");
py::bind_vector<vector<int64_t>>(m, "vector_long");

rather than

py::bind_vector<vector<int32_t>>(m, ("vector_" + detail::make_caster<int32_t>::name.text).c_str()); // the type name will be "vector_int"
py::bind_vector<vector<int64_t>>(m, ("vector_" + detail::make_caster<int64_t>::name.text).c_str()); // the type name will be "vector_int" (but should actually be "vector_long")

which will cause the ImportError: generic_type: cannot initialize type "vector_int": an object with that name is already defined error.

In summary, for this naming purpose, the C++ to Python caster detail::make_caster is irrelevant so I think it is good to not keep these lines to get the correct behavior. Please let me know whether this explanation is clear.

@aimir Please correct me if I misunderstood something.

rwgk commented 6 months ago

What seems unfortunate is that we get a mix of Python type names and C++ type names.

See:

Specifically (from https://github.com/python/mypy/issues/16306#issuecomment-1815306375):

    "m.MapIntUserType.keys.__doc__": "keys(self: pybind11_tests.cases_for_stubgen.MapIntUserType) -> pybind11_tests.cases_for_stubgen.KeysView[int]\n",

With this PR, we'd get e.g. KeysView[long], but the Python name long does not exist.

How could we make this work nice also for the purposes of stubgen?

KeysView[Annotated[Any, pybind11.CppType("long")]]?

Does a Python user even want to know that detail? I'm thinking KeysView[int] for any C++ type that's converted to Python int is better.

I think a better direction would be to class_-wrap only these once each:

Then add virtual std::string python_name() = 0; (key & value) or virtual std::pair<std::string, std::string> python_names() = 0; (items).

Using make_caster to get the Python names.

That's a bigger project, unfortunately. And probably I'm overlooking a thing or two.

... what do you think?

rwgk commented 6 months ago

Thanks a lot @aimir for the feedback!

Looks good to me then, as far as I understand, what we'll have is still better than the state before #4353.

To deal with the mix of Python type names and C++ type names, this super simple idea crossed my mind:

https://github.com/python/mypy/issues/16306#issuecomment-1859004910

(I.e. we can take care of that separately.)

@hczhai I'll approve this PR after you add the keys() and items() tests, as suggested by @aimir.

hczhai commented 6 months ago

@rwgk @aimir Thanks a lot for the feedback!

@aimir The reason I deleted the tests for keys() and items():

In the example used for testing, we have MapStringDouble so the key type is the std::string. But the C++ type name (obtained from detail::type_info_description(typeid(KeyType))) for std::string is complicated. We will have to do (depending on the compiler)

assert (map_string_double.keys().__class__.__name__ in [
    "KeysView[std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >]",
    "KeysView[std::basic_string<char,std::char_traits<char>,std::allocator<char> >]",
    "KeysView[std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >]"
]

instead of assert map_string_double.keys().__class__.__name__ == "KeysView[str]". Is this okay? Maybe use another map without str keys for testing?

@rwgk So I am thinking of the alternative way that you suggested - namely, how to make it work with the python names. So if we class_-wrap keys_view<KeyType> once with the name KeysView[int] for all different int16_t, int32_t, ..., how to determine the template parameter KeyType here? I am thinking KeyType = detail::make_caster<CppKeyType>::py_type but py_type is only available for int and float (not available for tuple[int, int], etc.)

rwgk commented 6 months ago

@rwgk So I am thinking of the alternative way that you suggested - namely, how to make it work with the python names. So if we class_-wrap keys_view<KeyType> once with the name KeysView[int] for all different int16_t, int32_t, ..., how to determine the template parameter KeyType here? I am thinking KeyType = detail::make_caster<CppKeyType>::py_type but py_type is only available for int and float (not available for tuple[int, int], etc.)

I'm really not sure, but offering what comes to mind:

In my understanding, detail::make_caster<CppKeyType>::name is meant to be a Python name.

Now, that doesn't always hold true, but with the backtick trick I'm experimenting with under #4888, at least we're clearly exposing when we're falling back to a C++ type name. That's "good enough" IMO, hoping that stubgen will be changed in the future to process the backtick notation in a meaningful way.

Does that help?

@sizmailov for awareness.

hczhai commented 6 months ago

After some experiments, I did not find any simple way for making KeysView, etc. pythonic... It seems to be some much larger amount of work. So maybe we have to use the ugly C++ type names for now.

@aimir @rwgk I have fixed the problems in the tests. Please see if they look good.

rwgk commented 6 months ago

But the C++ type name (obtained from detail::type_info_description(typeid(KeyType))) for std::string is complicated.

The goal of #4353 was to fix #3986. What we're getting with this PR doesn't solve the original problem. — Is that a fair assessment?

In any case, I initiated Google global testing with this PR, to see how many stubgen tests are failing, if any. I'll report back when I have the results.

hczhai commented 6 months ago

@rwgk Thanks for the comments.

The goal of PR #4353 was to fix issue #3986. What we're getting with this PR doesn't solve the original problem. — Is that a fair assessment?

In my opinion, #3986 is still about "the right type signature" rather than a real bug. #4529 is a severe bug, generating runtime error messages and breaking many existing applications.

Nevertheless, based on our previous discussions, I found a working way to generate correct Python type names for KeysView, etc. See PR #4983, It is a slightly more complicated way, but should solve both #4529 and #3986 and is compatible with your #4888. I am happy to work on either this PR or PR #4983. But I do hope that we can fix the bug in #4529 soon.

rwgk commented 6 months ago

To report the results of the Google global testing with this PR:

There was only one failing test, comparing existing stubgen results with new results. The reduced diff is below. (For my own easy reference, the test ID was OCL:591943630:BASE:592407555:1703042759080:eb7a4a92.)

I'm not sure how many similar stubgen tests we have, my best guess is on the order of 100. — To put this in perspective, the total number of tests is many millions.

This regression is not great of course, but just from a practical viewpoint, it would be defendable.

-class ItemsView[int, object]:
+class ItemsView[long, object]:

-class ItemsView[str, object]:
+class ItemsView[std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char>>, object]:

-class KeysView[int]:
+class KeysView[long]:

-class KeysView[str]:
+class KeysView[std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char>>]:

-    def items(self) -> ItemsView[int,object]: ...
-    def keys(self) -> KeysView[int]: ...
+    def items(self) -> ItemsView[long,object]: ...
+    def keys(self) -> KeysView[long]: ...

-    def items(self) -> ItemsView[str,object]: ...
-    def keys(self) -> KeysView[str]: ...
+    def items(self, *args, **kwargs) -> Any: ...
+    def keys(self, *args, **kwargs) -> Any: ...
rwgk commented 5 months ago

Closing this after #4985 was merged.