Closed hczhai closed 5 months ago
@aimir could you help reviewing this PR?
Here we want
type(map_int_int.keys())
to beKeysView[int]
andtype(map_long_int.keys())
to beKeysView[long]
. But in after PR #4353 they are all assigned the same type nameKeysView[int]
which is caused by the usage ofdetail::make_caster<KeyType>::name
for determining the type name. It maps all C++ integer types (int16_t, int32_t, ...) to the same python type nameint
and C++ float/double types to the same namefloat
, etc.
I'm super sleep-deprived at the moment ... but anyway: it sounds like the right thing to do? Instead of less of it (this PR), could we alternatively do more of it?
Concretely, could we work on this condition instead?
// Wrap KeysView[KeyType] if it wasn't already wrapped
if (!detail::get_type_info(typeid(KeysView))) {
Please don't hold back letting me know if you think this line of thinking is completely off the tracks. I'll look again after catching some sleep.
Hi @rwgk thanks for your quick feedback.
Concretely, could we work on this condition instead?
This line seems to be not related to the string name for the python binding.
This particular line is also mentioned in issue #4529, and after mentioning this line the issue author also wrote "but in this case two different typeids have the same string name in pybind11". To make it clearer, the problem is only related to the "same string name" of the two different types, which is, a naming problem. The behavior that there are "two different typeids" is correct. So issue #4529 cannot be fixed by changing this if
condition.
Instead of less of it (this PR), could we alternatively do more of it?
Let me provide some more rationale for why I have to deleted the lines with detail::make_caster
for inferring the type name (which were introduced in PR #4353). The deleted lines and added lines are all about the proper type names for the KeysView
and ValuesView
containers. Since these are containers, the names are inferred from the type names of their elements, which are stored in string variables key_type_name
and mapped_type_name
.
Now let's say we keep the lines from PR #4353, which basically says key_type_name = detail::make_caster<KeyType>::name.text
. Note that detail::make_caster
is about "converting a C++ type into a Python type", so the "name" obtained this way is the type name of the Python object after the conversion. Now consider the special case when this is not working: Since both C++ int
and C++ long int
types will be cast into same Python type int
, we get the key_type_name = "int"
for both KeyType = int
and KeyType = long int
. Then later in https://github.com/pybind/pybind11/pull/4972/files#diff-f022f150c6954fd97430527c1f9baeebd154648534b15b7bd747a408f5f1b9cdR726-R727, two different "KeysView" will be given the same name KeysView[int]
, and we get the error
ImportError: generic_type: cannot initialize type "KeysView[int]": an object with that name is already defined
To solve this problem, we should name the container for int
as (something like) "KeysView[int]" and the container for long int
as "KeysView[long]", this is essentially the same idea as we should do something like
py::bind_vector<vector<int32_t>>(m, "vector_int");
py::bind_vector<vector<int64_t>>(m, "vector_long");
rather than
py::bind_vector<vector<int32_t>>(m, ("vector_" + detail::make_caster<int32_t>::name.text).c_str()); // the type name will be "vector_int"
py::bind_vector<vector<int64_t>>(m, ("vector_" + detail::make_caster<int64_t>::name.text).c_str()); // the type name will be "vector_int" (but should actually be "vector_long")
which will cause the ImportError: generic_type: cannot initialize type "vector_int": an object with that name is already defined
error.
In summary, for this naming purpose, the C++ to Python caster detail::make_caster
is irrelevant so I think it is good to not keep these lines to get the correct behavior. Please let me know whether this explanation is clear.
@aimir Please correct me if I misunderstood something.
What seems unfortunate is that we get a mix of Python type names and C++ type names.
See:
Specifically (from https://github.com/python/mypy/issues/16306#issuecomment-1815306375):
"m.MapIntUserType.keys.__doc__": "keys(self: pybind11_tests.cases_for_stubgen.MapIntUserType) -> pybind11_tests.cases_for_stubgen.KeysView[int]\n",
With this PR, we'd get e.g. KeysView[long]
, but the Python name long
does not exist.
How could we make this work nice also for the purposes of stubgen?
KeysView[Annotated[Any, pybind11.CppType("long")]]
?
Does a Python user even want to know that detail? I'm thinking KeysView[int]
for any C++ type that's converted to Python int
is better.
I think a better direction would be to class_
-wrap only these once each:
Then add virtual std::string python_name() = 0;
(key & value) or virtual std::pair<std::string, std::string> python_names() = 0;
(items).
Using make_caster
to get the Python names.
That's a bigger project, unfortunately. And probably I'm overlooking a thing or two.
... what do you think?
Thanks a lot @aimir for the feedback!
Looks good to me then, as far as I understand, what we'll have is still better than the state before #4353.
To deal with the mix of Python type names and C++ type names, this super simple idea crossed my mind:
https://github.com/python/mypy/issues/16306#issuecomment-1859004910
(I.e. we can take care of that separately.)
@hczhai I'll approve this PR after you add the keys()
and items()
tests, as suggested by @aimir.
@rwgk @aimir Thanks a lot for the feedback!
@aimir The reason I deleted the tests for keys()
and items()
:
In the example used for testing, we have MapStringDouble
so the key type is the std::string
. But the C++ type name (obtained from detail::type_info_description(typeid(KeyType))
) for std::string
is complicated. We will have to do (depending on the compiler)
assert (map_string_double.keys().__class__.__name__ in [
"KeysView[std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >]",
"KeysView[std::basic_string<char,std::char_traits<char>,std::allocator<char> >]",
"KeysView[std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >]"
]
instead of assert map_string_double.keys().__class__.__name__ == "KeysView[str]"
. Is this okay? Maybe use another map
without str
keys for testing?
@rwgk So I am thinking of the alternative way that you suggested - namely, how to make it work with the python names. So if we class_
-wrap keys_view<KeyType>
once with the name KeysView[int]
for all different int16_t, int32_t, ...
, how to determine the template parameter KeyType
here? I am thinking KeyType = detail::make_caster<CppKeyType>::py_type
but py_type
is only available for int
and float
(not available for tuple[int, int]
, etc.)
@rwgk So I am thinking of the alternative way that you suggested - namely, how to make it work with the python names. So if we
class_
-wrapkeys_view<KeyType>
once with the nameKeysView[int]
for all differentint16_t, int32_t, ...
, how to determine the template parameterKeyType
here? I am thinkingKeyType = detail::make_caster<CppKeyType>::py_type
butpy_type
is only available forint
andfloat
(not available fortuple[int, int]
, etc.)
I'm really not sure, but offering what comes to mind:
In my understanding, detail::make_caster<CppKeyType>::name
is meant to be a Python name.
Now, that doesn't always hold true, but with the backtick trick I'm experimenting with under #4888, at least we're clearly exposing when we're falling back to a C++ type name. That's "good enough" IMO, hoping that stubgen will be changed in the future to process the backtick notation in a meaningful way.
Does that help?
@sizmailov for awareness.
After some experiments, I did not find any simple way for making KeysView
, etc. pythonic... It seems to be some much larger amount of work. So maybe we have to use the ugly C++ type names for now.
@aimir @rwgk I have fixed the problems in the tests. Please see if they look good.
But the C++ type name (obtained from
detail::type_info_description(typeid(KeyType))
) forstd::string
is complicated.
The goal of #4353 was to fix #3986. What we're getting with this PR doesn't solve the original problem. — Is that a fair assessment?
In any case, I initiated Google global testing with this PR, to see how many stubgen tests are failing, if any. I'll report back when I have the results.
@rwgk Thanks for the comments.
The goal of PR #4353 was to fix issue #3986. What we're getting with this PR doesn't solve the original problem. — Is that a fair assessment?
In my opinion, #3986 is still about "the right type signature" rather than a real bug. #4529 is a severe bug, generating runtime error messages and breaking many existing applications.
Nevertheless, based on our previous discussions, I found a working way to generate correct Python type names for KeysView, etc. See PR #4983, It is a slightly more complicated way, but should solve both #4529 and #3986 and is compatible with your #4888. I am happy to work on either this PR or PR #4983. But I do hope that we can fix the bug in #4529 soon.
To report the results of the Google global testing with this PR:
There was only one failing test, comparing existing stubgen results with new results. The reduced diff is below. (For my own easy reference, the test ID was OCL:591943630:BASE:592407555:1703042759080:eb7a4a92.)
I'm not sure how many similar stubgen tests we have, my best guess is on the order of 100. — To put this in perspective, the total number of tests is many millions.
This regression is not great of course, but just from a practical viewpoint, it would be defendable.
-class ItemsView[int, object]:
+class ItemsView[long, object]:
-class ItemsView[str, object]:
+class ItemsView[std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char>>, object]:
-class KeysView[int]:
+class KeysView[long]:
-class KeysView[str]:
+class KeysView[std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char>>]:
- def items(self) -> ItemsView[int,object]: ...
- def keys(self) -> KeysView[int]: ...
+ def items(self) -> ItemsView[long,object]: ...
+ def keys(self) -> KeysView[long]: ...
- def items(self) -> ItemsView[str,object]: ...
- def keys(self) -> KeysView[str]: ...
+ def items(self, *args, **kwargs) -> Any: ...
+ def keys(self, *args, **kwargs) -> Any: ...
Closing this after #4985 was merged.
Description
This PR fixes issue #4529. The problem was introduced in PR #4353.
The problem is the following:
Here we want
type(map_int_int.keys())
to beKeysView[int]
andtype(map_long_int.keys())
to beKeysView[long]
. But in after PR #4353 they are all assigned the same type nameKeysView[int]
which is caused by the usage ofdetail::make_caster<KeyType>::name
for determining the type name. It maps all C++ integer types (int16_t, int32_t, ...) to the same python type nameint
and C++ float/double types to the same namefloat
, etc.The solution is to always use the C++ type name in
KeyView[...]
andValuesView[...]
.Note that before PR #4353, the type names were
KeysView[map_int_int]
andKeysView[map_long_int]
for the above example (which is okay).Suggested changelog entry: