wjakob / nanobind

nanobind: tiny and efficient C++/Python bindings
BSD 3-Clause "New" or "Revised" License
2.29k stars 191 forks source link

[BUG]: kwargs dispatch fails when receiving a non-interned version of a keyword #468

Closed tjstum closed 6 months ago

tjstum commented 6 months ago

Problem description

In CPython, strings can be interned (and this happens automatically based on functions' keyword names, it appears), but that doesn't mean that you can't get an equivalent but distinct string:

>>> str("foo") is str("foo")
True
>>> "".join("foo") is str("foo")
False

The overload dispatch loop relies on the address (Python's id, what the is operator uses) being equivalent for the keyword argument used in the binding's .def call and that actually passed in as the keyword argument.

In the example code, you can see the issue that this causes. By changing the nb_func line I highlighted above to PyUnicode_Compare even on CPython, the example code stops reproducing. Therefore, a "simple" fix would be to use PyUnicode_Compare unconditionally.

While this particular example may seem contrived, it actually comes up in our application when the dictionary is received via unpickling (you can also get a different string object from pickle.loads(pickle.dumps(<some string>)).

Reproducible example code

NB_MODULE(_nanobind_meta, m) {
    m.def("kwargs_test", [](int foo, int bar){return foo + bar;}, "foo"_a, "bar"_a);
}
---
>>> import _nanobind_meta
>>> _nanobind_meta.kwargs_test(foo=2, bar=3)
5
>>> kw = {"foo": 2, "bar": 3}
>>> _nanobind_meta.kwargs_test(**kw)
5
>>> reconstructed = {"".join(k): v for k, v in kw.items()}
>>> reconstructed == kw
True
>>> _nanobind_meta.kwargs_test(**reconstructed)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: kwargs_test(): incompatible function arguments. The following argument types are supported:
    1. kwargs_test(foo: int, bar: int) -> int

Invoked with types: kwargs = { foo: int, bar: int }
wjakob commented 6 months ago

It's easy to fix this, but it will come with a cost. Keyword argument dispatch will become slower for everyone. I am not sure if the cost outweighs the benefits.

tjstum commented 6 months ago

This would probably make the code pretty unwieldy, but could keyword argument dispatch first try the address-based check, and then, before giving up on the overload (or, I guess after trying all overloads but before giving up completely), try the slower, equality-based approach?

If that's too much, do you have any other suggested workaround?

oremanj commented 6 months ago

CPython itself (initialize_locals in ceval.c) does the id comparison first, and falls back to contents comparison if there's no match for a given kwarg. That would limit the penalty to functions that take var-keywords and also have some named keywords that aren't provided, which I'm guessing would be more palatable? I can work on a patch.