python / cpython

The Python programming language
https://www.python.org
Other
63.44k stars 30.38k forks source link

[C API] Add PyUnicode_Equal() function #124502

Closed vstinner closed 1 month ago

vstinner commented 1 month ago

Python 3.13 moved the private _PyUnicode_EQ() function to internal C API. mypy and Pyodide are using it.

I propose to add a public PyUnicode_Equal(a, b) function to the limited C API 3.14 to replace the private _PyUnicode_EQ() function:

Linked PRs

vstinner commented 1 month ago

cc @serhiy-storchaka @encukou

rruuaanng commented 1 month ago

I think PyUnicode_Equal should like strcmp in glibc. Return 0 if a is equal to b. Return 1 if a isn't equal to b.

vstinner commented 1 month ago

I think PyUnicode_Equal should like strcmp in glibc. Return 0 if a is equal to b. Return 1 if a isn't equal to b.

The C API Guidelines suggest to return 1 if "found" and 0 if "not found": https://devguide.python.org/developer-workflow/c-api/#guidelines-for-expanding-changing-the-public-api

IMO it's more natural to return 1 if equal and 0 if not equal. So you can write if (PyUnicode_Equal(a, b)) { /* a == b */ }.

Moreover, PyUnicode_Equal() API the same as existing private functions _PyUnicode_EQ() and _PyUnicode_Equal() (except that these functions don't check if arguments are strings).

rruuaanng commented 1 month ago

I think PyUnicode_Equal should like strcmp in glibc.

Return 0 if a is equal to b.

Return 1 if a isn't equal to b.

The C API Guidelines suggest to return 1 if "found" and 0 if "not found": https://devguide.python.org/developer-workflow/c-api/#guidelines-for-expanding-changing-the-public-api

IMO it's more natural to return 1 if equal and 0 if not equal. So you can write if (PyUnicode_Equal(a, b)) { /* a == b */ }.

Moreover, PyUnicode_Equal() API the same as existing private functions _PyUnicode_EQ() and _PyUnicode_Equal() (except that these functions don't check if arguments are strings).

I think that's good idea, So when will it be implemented? Maybe you could give me a shot. Haha

vstinner commented 1 month ago

I already implemented the function. See attached PR.

ZeroIntensity commented 1 month ago

Hmm, will this result in a cascade of PySomething_Equal functions? This case should be special, I don't think we need PyLong_Equal, PyDict_Equal, PyList_Equal, etc.

Out of curiosity, what's the benefit here over PyUnicode_Compare(...) == 0?

picnixz commented 1 month ago

PyUnicode_Compare is slower since it needs to handle Unicode kinds separately (you need to take care of mixed kinds since comparing two Unicode strings of different kinds is a well-defined operation). PyUnicode_Equal is faster because 1) it directly uses memcmp and 2) only needs to do something when the Unicode kinds match.

vstinner commented 1 month ago

Hmm, will this result in a cascade of PySomething_Equal functions? This case should be special, I don't think we need PyLong_Equal, PyDict_Equal, PyList_Equal, etc.

C extensions such as mypy and Pyodide are already using the private _PyUnicode_EQ() function which has been removed in Python 3.13. I'm proposing a public replacement for them.

I'm not aware of other PyTYPE_Equal() function. Only PyUnicode has a specialized equal function.

Out of curiosity, what's the benefit here over PyUnicode_Compare(...) == 0?

PyUnicode_Compare() returns -1 for "less than" but also for the error case. The caller must call PyErr_Occurred() which is inefficient. It causes an ambiguous return value: https://github.com/capi-workgroup/problems/issues/1

PyUnicode_Equal() has no such ambiguous return value (-1 only means error). Moreover, it may be a little bit faster, but I'm not sure about that.

ZeroIntensity commented 1 month ago

I'm not aware of other PyTYPE_Equal() function. Only PyUnicode has a specialize equal function.

Yeah, that's what I mean. I think it would be a bad idea to use this as precedent for future "faster" PyXXX_Equal functions. I'm fine with PyUnicode_Equal.

vstinner commented 1 month ago

I created https://github.com/capi-workgroup/decisions/issues/43 issue in the C API Working Group.

vstinner commented 1 month ago

Function added by change https://github.com/python/cpython/commit/a7f0727ca575fef4d8891b5ebfe71ef2a774868b.

vstinner commented 1 month ago

Function added to pythoncapi-compat with change: https://github.com/python/pythoncapi-compat/commit/abc0f29fb9b245efa3d12ba1e6b35c104f1784f1.