ztane / python-Levenshtein

The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity
GNU General Public License v2.0

Py_UNICODE is deprecated #54

Open methane opened 4 years ago

methane commented 4 years ago

Py_UNICODE has been deprecated since Python 3.3, and we are planning to remove it in Python 3.11. Would you replace Py_UNICODE with wchar_t, and PyUnicode_FromUnicode with PyUnicode_FromWideChar?

./python-Levenshtein-0.12.0/Levenshtein/_levenshtein.c:1001:      result = PyUnicode_FromUnicode(medstr, len);
./python-Levenshtein-0.12.0/Levenshtein/_levenshtein.c:1088:      result = PyUnicode_FromUnicode(medstr, len);
./python-Levenshtein-0.12.0/Levenshtein/_levenshtein.c:1930:      result = PyUnicode_FromUnicode(s, len);
./python-Levenshtein-0.12.0/Levenshtein/_levenshtein.c:1946:      result = PyUnicode_FromUnicode(s, len);

maxbachmann commented 2 years ago

Is PyUnicode_FromUnicode actually removed in Python 3.11? The Python docs list its removal for Python 3.12.

methane commented 2 years ago

The removal has been postponed to Python 3.12. But PyUnicode_FromUnicode() emits a runtime warning, so it is already very inefficient.

maxbachmann commented 2 years ago

Good to know. I will replace it in my fork in the next release.

maxbachmann commented 2 years ago

@methane Does PyUnicode_AS_UNICODE emit a warning as well? This API is significantly harder to replace, since there is no 1:1 replacement: the code either needs to handle 1/2/4-byte character sizes or allocate and deallocate a copy.
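
To make the two options concrete, here is a minimal standalone sketch in plain C. It does not use the Python C API: `char_kind`, `read_char`, and `widen_copy` are hypothetical names that only model the idea. In CPython the corresponding pieces would be `PyUnicode_KIND()` / `PyUnicode_DATA()` / `PyUnicode_READ()` for the per-read dispatch, and a widening copy for the allocate-and-free route.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the 1/2/4-byte string kinds; the value is the
   width in bytes of one stored character. */
enum char_kind { KIND_1BYTE = 1, KIND_2BYTE = 2, KIND_4BYTE = 4 };

/* Option 1: dispatch on the storage width at every read. No allocation,
   but every inner loop of the algorithm pays for the switch (or has to be
   instantiated three times). */
static uint32_t read_char(enum char_kind kind, const void *data, size_t i)
{
    assert(kind == KIND_1BYTE || kind == KIND_2BYTE || kind == KIND_4BYTE);
    switch (kind) {
    case KIND_1BYTE: return ((const uint8_t *)data)[i];
    case KIND_2BYTE: return ((const uint16_t *)data)[i];
    default:         return ((const uint32_t *)data)[i];
    }
}

/* Option 2: widen once into a 4-byte buffer so the algorithm only ever
   sees one character type. Returns NULL on allocation failure; the caller
   must release the result with free(). */
static uint32_t *widen_copy(enum char_kind kind, const void *data, size_t len)
{
    uint32_t *out = malloc((len ? len : 1) * sizeof *out);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++)
        out[i] = read_char(kind, data, i);
    return out;
}
```

The trade-off is the one described above: option 1 avoids the copy but complicates every loop, while option 2 keeps the existing single-type code at the cost of one allocation and one deallocation per input string.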

methane commented 2 years ago

Use PyUnicode_AsUCS4Copy() and PyMem_Free(). PyUnicode_AS_UNICODE() uses UTF-16 on Windows, which I think is bad for a Levenshtein library.
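
A small standalone sketch of why UTF-16 code units are a problem for edit distance. It is plain C with no Python API involved; `lev_distance`, `MAX_LEN`, and the sample arrays are illustrative names, and the distance routine is a textbook single-row implementation, not the code in `_levenshtein.c`. The same two one-character strings (U+1F600 and U+1D11E) are compared once as code points and once as UTF-16 surrogate pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LEN 64  /* illustrative limit so the row fits on the stack */

/* Textbook single-row Levenshtein distance over 32-bit symbols. */
static size_t lev_distance(const uint32_t *a, size_t na,
                           const uint32_t *b, size_t nb)
{
    size_t row[MAX_LEN + 1];
    assert(nb <= MAX_LEN);

    for (size_t j = 0; j <= nb; j++)
        row[j] = j;                        /* distance from empty prefix */

    for (size_t i = 1; i <= na; i++) {
        size_t diag = row[0];              /* D[i-1][j-1] */
        row[0] = i;
        for (size_t j = 1; j <= nb; j++) {
            size_t up = row[j];            /* D[i-1][j] */
            size_t best = diag + (a[i - 1] == b[j - 1] ? 0 : 1);
            if (up + 1 < best)
                best = up + 1;             /* deletion */
            if (row[j - 1] + 1 < best)
                best = row[j - 1] + 1;     /* insertion */
            row[j] = best;
            diag = up;
        }
    }
    return row[nb];
}

/* One character each, as UCS4 code points. */
static const uint32_t ucs4_a[] = { 0x1F600 };
static const uint32_t ucs4_b[] = { 0x1D11E };

/* The same two characters as UTF-16 code units (surrogate pairs),
   stored in 32-bit slots only so the same function can be reused. */
static const uint32_t utf16_a[] = { 0xD83D, 0xDE00 };
static const uint32_t utf16_b[] = { 0xD834, 0xDD1E };
```

Calling `lev_distance(ucs4_a, 1, ucs4_b, 1)` yields 1 (one substituted character), while `lev_distance(utf16_a, 2, utf16_b, 2)` yields 2, because each character outside the Basic Multilingual Plane is counted as two symbols. Working on a UCS4 copy keeps the result the same on every platform.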