indygreg / python-build-standalone

Produce redistributable builds of Python
BSD 3-Clause "New" or "Revised" License

libedit multibyte character unicode Input #169

Closed: mitsuhiko closed this issue 1 year ago

mitsuhiko commented 1 year ago

libedit as compiled into these interpreters has issues with multibyte characters, at least on macOS. If you take the Unicode snowman (☃, U+2603) and paste it into the interpreter, you end up with this:

Python 3.10.9 (main, Dec 20 2022, 19:01:09) [Clang 14.0.3 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> x = "â☃"
>>> x
'â☃'

Notice the extra â, which appears out of thin air after pasting. The same happens if you paste a word with umlauts: "Bälle" turns into "BÃälle".
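One observation that fits both examples, though nothing in this thread confirms it as the root cause: the stray character matches the first byte of the pasted character's UTF-8 encoding, read as a single Latin-1 character. A minimal sketch to check that; `stray_prefix` is a hypothetical helper name used only for illustration:

```python
def stray_prefix(text: str) -> str:
    """Return the first UTF-8 byte of `text`, interpreted as a Latin-1 character."""
    first_byte = text.encode("utf-8")[:1]
    return first_byte.decode("latin-1")


for sample in ("☃", "ä"):
    encoded = sample.encode("utf-8")
    # Show the character, its UTF-8 bytes in hex, and the first byte read as Latin-1.
    print(sample, encoded.hex(" "), repr(stray_prefix(sample)))
```

The snowman encodes to the bytes `e2 98 83`, and 0xE2 is â in Latin-1; ä encodes to `c3 a4`, and 0xC3 is Ã. This would suggest the lead byte of each multibyte sequence is being inserted once on its own before the full character, but that is an inference from the output, not a diagnosis.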

I'm not sure if this is a bug against cpython/libedit or here, but I was told recently that recent libedit versions should no longer have this issue.

mitsuhiko commented 1 year ago

Turns out this is still unresolved in libedit as used by CPython. Closing this here.

indygreg commented 1 year ago

If you have a link to an upstream issue, feel free to drop it in this issue. I'm curious about why this doesn't work.

We currently link against the libedit provided by macOS. We might be able to build our own libedit if the macOS one doesn't have proper multibyte support.
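To check which line-editing library a given interpreter is actually using, something like the following sketch may help. It assumes the `readline.backend` attribute (available on Python 3.13 and later) and otherwise falls back to looking for "libedit" in the module docstring; `readline_backend` is a hypothetical helper name, not part of this project.

```python
def readline_backend(module) -> str:
    """Best-effort guess at whether `module` wraps GNU readline or libedit."""
    # Python 3.13+ exposes the backend name directly.
    backend = getattr(module, "backend", None)
    if backend in ("readline", "editline"):
        return backend
    # Older versions: the docstring mentions libedit when it is the backend.
    doc = getattr(module, "__doc__", "") or ""
    return "editline" if "libedit" in doc else "readline"


try:
    import readline
except ImportError:
    # The readline module is not built on every platform.
    print("readline module not available")
else:
    print(readline_backend(readline))
```

On a build linked against the macOS libedit, this would be expected to report `editline`.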