python / cpython

The Python programming language
https://www.python.org

Tkinter hangs or crashes when displaying astral chars #86391

Closed terryjreedy closed 3 years ago

terryjreedy commented 4 years ago
BPO 42225
Nosy @terryjreedy, @ronaldoussoren, @ned-deily, @ezio-melotti, @serhiy-storchaka, @miss-islington, @E-Paine
PRs
  • python/cpython#25078
  • python/cpython#25105
  • python/cpython#25106
Files
  • fedora32.png
  • Ubuntu-2020.04.png
  • emojis.png
  • Screenshots_128547-128593.pdf: Character output to Ubuntu and Windows
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:
    ```python
    assignee = None
    closed_at =
    created_at =
    labels = ['type-bug', 'expert-tkinter', '3.9', '3.10', '3.8', 'expert-unicode']
    title = 'Tkinter hangs or crashes when displaying astral chars'
    updated_at =
    user = 'https://github.com/terryjreedy'
    ```
    bugs.python.org fields:
    ```python
    activity =
    actor = 'terry.reedy'
    assignee = 'none'
    closed = True
    closed_date =
    closer = 'terry.reedy'
    components = ['Tkinter', 'Unicode']
    creation =
    creator = 'terry.reedy'
    dependencies = []
    files = ['49556', '49557', '49567', '49581']
    hgrepos = []
    issue_num = 42225
    keywords = ['patch']
    message_count = 37.0
    messages = ['380112', '380119', '380137', '380138', '380139', '380140', '380143', '380144', '380146', '380149', '380151', '380173', '380211', '380227', '380260', '380266', '380282', '380283', '380288', '380305', '380393', '380549', '380550', '380551', '380552', '380565', '380573', '380574', '380575', '380716', '389665', '389667', '389677', '389871', '389872', '389875', '389876']
    nosy_count = 9.0
    nosy_names = ['terry.reedy', 'ronaldoussoren', 'wordtech', 'ned.deily', 'ezio.melotti', 'serhiy.storchaka', 'miss-islington', 'epaine', 'IanSt1']
    pr_nums = ['25078', '25105', '25106']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue42225'
    versions = ['Python 3.8', 'Python 3.9', 'Python 3.10']
    ```

    terryjreedy commented 4 years ago

    On my macOS Mojave, 3.10, echoing '\U0001####' (# = hex digit) or chr(#####) (decimal digits) in IDLE's shell either prints an error box or hangs. On bpo-13153, freezing on macOS was reported for 3.7.6. Until tkinter on Mac works better, we should try to get an error box for all astral chars.

    For an SO questioner with Ubuntu 18.04, now updated to 20.04 with python 3.8.6, some chars display (128512-128547; 128549-128555; 128557-128576, example chr(128516)) and some 'crash' (example chr(128077)).  I am trying to get 'crash' narrowed down and the tk version Ubuntu uses.
    Serhiy, does >>> chr(128516) echo thumbs up on your Linux?

    The SO crash example works for me on Windows. I should test more codepoints.

    serhiy-storchaka commented 4 years ago

    I get a crash for chr(128516) ("😄") in Tk.

    $ wish
    % label .l -text 😄
    .l
    % X Error of failed request:  BadLength (poly request too large or internal Xlib length error)
      Major opcode of failed request:  139 (RENDER)
      Minor opcode of failed request:  20 (RenderAddGlyphs)
      Serial number of failed request:  599
      Current serial number in output stream:  599
    vstinner commented 4 years ago

    Serhiy:

    I get a crash for chr(128516) ("😄") in Tk.

    On Linux? What is your Tk version?

    On my Fedora 32, the character is displayed properly. It seems like Tk is still using X11 whereas my GNOME desktop is using Wayland.

    $ ./python -m test.pythoninfo|grep ^tkinter
    tkinter.TCL_VERSION: 8.6
    tkinter.TK_VERSION: 8.6
    tkinter.info_patchlevel: 8.6.10
    vstinner commented 4 years ago

    Hum, I didn't explain my test well. I ran:

    ./python -m idlelib

    In the IDLE shell, I wrote chr(0x1F604) which displays the emoji as expected:

    >>> chr(0x1F604)
    '😄'
    serhiy-storchaka commented 4 years ago

    I generated a script for testing all characters:

    with open('withtest.sh', 'w', errors='surrogatepass') as f:
        for i in range(0x100, 0x110000): print(f"echo 'label .l -text \"{chr(i)}\"; exit' | wish 2>/dev/null && echo OK '\\U{i:08x}' {chr(i)!r} || echo FAIL '\\U{i:08x}' {chr(i)!r}", file=f)

    It takes a long time: it tested around 20% of all characters in 6-7 hours. It seems that all failing characters are colored emojis and all passing characters are non-colored. So the crash seems related either to the font that provides colored emojis, or to the mechanism that interprets such fonts, or Tk just cannot correctly handle the output when such fonts are used (maybe it reserves too small a buffer or cannot interpret the result code).

    vstinner commented 4 years ago

    Serhiy's test also works as expected.

    $ wish
    % label .l -text 😄

    Since Serhiy's test doesn't use Python, is it worth tracking this Tk crash in the Python bug tracker?

    serhiy-storchaka commented 4 years ago

    Yes, on Linux. Ubuntu 20.04. Tk 8.6.10. X.Org X Server 1.20.8.

    I tried to report the bug upstream, but failed. I have not used the Tk bug tracker for several years, and that was on a different computer, so I no longer have the password to my account; when I tried to create new accounts, I could not log in with them either. I tried to write to the mailing list, but it requires subscribing, and when I subscribed I did not receive a confirmation message. If anybody can, please report this bug to the Tk developers.

    serhiy-storchaka commented 4 years ago

    Victor, do you see a color smiling face in my example or monochromatic or just a bar?

    vstinner commented 4 years ago

    Victor, do you see a color smiling face in my example or monochromatic or just a bar?

    See attached screenshot: fedora32.png.

    serhiy-storchaka commented 4 years ago

    It looks different on my computer. I suppose it will crash for you too if you install a color emoji font.

    ronaldoussoren commented 4 years ago

    The error on Linux could be related to this issue: https://bugzilla.redhat.com/show_bug.cgi?id=1498269

    terryjreedy commented 4 years ago

    In IDLE on Windows the following prints the first 3 astral planes in a couple of minutes.

    for i in range(0x10000, 0x40000, 32):
        chars = ''.join(chr(i+j) for j in range(32))
        print(hex(i), chars)

    Perhaps half of the assigned chars in the first plane are printed instead of being replaced with a narrow box. This includes emoticons as foreground color outlines on background color. Maybe all of the second plane of extended CJK chars are printed. The third plane is unassigned and prints as unassigned boxes (with an X).

    Fixing OS graphics or tk is out of scope for us. Preventing hangs or crashes when using tkinter is. On Mac, refusing to insert any astral char into a tk widget might be the best solution. Serhiy, could that be done in tkinter/_tkinter?
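One way to picture the "refuse to insert any astral char" idea is to rewrite the text before it reaches the widget. The following is only a minimal sketch in pure Python, not what tkinter or IDLE actually does; `escape_astral` is a hypothetical helper name, and replacing with a `\U` escape (rather than raising an error) is an assumption made for illustration.

```python
import re

# Matches any single code point outside the Basic Multilingual Plane.
_ASTRAL = re.compile('[\U00010000-\U0010ffff]')


def escape_astral(text: str) -> str:
    """Replace each astral character with its \\UXXXXXXXX escape (hypothetical helper)."""
    return _ASTRAL.sub(lambda m: f"\\U{ord(m.group()):08x}", text)


# BMP characters pass through unchanged; astral ones become ASCII escapes.
print(escape_astral("ok \U0001f604"))
```

As noted later in the thread, some BMP characters can trigger the same crash with certain fonts, so a filter keyed only on the astral range would not cover every case.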

    On Linux, the situation appears to be more complex. The SO questioner https://stackoverflow.com/questions/64615570/why-do-some-emoticons-cause-python-idle-to-crash-on-ubuntu could print the two multicolor 'grinning face with smiling eyes' 😄, which fails for Serhiy, but not the simpler thumbsup 👍. I don't know if we can detect fonts that cause crashes.

    vstinner commented 4 years ago

    Fixing OS graphics or tk is out of scope for us. Preventing hangs or crashes when using tkinter is. On Mac, refusing to insert any astral char into a tk widget might be the best solution. Serhiy, could that be done in tkinter/_tkinter?

    I dislike attempting to work around Tk issues in Python. As you can see, the behavior really depends on the platform. As I wrote, on Fedora 32 it works (the character is rendered properly). I would prefer not to block such characters on Fedora 32 just because they crash on some other platforms.

    Or you would have to detect the very precise conditions explaining why it works on some platforms and crashes on others...

    919475b3-36a2-4cd7-997c-9c38f05f93c7 commented 4 years ago

    For me, this is not limited to special characters. Trying to load anything in Tk using the 'JoyPixels' font crashes (sometimes it does load but all characters are very random - most are whitespace - and it crashes again after a call to fc-cache). IDLE crashes when trying to preview the font.

    I believe this is what is being experienced on https://askubuntu.com/questions/1236488/x-error-of-failed-request-badlength-poly-request-too-large-or-internal-xlib-le because they are not using any special characters yet are reporting the same problem.

    terryjreedy commented 4 years ago

    Victor, does my test run to completion (without exception) on your Fedora? If it does, I definitely would not disable astral char display on Fedora. This version catches exceptions and reports them separately and runs directly with tkinter, in about a second.

    tk = True
    if tk:
        from tkinter import Tk
        from tkinter.scrolledtext import ScrolledText
        root = Tk()
        text = ScrolledText(root, width=80, height=40)
        text.pack()
        def print(txt):
            text.insert('insert', txt+'\n')
    
    errors = []
    for i in range(0x10000, 0x40000, 32):
        chars = ''.join(chr(i+j) for j in range(32))
        try:
            print(f"{hex(i)} {chars}")
        except Exception as e:
            errors.append(f"{hex(i)} {e}")
    print("ERRORS:")
    for line in errors:
        print(line)
    serhiy-storchaka commented 4 years ago

    It works on Ubuntu if I uninstall the color emoji font (package fonts-noto-color-emoji).

    vstinner commented 4 years ago

    The following program fails with:

    X Error of failed request:  BadLength (poly request too large or internal Xlib length error)
      Major opcode of failed request:  138 (RENDER)
      Minor opcode of failed request:  20 (RenderAddGlyphs)
      Serial number of failed request:  4248
      Current serial number in output stream:  4956

    Python program:

    from tkinter import Tk
    from tkinter.scrolledtext import ScrolledText
    root = Tk()
    text = ScrolledText(root, width=80, height=40)
    text.pack()
    
    for i in range(0x10000, 0x40000, 32):
        chars = ''.join(chr(i+j) for j in range(32))
        text.insert('insert', f"{hex(i)} {chars}\n")
    
    input("Press enter to exit")

    It seems like the first character which triggers this RenderAddGlyphs BadLength issue is: U+1f6c2. See attached emoji.png screenshot. As you can see, some emojis are rendered in color in Gnome Terminal. I guess that it uses the Gtk 3 pango library to render these characters.

    vstinner commented 4 years ago

    This version catches exceptions and reports them separately and runs directly with tkinter, in about a second.

    The X Error is displayed and then the process exits. Python cannot catch this fatal X Error.
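Because the failure kills the whole process, a try/except in the same interpreter cannot observe it. A possible way to probe a character safely is to render it in a child interpreter and inspect the exit status. This is a rough sketch under assumptions: `run_isolated`, `char_renders` and `TK_PROBE` are hypothetical names, and it assumes the X error makes the child exit with a non-zero status and that a display is available when the probe itself is run.

```python
import subprocess
import sys

# Hypothetical child snippet: insert one character into a Tk text widget.
TK_PROBE = """
import sys, tkinter
root = tkinter.Tk()
text = tkinter.Text(root)
text.pack()
text.insert('insert', chr(int(sys.argv[1])))
root.update()   # force Tk to actually draw the glyph
root.destroy()
"""


def run_isolated(code, *args, timeout=30.0):
    """Run `code` in a child interpreter; return (ok, stderr_text)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code, *map(str, args)],
            capture_output=True, text=True, errors="replace", timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # A hang (as reported on macOS) shows up as a timeout.
        return False, "timed out (possible hang)"
    return proc.returncode == 0, proc.stderr


def char_renders(codepoint):
    """Probe a single code point in a separate process (needs a display)."""
    return run_isolated(TK_PROBE, codepoint)
```

For example, `char_renders(0x1f604)` would return `(False, <X error text>)` on a system where that glyph crashes Tk, while the parent process keeps running.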

    ronaldoussoren commented 4 years ago

    @Kevin Walzer: Is the problem we're seeing a known issue with Tk?

    e9681926-2aff-4240-a89b-e9210c2ce124 commented 4 years ago

    Some work has been done this year on expanding support for these types of glyphs in Tk, but I'm not sure of its current state--it's not my area of expertise. Can you open a ticket at https://core.tcl-lang.org/tk/ so one of the folks working on this can take a look?

    terryjreedy commented 4 years ago

    Kevin, Serhiy tried to report this upstream but failed. msg380143. Perhaps you could.

    One person running my test program reported
    """
    Fedora 32 x86-64
    Cinnamon 4.6.7
    Linux 5.8.16-200.fc32.x86_64
    Python 3.8.6 (default, Sep 25 2020, 00:00:00) [GCC 10.2.1 20200723 (Red Hat 10.2.1-1)] on linux

    Running line-by-line in terminal, the for-loop crashes with:
    <<<
    X Error of failed request:  BadLength (poly request too large or internal Xlib length error)
      Major opcode of failed request:  138 (RENDER)
      Minor opcode of failed request:  20 (RenderAddGlyphs)
      Serial number of failed request:  3925
      Current serial number in output stream:  4865
    """

    Another reported "Seems to produce garbage on my system: [ads@ADS4 x]$ uname -a Linux ADS4 5.8.17-100.fc31.x86_64 #1 SMP Thu Oct 29 18:58:48 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux"

    But the program ran to completion without errors. A copy of the output from the window was attached. I have asked for the tcl/tk version. My response included: """ On *nix, Python (unicode) chars are utf-8 encoded by _tkinter for tk. The encoding of astral non-BMP chars uses 4 bytes. Perhaps tk on your ADS Linux (new to me) displays the 4 bytes as 4 chars instead of 1. For each block of 32, the first 3 are the same. This is true in this file, but easily seeing this depends on the display software.

    I don't know what you saw, but Notepad++ displays control chars with the high bit set (C1 controls) as their reversed type (white on black) 3 char acronym as defined on
    https://en.wikipedia.org/wiki/Latin-1_Supplement_(Unicode_block) Character table.

    Thus the first astral U+10000 is encoded as b"\xF0\x90\x80\x80". In Notepad++, what is in the file appears as 4 characters, not 1, displayed 'ðDCSPADPAD', with the part after ð being the correct white on black triplets for code points U+90 and U+80. The first char '\xf0' == 'ð' is the same for all quadruples shown by Notepad++. The next 3 vary as appropriate. In some cases, all 4 are normal printable chars, such as 0x29aa0, a CJK char, showing as "𩪠"

    If I cut the first 4 chars from Notepad++ to Thunderbird the result is "𐀀". I see only ð but the presence of 3 0-width chars is revealed by moving through the string with arrow keys. """ Here on Firefox the C1 controls, invisible in Thunderbird, display as squares with digits 0090, 0080 in two rows. Serhiy probably understands these reports better than I do. This tk on ADS4 Linux seems to be doing something like what Serhiy described as "Tcl fails to decode the string from UTF-8 and falls back to Latin1" before his _tkinter fix.
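The "4 bytes shown as 4 chars" effect described above can be reproduced in plain Python by encoding an astral character as UTF-8 and then decoding the bytes as Latin-1. This is only an illustration of the suspected mechanism, not a statement about what that Tk build actually does; `as_latin1` is a hypothetical helper name.

```python
def as_latin1(codepoint: int) -> str:
    """Encode a code point as UTF-8, then mis-decode the bytes as Latin-1."""
    return chr(codepoint).encode("utf-8").decode("latin-1")


raw = chr(0x10000).encode("utf-8")
print(raw)                         # b'\xf0\x90\x80\x80'

# One astral character becomes four Latin-1 characters: 'ð' followed by
# three C1 controls (U+0090, U+0080, U+0080).
print(ascii(as_latin1(0x10000)))   # '\xf0\x90\x80\x80'

# For U+29AA0 all four resulting characters are printable Latin-1.
print(ascii(as_latin1(0x29aa0)))   # '\xf0\xa9\xaa\xa0'
```

The first character is always 'ð' (0xF0) for code points in the first three astral planes, which matches the observation that the first char is the same for every quadruple.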

    As far as IDLE and Linux are concerned, I am just going to consider what to change or add in "User output in Shell" in the IDLE doc.

    c948da86-b0cb-433d-9b97-c8752bacbd57 commented 4 years ago

    Further to the information I posted on Stack Overflow (referred to above) relating to reproducing emoticon characters from Idle under Ubuntu, I have done more testing. Based on some of the code/comments above, I tried modifications which I hoped might identify errors before Idle crashed. At a simple level I can generate some error information in a Ubuntu terminal from the following.

    usr/bin$ idle-python3.8

    Entering chr(0x1f624) gives the following error message in terminal.

    X Error of failed request:  BadLength (poly request too large or internal Xlib length error)
      Major opcode of failed request:  139 (RENDER)
      Minor opcode of failed request:  20 (RenderAddGlyphs)
      Serial number of failed request:  4484
      Current serial number in output stream:  4484

    Another test used this code.

    def FileSave(sav_file_name,outputstring):
        with open(sav_file_name, "a", encoding="utf8",newline='') as myfile:
            myfile.write(outputstring)
    
    def FileSave1(sav_file_name,eoutputstring):
        with open(sav_file_name, "a", encoding="utf8",newline='') as myfile:
            myfile.write(eoutputstring)
    
    tk = True
    if tk:
        from tkinter import Tk
        from tkinter.scrolledtext import ScrolledText
        root = Tk()
        text = ScrolledText(root, width=80, height=40)
        text.pack()
        def print1(txt):
            text.insert('insert', txt+'\n')
    
    errors = []
    outputstring = "Characters:"+ "\n"+"\n"
    eoutputstring = "Errors:"+ "\n"+"\n"
    
    #for i in range(0x1f600, 0x1f660):   #crashes at 0x1f624
    for i in range(0x1f623, 0x1f624):  # 1f624, 1f625 then try 1f652  
        chars = chr(i)
        decimal = str(int(hex(i)[2:],16))
        try:
            outputstring = str(hex(i))+" "+decimal+" "+chars+ "\n"
            FileSave("Charsfile.txt", outputstring)
            print1(f"{hex(i)} {decimal} {chars}")
            print(f"{hex(i)} {decimal} {chars}")
        except Exception as e:
            print(str(hex(i)))
            eoutputstring = str(hex(i))+ "\n"
            FileSave1("Errorfile.txt", eoutputstring)
            errors.append(f"{hex(i)} {e}")
    
    print("ERRORS:")
    
    for line in errors:
        print(line)

    With the range starting at 0x1f623 and changing the end point, in Ubuntu, with end point 0x1f624, this prints ok, but if higher numbers are used the Idle windows all close. However on some occasions, if I began with the end point at 0x1f624 and ran, then without closing the editor window increased the end point to 0x1f625, saved and ran, the Text window would close, but the console window would remain open. I could then increase the upper range further and repeat, and more characters would print to the console. I have attached screenshots of the console output with the fonts-noto-color-emoji fonts package installed (with font), then with this package uninstalled (no font), and finally the same when run under Windows 10.
    For the console output produced while the font package is installed, if I select in the character column where there is a blank space, "something" can be selected. If I save the console as a text file or select all the rows, copy and paste to a text file, the missing characters are revealed. When the font package is uninstalled, the missing characters are truly missing. It is the apparently missing characters (such as 0x1f624, 0x1f62c, 0x1f641, 0x1f642, 0x1f644-0x1f64f) which appear to be causing the Idle crashes. Presumably codes such as 0x1f650 and 0x1f651 are unallocated, so they show up as rectangular outlines.

    In none of the tests with the more complex code above did I manage to generate any error output.

    My set up is as follows. Ubuntu 20.04.1 LTS x86_64 GNOME version: 3.36.3 Python 3.8.6 (default, Sep 25 2020, 21:22:01) Tk version: 8.6.10 [GCC 7.5.0] on linux

    Hopefully, the above might give some pointers to handling these characters.

    ronaldoussoren commented 4 years ago

    I've filed a Tk issue about this: https://core.tcl-lang.org/tk/tktview/f9fa926666d8e06972b5f0583b07a3c98eaac0a0

    What versions of Tk are used?

    c948da86-b0cb-433d-9b97-c8752bacbd57 commented 4 years ago

    On Ubuntu, Tk version is showing as 8.6.10.
    On Windows 10, Tk version is showing as 8.6.9.
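For anyone else reporting versions, the values can be read from tkinter without opening a window. This is a small sketch; `tk_versions` is a hypothetical helper name, and the `info patchlevel` query is asked of a bare Tcl interpreter, so it reports the Tcl patch level, which normally matches Tk's.

```python
def tk_versions():
    """Return Tcl/Tk version info, or None if tkinter is not installed."""
    try:
        import tkinter
    except ImportError:
        return None
    return {
        "TclVersion": tkinter.TclVersion,   # e.g. 8.6 (a float)
        "TkVersion": tkinter.TkVersion,
        # tkinter.Tcl() creates an interpreter without initializing Tk,
        # so no display is needed for this call.
        "patchlevel": tkinter.Tcl().call("info", "patchlevel"),
    }


print(tk_versions())
```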

    ronaldoussoren commented 4 years ago

    The crash I had on macOS with tk 8.6.8 appears to be gone when using tk 8.6.10.

    What I got back was a SyntaxError when pasting a smiley emoji in an IDLE shell window and trying to execute print("😀"). The SyntaxError message says: 'utf-8' codec can't encode characters in position 7-12: surrogates not allowed. That's likely due to how Tk represents this character in its text widget, and is something we could work around when converting Tcl/Tk strings to Python strings.

    Printing the emoji using 'print(chr(128516))' works fine.

    The scriptlet in msg380173 also works.

    terryjreedy commented 4 years ago

    Serhiy, does Ronald's report above re 8.6.10 on macOS suggest what might be needed to make print("😀") work on Mac? As I remember, your year-old _tkinter patch to make print(<astral>) work on Linux and Windows converts Python strings differently on the two systems. But you did not know for sure what to do for macOS because nothing would work.

    ronaldoussoren commented 4 years ago

    Note that the main installers for Python 3.8 and 3.9 will continue to use Tk 8.6.8 due to problems when building later Tk version on macOS 10.9.

    The current plan is to add an installer variant that (amongst other changes) uses Tk 8.6.10 (and .11 when that's released).

    ronaldoussoren commented 4 years ago

    W.r.t. the SyntaxError I got (msg380552): It looks like it will be possible to work around that problem in _tkinter.c:unicodeFromTclStringAndSize by merging surrogate pairs.
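What "merging surrogate pairs" means can be shown at the Python level. This is only an illustration of the idea, not the C change in _tkinter.c; `merge_surrogates` is a hypothetical helper name, and the UTF-16 round trip is one convenient way to do the merge in pure Python.

```python
def merge_surrogates(text: str) -> str:
    """Combine each high/low surrogate pair into a single astral character.

    'surrogatepass' lets the lone surrogate code points through the encoder;
    the strict UTF-16 decoder then reads each pair as one code point.
    An unpaired surrogate makes the decode step raise UnicodeDecodeError.
    """
    return text.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


pair = "\ud83d\ude00"          # U+1F600 as two separate surrogate code points

try:
    pair.encode("utf-8")       # this is the failure reported above
except UnicodeEncodeError as exc:
    print("encode failed:", exc.reason)

merged = merge_surrogates(pair)
print(len(pair), len(merged), merged == "\U0001f600")
```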

    serhiy-storchaka commented 4 years ago

    Please open a new issue for "surrogates not allowed".

    ronaldoussoren commented 4 years ago

    I've filed bpo-42318 about the surrogate pairs error I mention in msg380552.

    terryjreedy commented 3 years ago
    I closed python/issues-test-cpython#43647 as a duplicate of this.  It reported that BMP chars can fail also.  For instance, with "Noto Sans Mono", but not 'Dejavu Mono', the following crash.
    >>> '\u2705'
    '✅'
    >>> '\u270f'
    '✏'

    Unfortunately, at least on some *nix, the default tkFixedFont resolves to Noto Sans Mono.

    serhiy-storchaka commented 3 years ago

    At least it is not a regression caused by support of astral characters (bpo-13153).

    terryjreedy commented 3 years ago

    No, seems strictly a matter of complicated color, which is perhaps becoming more common. Firefox colors the checkbox (white checkmark on green field in a largish black square) but not the (smaller) pencil. I did not recognize either the FF or tk Windows pencil as a pencil without any color (so I searched), so I won't be surprised if FF upgrades its pencil too.

    terryjreedy commented 3 years ago

    New changeset 1b4a9c7956d5dc64f8002f62bf0faae2d1892f90 by Terry Jan Reedy in branch 'master': bpo-42225: IDLE - document two unix-related problems. (GH-25078) https://github.com/python/cpython/commit/1b4a9c7956d5dc64f8002f62bf0faae2d1892f90

    terryjreedy commented 3 years ago
    On macOS with 3.10.0a, 8.6.11 appears to fix this issue.
    >>> chr(128516)
    "😄"

    For IDLE, I am adding a paragraph to the doc. I will then close this issue as 'fixed' (insofar as we can for what is a 3rd party failure).

    miss-islington commented 3 years ago

    New changeset e92923b028024290a0e621b6b90e3221767d14d4 by Miss Islington (bot) in branch '3.8': bpo-42225: IDLE - document two unix-related problems. (GH-25078) https://github.com/python/cpython/commit/e92923b028024290a0e621b6b90e3221767d14d4

    miss-islington commented 3 years ago

    New changeset 84694c3e7adadc97d7d8cee938fe84bbeb961387 by Miss Islington (bot) in branch '3.9': bpo-42225: IDLE - document two unix-related problems. (GH-25078) https://github.com/python/cpython/commit/84694c3e7adadc97d7d8cee938fe84bbeb961387