python / cpython

The Python programming language
https://www.python.org

IDLE: Unexpected behavior caused by the input method #125349

Open Xiaokang2022 opened 1 week ago

Xiaokang2022 commented 1 week ago

Bug report

Bug description:

Typing "r", "t" or "p" with the Chinese input method active and then pressing Enter in the editor causes IDLE to behave unexpectedly.

There may be other similar situations, and I will list three of them here. The input method I used for my test is native to the Windows operating system.

I'm not sure whether this problem is caused by the input method or by the tkinter event binding; it may not be an IDLE issue but a tkinter issue. If it is not resolved, it will greatly affect the experience of using input methods in IDLE.

The following is a gif demonstration that reproduces the issue:

[GIF: 2024-10-12-15-06-39]

The event when this behavior is triggered and some of its properties are as follows:

letter  event.char  event.keysym  event.keycode  event.keysym_num
r       "r"         F3            114            65472
t       "t"         F5            116            65474
p       "p"         F1            112            65470

It is clear that this is not the expected behavior. event.keysym does not match event.char.
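The numbers in the table can be cross-checked with a short script. This is only a sketch of a numeric coincidence, not a confirmed diagnosis: it assumes the Windows virtual-key codes for function keys start at `VK_F1 = 0x70` and that Tk's keysym numbers for function keys start at `0xFFBE` for F1, both running consecutively. The helper name `check` is made up for illustration.

```python
# Values copied from the table in the report above:
# (letter, event.keysym, event.keycode, event.keysym_num)
reported = [
    ("r", "F3", 114, 65472),
    ("t", "F5", 116, 65474),
    ("p", "F1", 112, 65470),
]

# Assumption: Windows virtual-key code of F1; F2, F3, ... follow consecutively.
VK_F1 = 0x70
# Assumption: keysym number of F1; F2, F3, ... follow consecutively.
XK_F1 = 0xFFBE


def check(letter, keysym, keycode, keysym_num):
    """Return True if the row is consistent with the letter's ASCII code
    having been interpreted as a function-key virtual-key code."""
    n = int(keysym[1:])  # "F3" -> 3
    return (
        ord(letter) == keycode
        and keycode == VK_F1 + n - 1
        and keysym_num == XK_F1 + n - 1
    )


results = [check(*row) for row in reported]
print(results)
```

If every row passes, the reported keycode is simply the lowercase letter's ASCII code, which happens to coincide with a function-key code. That would be consistent with the letters "r", "t" and "p" being affected, but the thread does not establish this as the cause.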

CPython versions tested on:

3.13

Operating systems tested on:

Windows

Wulian233 commented 1 week ago

This bug is very noticeable for Chinese users, and there are many reports of it:

https://www.bilibili.com/read/cv25209738/ https://github.com/program-in-chinese/overview/issues/156

The usual workaround is to change the shortcut key settings via "Options" → "Configure IDLE" → "Keys", but this does not solve the underlying problem. There is also a Chinese translation package for IDLE on PyPI that fixes this bug: https://github.com/zetaloop/IDLE-CN

See https://github.com/zetaloop/IDLE-CN-base/commit/83b7d23627c4c51c3c871948e42df8a2ca57c4de (edited)

Lastly, I think it's a great idea to make IDLE support internationalization.

https://github.com/python/cpython/issues/61976

Xiaokang2022 commented 1 week ago

Actually, based on my description of the issue above, this can be fixed by just detecting and filtering out events whose event.keysym doesn't match event.char.
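A minimal sketch of such a check follows. It is a heuristic, not the approach taken in any PR: the function name `looks_like_stray_function_key` is hypothetical, and it assumes that a genuine function-key press reports an empty or non-printable `event.char`. A plain equality test between `keysym` and `char` would not work, because ordinary keys such as Return already differ (`'\r'` versus `'Return'`).

```python
import re
from types import SimpleNamespace

# Matches function-key keysym names such as "F1", "F3", "F12".
_FUNCTION_KEY = re.compile(r"F\d{1,2}")


def looks_like_stray_function_key(event):
    """Return True if the event claims to be a function key but also
    carries a single printable character (the mismatch described above)."""
    char = getattr(event, "char", "")
    keysym = getattr(event, "keysym", "")
    is_function_key = _FUNCTION_KEY.fullmatch(keysym) is not None
    has_printable_char = len(char) == 1 and char.isprintable()
    return is_function_key and has_printable_char


# Stand-ins for tkinter events, using only the two attributes examined.
stray = SimpleNamespace(char="r", keysym="F3")
normal = SimpleNamespace(char="r", keysym="r")
enter = SimpleNamespace(char="\r", keysym="Return")

print(looks_like_stray_function_key(stray))
print(looks_like_stray_function_key(normal))
print(looks_like_stray_function_key(enter))
```

In a real binding, a handler could return early when the check is true, before running the shortcut action.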

Xiaokang2022 commented 1 week ago

Logically, such a mismatch should not be possible, but it does occur in tkinter.

Wulian233 commented 1 week ago

pr welcome

terryjreedy commented 1 week ago

EDITED: There seems to be a bug somewhere in the Windows IME, tcl/tk, or tkinter resulting in buggy events. @serhiy-storchaka, do you know anything about this?

Events, especially key event attributes, are under-documented. The relevant tk event fields are:

-keycode number
    Number must be an integer; it specifies the keycode field for the event. Corresponds to the %k substitution for binding scripts.
-keysym name
    Name must be the name of a valid keysym, such as g, space, or Return; its corresponding keycode value is used as the keycode field for event, overriding any detail specified in the base event argument. Corresponds to the %K substitution for binding scripts.

I believe keycode was originally a number identifying a physical key or possibly a pair of duplicate keys on a particular keyboard and OS. Not standardized and nearly useless for users, but see below. On my keyboard, keycodes for number and letter keys are the ascii codes (of the uppercase for letters). Other keys are all under 256.

The keysym names recognized by tk are listed here, with numbers. Note that numbers for non-Latin1 chars have no relation to their codepoints. There are only a few thousand, and in particular, CJK chars are not listed. Only recognized keysyms can be used in binding sequences, like '', that trigger events. The % subs are for tcl scripts, which are text code, not function names.

The binding page adds two more 'calculated' values for substitution:

%A
    Substitutes the UNICODE character corresponding to the event, or the empty string if the event does not correspond to a UNICODE character.
%N
    The keysym corresponding to the event, substituted as a decimal number.

Tkinter instead adds 'char' and 'keysym_num' event attributes. They are all listed with very brief descriptions in the class Event docstring in tkinter/__init__.py (line 217 currently). There is only an __repr__ method there, so the real event creation code must be in _tkinter.c.

To investigate, I used the following code.

import tkinter as tk

r = tk.Tk()
t = tk.Text(r)
t.pack()

def tvent(ev):
    print(f'({ev.char=}, {ev.keycode=}, {ev.keysym=}, {ev.keysym_num=}')
t.bind('<KeyRelease>', tvent)

r.mainloop()  # needed when the script is run outside IDLE

After preliminary experiments, which contributed to the above, I installed the Chinese (Simplified, China) IME. It apparently worked fine. I entered p, r, t; selected the first suggested Chinese word; and it replaced 'prt' via generated events.

(ev.char='p', ev.keycode=80, ev.keysym='p', ev.keysym_num=112
(ev.char='r', ev.keycode=82, ev.keysym='r', ev.keysym_num=114
(ev.char='t', ev.keycode=84, ev.keysym='t', ev.keysym_num=116
(ev.char='膨', ev.keycode=33192, ev.keysym='??', ev.keysym_num=0
(ev.char='润', ev.keycode=28070, ev.keysym='??', ev.keysym_num=0
(ev.char='土', ev.keycode=22303, ev.keysym='??', ev.keysym_num=0

I suspect the keycodes for non-Latin1 chars are their codepoints; can someone who recognizes the chars check?

I also installed Chinese (Traditional, Taiwan), but after an hour Microsoft still says 'Not ready yet' and indeed selecting it has no effect in IDLE. Is this the one used to get the result given above? Maybe I will try rebooting or re-installing tomorrow.
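The codepoint question can be checked directly with `ord()`, using the three characters and keycodes from the output above. The dictionary name `observed` is just for illustration.

```python
# Characters and keycodes copied from the printed events above.
observed = {"膨": 33192, "润": 28070, "土": 22303}

for ch, keycode in observed.items():
    # ord() gives the Unicode codepoint of a one-character string.
    print(ch, keycode, ord(ch), f"U+{ord(ch):04X}", keycode == ord(ch))

matches = all(ord(ch) == keycode for ch, keycode in observed.items())
print(matches)
```

For these three samples the keycode equals the codepoint. This says nothing about characters outside the Basic Multilingual Plane, which were not tested in the thread.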

Windows 10 with October update.

Xiaokang2022 commented 1 week ago

Entering Chinese directly does not trigger this problem, but directly entering English in Chinese mode causes this problem.

Take the above code as an example. With the input method in Chinese mode, press the following keys:

r, then Enter

Doing so in other editors only enters the letter "r". In IDLE, however, nothing is entered and an "F3" key event is triggered at the same time. With the example code above, the letter is still entered normally.

import tkinter as tk

r = tk.Tk()
t = tk.Text(r)
t.pack()

def tvent(ev):
    print(f'({ev.char=}, {ev.keycode=}, {ev.keysym=}, {ev.keysym_num=}, {ev.type=}, {ev.send_event=}, {ev.widget=}')
t.bind('<KeyRelease>', tvent)

r.mainloop()

Run the code above and press the keys mentioned above; the output will look like this:

(ev.char='r', ev.keycode=82, ev.keysym='r', ev.keysym_num=114, ev.type=<EventType.KeyRelease: '3'>, ev.send_event=False, ev.widget=<tkinter.Text object .!text>
(ev.char='r', ev.keycode=114, ev.keysym='F3', ev.keysym_num=65472, ev.type=<EventType.KeyRelease: '3'>, ev.send_event=True, ev.widget=<tkinter.Text object .!text>
(ev.char='\r', ev.keycode=13, ev.keysym='Return', ev.keysym_num=65293, ev.type=<EventType.KeyRelease: '3'>, ev.send_event=False, ev.widget=<tkinter.Text object .!text>

It is obvious that the results of the second line are problematic.

OS: Windows 11

serhiy-storchaka commented 1 week ago

Correspondence between the Event attributes and Tk substitutions:

Attribute   Tk  Description
char        %A  Substitutes the UNICODE character corresponding to the event, or the empty string if the event does not correspond to a UNICODE character (e.g. the shift key was pressed). On X11, XmbLookupString (or XLookupString when input method support is turned off) does all the work of translating from the event to a UNICODE character. On X11, valid only for KeyPress event. On Windows and macOS/aqua, valid only for KeyPress and KeyRelease events.
keycode     %k  The keycode field from the event. Valid only for KeyPress and KeyRelease events.
keysym      %K  The keysym corresponding to the event, substituted as a textual string. Valid only for KeyPress and KeyRelease events.
keysym_num  %N  The keysym corresponding to the event, substituted as a decimal number. Valid only for KeyPress and KeyRelease events.

serhiy-storchaka commented 1 week ago

The real event creation code is in _substitute().

serhiy-storchaka commented 1 week ago

Please also output ev.type. Are they all KeyRelease?

Xiaokang2022 commented 1 week ago

I edited the previous comment, outputting ev.type

serhiy-storchaka commented 1 week ago

Thank you. Could you please also output ev.send_event and ev.widget?

Xiaokang2022 commented 1 week ago

I edited the previous comment, outputting ev.send_event and ev.widget. I've also noticed that the ev.send_event of the line in question is different from the others.

serhiy-storchaka commented 1 week ago

Aha! The difference is that send_event=True for this stray F3. Perhaps we can use this to filter out such events in IDLE.
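The observation can be reproduced on the three recorded events without running a GUI. The sketch below uses `SimpleNamespace` objects as stand-ins for tkinter events, with values copied from the output earlier in the thread; the variable names are illustrative only.

```python
from types import SimpleNamespace

# The three KeyRelease events printed earlier, reduced to the relevant fields.
events = [
    SimpleNamespace(char="r", keycode=82, keysym="r",
                    keysym_num=114, send_event=False),
    SimpleNamespace(char="r", keycode=114, keysym="F3",
                    keysym_num=65472, send_event=True),
    SimpleNamespace(char="\r", keycode=13, keysym="Return",
                    keysym_num=65293, send_event=False),
]

# Split the events by the send_event flag.
synthetic = [e.keysym for e in events if e.send_event]
physical = [e.keysym for e in events if not e.send_event]

print("send_event=True: ", synthetic)
print("send_event=False:", physical)
```

In this sample the flag separates the stray F3 from the two events that correspond to real keystrokes. Whether the flag is a safe filter inside IDLE is a separate question that depends on how IDLE delivers its own shortcut events.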

Xiaokang2022 commented 1 week ago

I'll try this approach in #125352. The approach I have taken in the PR so far is just as effective, but it is not a good solution.

Xiaokang2022 commented 1 week ago

This method doesn't seem to work for IDLE, because ev.send_event is True for all shortcut action events in IDLE.

Here's how I tested:

Change the wrapper function in #125352 to something like this:

def wrapper(event):
    # Filter out incorrect events
    if event.type == tkinter.EventType.Key and event.send_event:
        print(f"{event.send_event=}")
        return None
    return func(event)

Then you will find that the problem mentioned in this issue is gone, but the original shortcuts can no longer be triggered.

FYI, maybe there's a problem with the way I'm testing.

Xiaokang2022 commented 3 days ago

After further testing, third-party input methods also cause the same problem, so it may not be caused by the input method itself.

Also, one thing to note here is that we only pressed two keys, but three events were triggered.

r, then Enter

The two keystrokes correspond to the 1st and 3rd events mentioned earlier, respectively, but it's not clear how the intermediate event is fired.