wxWidgets / Phoenix

wxPython's Project Phoenix. A new implementation of wxPython, better, stronger, faster than he was before.
http://wxpython.org/

4-byte characters (emojis) in StaticText/TextCtrl/Button/ListCtrl/... labels or values cause string truncation #2537

Closed Remboooo closed 2 months ago

Remboooo commented 2 months ago

Operating system: Windows 11 23H2
wxPython version & source: pypi 4.2.1

>>> import wx
>>> print(wx.PlatformInfo)
('__WXMSW__', 'wxMSW', 'unicode', 'unicode-wchar', 'wx-assertions-on', 'phoenix', 'wxWidgets 3.2.2.1', 'autoidman', 'sip-6.7.9', 'build-type: release')

Python version & source: stock

python -VV
Python 3.11.7 (tags/v3.11.7:fa7a6f2, Dec  4 2023, 19:24:49) [MSC v.1937 64 bit (AMD64)]

Description of the problem: When value/label strings on controls contain emojis (characters that take 4 bytes in UTF-8), the string is truncated one character too early for every such character in the string. This means that for every such character (e.g. emoji) in the string, a dummy character needs to be added to the end for the string not to be truncated.

Example: having labels with the following strings:
"Test 🐛"
"Test 🐛_"
"Test 🐛🐛_"
"Test 🐛🐛__"
"🐛🐛🐛 This should count to 5: 12345"

results in the following:

image

This happens at least for StaticText, TextCtrl, Button and ListCtrl. I haven't tried any others.
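
To make the "one dummy character per emoji" observation concrete, here is a minimal sketch of a stopgap that pads a label before handing it to a control. The helper names (`count_astral`, `utf16_units`, `pad_for_truncation`) are hypothetical and not part of wxPython, and the assumption that exactly one trailing character is lost per non-BMP character comes only from the behaviour described above.

```python
def count_astral(text: str) -> int:
    """Count code points above U+FFFF (each needs two UTF-16 code units)."""
    return sum(1 for ch in text if ord(ch) > 0xFFFF)


def utf16_units(text: str) -> int:
    """Length of the string measured in UTF-16 code units."""
    return len(text.encode("utf-16-le")) // 2


def pad_for_truncation(text: str, filler: str = " ") -> str:
    """Append one filler character per non-BMP code point.

    Assumption: the control drops one trailing character for each
    such code point, as reported in this issue.
    """
    return text + filler * count_astral(text)


# U+1F600 stands in for any character outside the BMP.
label = "Test \U0001F600\U0001F600"
padded = pad_for_truncation(label, "_")
print(len(label), utf16_units(label), repr(padded))
```

The difference between `len(label)` and `utf16_units(label)` equals the number of padding characters needed, which matches the pattern in the examples above.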

Code Example:

```python
import wx


class TestFrame(wx.Frame):
    def __init__(self, parent):
        wx.Frame.__init__(
            self, parent, id=wx.ID_ANY, title="Test", pos=wx.DefaultPosition,
            size=wx.Size(500, 300), style=wx.DEFAULT_FRAME_STYLE | wx.TAB_TRAVERSAL
        )

        sizer = wx.BoxSizer(wx.VERTICAL)

        text1 = wx.StaticText(self, wx.ID_ANY, u"Test 🐛", wx.DefaultPosition, wx.DefaultSize, 0)
        sizer.Add(text1, 0, wx.ALL | wx.EXPAND, 5)

        text2 = wx.StaticText(self, wx.ID_ANY, u"Test 🐛_", wx.DefaultPosition, wx.DefaultSize, 0)
        sizer.Add(text2, 0, wx.ALL | wx.EXPAND, 5)

        text3 = wx.StaticText(self, wx.ID_ANY, u"Test 🐛🐛_", wx.DefaultPosition, wx.DefaultSize, 0)
        sizer.Add(text3, 0, wx.ALL | wx.EXPAND, 5)

        text4 = wx.StaticText(self, wx.ID_ANY, u"Test 🐛🐛__", wx.DefaultPosition, wx.DefaultSize, 0)
        sizer.Add(text4, 0, wx.ALL | wx.EXPAND, 5)

        text5 = wx.StaticText(self, wx.ID_ANY, u"🐛🐛🐛 This should count to 5: 12345", wx.DefaultPosition, wx.DefaultSize, 0)
        sizer.Add(text5, 0, wx.ALL | wx.EXPAND, 5)

        self.SetSizer(sizer)
        self.Layout()


if __name__ == '__main__':
    app = wx.App()
    frm = TestFrame(None)
    frm.Show()
    app.MainLoop()
```

AmyAmy commented 2 months ago

This seems to work fine when using οΏ½ (U+FFFD). Which is the last displayable character which encodes to three UTF-8 codepoints, the last displayable character to encode to one UTF-16 codepoint, and the last displayable character on the BMP:

It breaks when using 𐀁 (U+10001), which is one of the first characters that encode to four UTF-8 code units and to two UTF-16 code units, and one of the first characters not on the BMP:

(The font doesn't have a glyph for this character, but that shouldn't matter for this issue).
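
The encoding sizes of the two test characters can be checked in plain Python, without wx. This is only an illustrative sketch; `code_units` is a hypothetical helper name.

```python
def code_units(ch: str) -> dict:
    """Report how many UTF-8 and UTF-16 code units one character needs."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "utf8": len(ch.encode("utf-8")),            # UTF-8 code units are bytes
        "utf16": len(ch.encode("utf-16-le")) // 2,  # UTF-16 code units are 2 bytes each
    }


bmp_char = "\uFFFD"         # inside the BMP
astral_char = "\U00010001"  # outside the BMP

for ch in (bmp_char, astral_char):
    print(code_units(ch))
```

U+FFFD fits in a single UTF-16 code unit, while U+10001 needs a surrogate pair, which is consistent with the truncation only appearing for characters outside the BMP.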

swt2c commented 2 months ago

I believe this is a duplicate of #2446 (fixed in git). Please test the latest snapshots.