wxWidgets / Phoenix

wxPython's Project Phoenix. A new implementation of wxPython, better, stronger, faster than he was before.
http://wxpython.org/

wx.lib.wordwrap.wordwrap() raises error on multi-byte Unicode #2188

Open suurjaak opened 2 years ago

suurjaak commented 2 years ago

Operating system: Windows
wxPython version & source: 4.1.1 msw (phoenix) wxWidgets 3.1.5 (pip-installed)
Python version & source: stock 3.8.10

Description of the problem:

When given a string containing a multi-byte Unicode character such as 😐 ("neutral face", chr(0x1F610), a code point outside the Basic Multilingual Plane), wx.lib.wordwrap.wordwrap() raises IndexError.

Example:

>>> import wx, wx.lib.wordwrap
>>> app = wx.App()
>>> wx.lib.wordwrap.wordwrap(chr(0x1F610), 100, wx.MemoryDC())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Program Files\Python3\lib\site-packages\wx\lib\wordwrap.py", line 36, in wordwrap
    if line[idx] == ' ':
IndexError: string index out of range

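The problem is that dc.GetPartialTextExtents() returns more than one extent for a single character, so the list of extents is longer than the string and line[idx] is indexed past the end of the line: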

>>> wx.MemoryDC().GetPartialTextExtents(chr(0x1F610))
[5, 11]
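
A likely explanation, which this report does not confirm, is that on Windows the extents are produced per UTF-16 code unit rather than per code point. The sketch below needs no wx; it only shows that the character above is one Python character but two UTF-16 code units, which matches the two extents returned. The helper name utf16_units is made up for illustration.

```python
def utf16_units(text):
    """Return how many UTF-16 code units are needed to encode text."""
    # Each UTF-16 code unit is 2 bytes; little-endian avoids a BOM prefix.
    return len(text.encode("utf-16-le")) // 2


face = chr(0x1F610)

# Python 3 counts code points, so this is a single character.
print(len(face))

# Encoded as UTF-16 it becomes a surrogate pair (assumed source of the
# two extents reported above).
print(utf16_units(face))
```

For plain ASCII text both counts agree, which is why the mismatch only shows up with characters like this one.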

suurjaak commented 2 years ago

One workaround would be to build the cumulative extents from per-character GetTextExtent() calls instead.

In https://github.com/wxWidgets/Phoenix/blob/master/wx/lib/wordwrap.py#L27, replacing

        pte = dc.GetPartialTextExtents(line)

with:

        pte = []
        for c in line:
            pte.append(dc.GetTextExtent(c).width + (pte[-1] if pte else 0))
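
The same accumulation can be sketched without wx to check that it yields exactly one entry per character. This is only an illustration: cumulative_extents and fake_width are hypothetical names, and fake_width stands in for a real measurement such as lambda c: dc.GetTextExtent(c).width.

```python
from itertools import accumulate


def cumulative_extents(line, measure):
    """Return running widths, one entry per character of line.

    measure is any callable mapping a single character to its width.
    """
    return list(accumulate(measure(c) for c in line))


def fake_width(ch):
    """Hypothetical stand-in for a DC measurement (made-up widths)."""
    return 11 if ord(ch) > 0xFFFF else 6


extents = cumulative_extents("a" + chr(0x1F610) + "b", fake_width)
print(extents)  # [6, 17, 23]
```

Because the list is built by iterating over the string itself, its length always equals len(line), so indexing the line by extent position stays in range.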

I can make a pull request if this change would be acceptable.

(Sidenote: in Python 2, GetPartialTextExtents() likewise returned multiple lengths, but because Python 2 strings counted such a multi-byte character as multiple characters, the list and the string had the same length and this problem did not arise.)