Miscount in editor.get_selection() for Unicode other than plane 0

When the source text contains a Unicode char outside plane 0, i.e. with ord() above 0xFFFF (e.g. many emojis), editor.get_selection() counts it as 2 positions instead of 1 (len() correctly counts it as 1). As a result, any selection located after such a high char is reported off by 1.

E.g. snippet:

import editor # 📎 = chr(0x1f4ce)
print(repr(editor.get_text()[slice(*editor.get_selection())]))

prints 'editor' when the first occurrence of editor (on the import line, before the high char) is selected in the editor, but 'ditor.' for the other two occurrences (on the print line, after the high char). I ran into this while writing a little Unicode utility for the wrench menu.
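The off-by-one can be reproduced outside Pythonista under the assumption (not confirmed here) that the editor reports positions in UTF-16 code units, while Python strings are indexed by code point. The helper name utf16_len and the sample text are made up for this illustration:

```python
# Assumption: the editor counts UTF-16 code units; Python str counts code points.
text = "import editor # \U0001F4CE = chr(0x1f4ce)\nprint(editor.get_text())"


def utf16_len(s):
    """Number of UTF-16 code units needed to encode s (hypothetical helper)."""
    return len(s.encode("utf-16-le")) // 2


clip = "\U0001F4CE"
print(len(clip), utf16_len(clip))  # 1 2

# Locate the "editor" on the print line, i.e. after the high char.
start = text.index("editor", text.index("\n"))
end = start + len("editor")

# The same selection expressed in UTF-16 code units is shifted by 1.
shifted_start = utf16_len(text[:start])
shifted_end = utf16_len(text[:end])

print(repr(text[start:end]))                  # 'editor'
print(repr(text[shifted_start:shifted_end]))  # 'ditor.'
```

If this assumption holds, every high char before the selection shifts the reported offsets by one more position.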

I made a work-around at this Gist snippet.
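For readers without access to the Gist, here is a minimal sketch of one possible work-around. It is not the Gist's code, and it assumes that get_selection() counts each char above 0xFFFF as 2 positions. The names utf16_to_codepoint_index and fix_selection are hypothetical:

```python
def utf16_to_codepoint_index(text, offset):
    """Convert an offset counted in UTF-16 code units to a str index.

    Hypothetical helper; assumes chars above 0xFFFF count as 2 units.
    An offset that falls inside such a char is rounded up past it.
    """
    units = 0
    for index, char in enumerate(text):
        if units >= offset:
            return index
        units += 2 if ord(char) > 0xFFFF else 1
    return len(text)  # offset at or beyond the end of the text


def fix_selection(text, selection):
    """Map a (start, end) pair of UTF-16 offsets to str indices."""
    start, end = selection
    return (utf16_to_codepoint_index(text, start),
            utf16_to_codepoint_index(text, end))


# Intended use inside Pythonista (the editor module only exists there):
#   import editor
#   text = editor.get_text()
#   start, end = fix_selection(text, editor.get_selection())
#   print(repr(text[start:end]))
```

The conversion walks the text once per offset, which should be fine for files of ordinary size. Clamping to len(text) keeps the result usable as a slice bound even when the reported offset exceeds the Python string length.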

The issue surfaces differently with editor.get_line_selection(): if the selection runs to the end of the file, and the file contains a high char, editor.get_line_selection() raises an IndexError.

By the way, this also seems to impact Pythonista's editor internals. If you:

position the cursor just before the high char,
select 1 char right (by external keyboard shift - right arrow),
and then delete (by external keyboard backspace),

half of the selected char is deleted, which violates the atomicity of Unicode chars. The editor then displays a strange symbol and the text is left in an invalid state: e.g. the print(...get_text()...) snippet above raises an exception when the text is decoded.
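Why half a char cannot be decoded can be shown in plain Python, again assuming (unconfirmed) that the editor buffer is UTF-16: a high char is stored as a pair of 16-bit units, and either unit on its own is not valid text. The helper decodes_cleanly is a made-up name:

```python
clip = "\U0001F4CE"
units = clip.encode("utf-16-le")  # 4 bytes: two 16-bit units (a surrogate pair)


def decodes_cleanly(data):
    """Return True if data is valid UTF-16 (little endian)."""
    try:
        data.decode("utf-16-le")
        return True
    except UnicodeDecodeError:
        return False


print(decodes_cleanly(units))      # True: the complete pair
print(decodes_cleanly(units[:2]))  # False: first half only
print(decodes_cleanly(units[2:]))  # False: second half only
```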

iPad, Python 3.6, latest beta, latest iOS.

omz / Pythonista-Issues #332