This was discovered on Linux; I have no idea whether it also affects macOS.
Sometimes tdsr or pyte interprets console output as something other than UTF-8.
$ python3
print('\xe4')
You'll hear tdsr say a umlaut, and that's what was printed.
Use the review keys to review the line of output, and you'll hear
something completely different: sigma.
It turns out that the byte 0xe4 is capital sigma in the old CP-437
(IBM PC code page 437) character set.
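The CP-437 reading can be checked from Python itself. This is only a sketch using the standard codecs; the variable names are mine and nothing here comes from tdsr or pyte.

```python
# U+00E4 is the character that print('\xe4') writes.
ch = '\xe4'

# The low byte of the code point, read as a CP-437 byte.
low_byte = bytes([ord(ch) & 0xFF])
as_cp437 = low_byte.decode('cp437')

# What a UTF-8 terminal actually receives for this character.
utf8_bytes = ch.encode('utf-8')

print(f"U+{ord(ch):04X} printed, U+{ord(as_cp437):04X} under CP-437")
print("UTF-8 bytes on the wire:", utf8_bytes.hex(' '))
```

Note that the UTF-8 encoding of a umlaut is the two bytes c3 a4, so a raw 0xe4 byte never reaches the terminal. That suggests the CP-437 lookup is applied to the already decoded code point rather than to the incoming byte stream.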
Next:
print('\u0134')
You'll hear capital J circumflex, which is what was printed.
Review the line of output by character, and indeed, capital J circumflex
is what is there.
So it's as though, for Unicode code points below 0x100,
pyte (or something else) is treating the least significant byte as a
character in CP-437 and then translating it to UTF-8 to be spoken,
whereas code points >= 0x100 are handled properly.
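To make the suspected behavior concrete, here is a small model of it. The function suspected_review_char is hypothetical and written only to match the two observations above; it is not code from tdsr or pyte, and I have not confirmed where the translation really happens.

```python
def suspected_review_char(ch: str) -> str:
    """Hypothetical model of what review mode seems to do (an assumption)."""
    code_point = ord(ch)
    if code_point < 0x100:
        # Suspected bug: the code point is reinterpreted as a CP-437 byte.
        return bytes([code_point]).decode('cp437')
    # Code points >= 0x100 appear to pass through untouched.
    return ch


for sample in ('\xe4', '\u0134', 'a'):
    seen = suspected_review_char(sample)
    print(f"U+{ord(sample):04X} -> U+{ord(seen):04X}")
```

ASCII comes out unchanged under this model because CP-437 agrees with ASCII for those bytes, which would explain why ordinary English output reviews correctly.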