svenkreiss / unicodeit

Converts LaTeX tags to unicode: \mathcal{H} → ℋ. Available on the web or as an Automator script for the Mac.
https://www.unicodeit.net
Other
287 stars 34 forks

Tips about UnicodeEncodeError on Windows #70

Closed YDX-2147483647 closed 1 year ago

YDX-2147483647 commented 1 year ago

Note

This is not a bug in unicodeit, but people may run into it when using unicodeit, so I think it is worth mentioning here.

> py -c 'print("\N{GREEK SMALL LETTER ALPHA}")'
α

> py -c 'print("\N{GREEK SMALL LETTER ALPHA}")' | echo  # or Write-Output
��

If the default Windows encoding is not UTF-8 (for example, GBK) and you have changed the shell encoding as shown below, you may hit a UnicodeEncodeError when piping output in PowerShell.

> [console]::InputEncoding = [console]::OutputEncoding = [System.Text.UTF8Encoding]::new()
> py -m unicodeit.cli '\alpha^n'
αⁿ

> py -m unicodeit.cli '\alpha^n' | Set-Clipboard  # or scb
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "~\AppData\Local\Programs\Python\Python311\Lib\site-packages\unicodeit\cli.py", line 8, in <module>
    print(' '.join(result))
UnicodeEncodeError: 'gbk' codec can't encode character '\u207f' in position 1: illegal multibyte sequence
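
The traceback is consistent with Python encoding piped output using the locale code page (GBK here) instead of UTF-8. The following is only a minimal sketch that reproduces the encoding step in isolation, without unicodeit or PowerShell; the helper name try_encode is hypothetical and exists just for this illustration.

```python
# The text printed by the failing command: alpha (U+03B1) followed by
# superscript n (U+207F), written with escapes to keep this file ASCII-only.
text = "\u03b1\u207f"


def try_encode(s, encoding):
    """Return the encoded bytes, or the UnicodeEncodeError if encoding fails.

    Hypothetical helper for illustration; not part of unicodeit.
    """
    try:
        return s.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc


# GBK can represent alpha but has no mapping for superscript n,
# so the failure is reported at position 1, as in the traceback.
gbk_result = try_encode(text, "gbk")

# UTF-8 can represent every Unicode code point, so this succeeds.
utf8_result = try_encode(text, "utf-8")

print(repr(gbk_result))
print(repr(utf8_result))
```

This also suggests why '\alpha' alone may pipe without an exception under GBK while '\alpha^n' does not: only the superscript is missing from the code page.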

To fix it, enable Python UTF-8 mode, either with the command-line option -X utf8 or by setting the environment variable PYTHONUTF8 to 1.

> py -X utf8 -m unicodeit.cli '\alpha^n' | echo
αⁿ

> $env:PYTHONUTF8 = '1'
> py -m unicodeit.cli '\alpha^n' | echo
αⁿ
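
To check whether UTF-8 mode is actually in effect for a given invocation, a small diagnostic script can help. This is a sketch using only the standard library; the function name describe_io_encoding and the file name check_encoding.py mentioned afterwards are hypothetical.

```python
import locale
import sys


def describe_io_encoding():
    """Collect the settings that decide how print() encodes its output.

    Hypothetical helper for illustration; not part of unicodeit.
    """
    return {
        # True when Python was started with -X utf8 or PYTHONUTF8=1.
        "utf8_mode": bool(sys.flags.utf8_mode),
        # Encoding of the current stdout stream; it can differ between
        # a console and a pipe.
        "stdout_encoding": sys.stdout.encoding,
        # Locale encoding used as the default for text I/O.
        "preferred_encoding": locale.getpreferredencoding(False),
    }


for key, value in describe_io_encoding().items():
    print(f"{key}: {value}")
```

Saved as check_encoding.py, it could be run once as `py check_encoding.py | echo` and once as `py -X utf8 check_encoding.py | echo` to compare the reported values. The output is ASCII-only, so the script itself should not trigger the error.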