Python 3 encoding issue

jasongaoks commented 4 years ago

I got the following error message when using Python 3 and jianpu_ly to compile a jianpu file:

ERROR: Unable to read file test.jp

Environment: Win10 64-bit, Python 3.7, console code page 936 (ANSI/OEM Simplified Chinese GBK). The input file, which contains Chinese lyrics, is UTF-8 encoded. I modified the source code like this:

        if sys.version_info<(3,0):
            inDat.append(open(f).read())
        else:
            inDat.append(open(f, "r", encoding="utf-8").read())

and it can open and read the input file properly.
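For reference, the version check above can be avoided with `io.open`, which accepts `encoding=` on both Python 2.6+ and Python 3. The following is only a sketch: the helper name `read_utf8` and the temporary demo file are made up for illustration and are not part of jianpu_ly. Note that on Python 2 this returns `unicode` rather than `str`, so it is not a drop-in replacement if the rest of the code expects bytes.

```python
import io
import os
import tempfile

def read_utf8(path):
    """Read a text file as UTF-8, ignoring the system locale (hypothetical helper)."""
    # io.open takes encoding= on Python 2.6+; on Python 3 it is the built-in open.
    with io.open(path, "r", encoding="utf-8") as f:
        return f.read()

# Demo: write a small UTF-8 file with Chinese text, then read it back.
fd, demo_path = tempfile.mkstemp(suffix=".jp")
os.close(fd)
with io.open(demo_path, "w", encoding="utf-8") as f:
    f.write(u"1 2 3 \u4e2d\u6587")  # "1 2 3 中文"
text = read_utf8(demo_path)
os.remove(demo_path)
```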

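However, I then found that the saved file contains garbled characters.

Before running jianpu_ly.py, I first have to run

set PYTHONIOENCODING=utf-8

in order to get correct output.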

Please help to solve this problem. Thanks in advance!

ssb22 commented 4 years ago

Yes, this is a major annoyance of trying to make the same code work in both Python 2 and Python 3. Python 2 reads/writes bytes, but Python 3 reads/writes Unicode characters, and it checks the system settings to decide the encoding for both reading and writing. In this case, the system locale says GBK, so Python 3 will use GBK both to read and to write. And if you give Python 3 some UTF-8 when it's expecting GBK, it will typically fail with a UnicodeDecodeError. So you managed to add encoding= to open() to make it read UTF-8, but it's still using GBK for print, so you get messed-up output because Python 3 is encoding the text it read from your UTF-8 file as GBK when it writes. And I never noticed, because I only tried it on GNU/Linux and Mac systems that are set to use UTF-8 everywhere anyway.
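A small sketch of the mismatch described above, using explicit codec names instead of relying on the system locale (the sample string is made up for illustration). Whether the wrong decode raises an error or just yields garbage depends on the particular bytes, so the code handles both cases rather than assuming one.

```python
lyrics = u"\u4e2d\u6587"                # "中文"
utf8_bytes = lyrics.encode("utf-8")     # what a UTF-8 input file contains
gbk_bytes = lyrics.encode("gbk")        # what a GBK console or file would expect

# Decoding UTF-8 bytes as GBK either fails or produces the wrong characters.
try:
    misread = utf8_bytes.decode("gbk")
    outcome = "mojibake"
except UnicodeDecodeError:
    misread = None
    outcome = "UnicodeDecodeError"

# Decoding with the matching codec recovers the original text.
roundtrip = utf8_bytes.decode("utf-8")
```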

As Lilypond itself always uses UTF-8, and jianpu-ly uses UTF-8 when run under Python 2, I think we have enough reasons to override Python 3's behaviour and get it to use UTF-8 throughout, even if some system setting is telling it to use GBK.

Please try version 1.33 (https://github.com/ssb22/jianpu-ly/commit/8ebc98ff401fa05b3980b361089ab6613d5cf749): this one tells Python 3 to use UTF-8 for everything ☺
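For readers who want to do something similar in their own scripts, one common way to make Python 3 write UTF-8 regardless of the locale is to wrap the underlying binary stream in an `io.TextIOWrapper`. This is a sketch of that general technique and is not necessarily what the commit above does; the helper name `utf8_writer` is hypothetical, and an in-memory buffer stands in for the real output stream.

```python
import io

def utf8_writer(binary_stream):
    """Wrap a binary stream so text written to it is encoded as UTF-8 (hypothetical helper)."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8", newline="\n")

# Demo with an in-memory buffer instead of the real standard output.
buf = io.BytesIO()
out = utf8_writer(buf)
out.write(u"\u6b4c\u8bcd\n")  # "歌词" followed by a newline
out.flush()
written = buf.getvalue()

# In a real Python 3 script, the same wrapper could be applied to stdout:
#     sys.stdout = utf8_writer(sys.stdout.buffer)
```

Setting `PYTHONIOENCODING=utf-8` before launching Python, as mentioned earlier in this thread, has a similar effect without changing the code.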

jasongaoks commented 4 years ago

Cool! 1.33 works perfectly. Thanks!

ssb22 / jianpu-ly

Python 3 encoding issue #4