kitao / pyxel

A retro game engine for Python
MIT License
14.67k stars, 824 forks

Fixed the issue of gbk encoding format compatibility and drive path on Windows when app2exe was executed #529

Closed: EurynomeKeros closed this 6 months ago

EurynomeKeros commented 6 months ago

#521

Fixed the issue of gbk encoding format compatibility and drive path on Windows when app2exe was executed

EurynomeKeros commented 6 months ago

Error 1: UnicodeDecodeError: 'gbk' codec can't decode byte 0xbb in position 515: illegal multibyte sequence. The ".py" file may contain characters that are not GBK-encoded, such as Chinese text saved in another encoding. The fix is to find the file-reading call and add the compatible encoding "utf8". Error 2: ValueError: path is on mount 'C:', start on mount 'E:'. The ".py" file may be using os.path.dirname and os.path.abspath; the issue is triggered when the code is packaged.

merwok commented 6 months ago

There is a function intended to open Python files respecting the encoding declaration: https://docs.python.org/3/library/tokenize.html#tokenize.open

kitao commented 6 months ago

So this modification could be written more simply?

merwok commented 6 months ago

Yes, it would be simpler (one line of code), faster (one try) and correct in more cases (supporting all encodings that Python supports, not only locale encoding and UTF-8 as in the current PR)
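As a rough illustration of the suggestion above, here is a minimal sketch of reading a Python source file with tokenize.open. The helper name read_python_source and the GBK-encoded demo file are assumptions made for this example; they are not taken from the PR or from Pyxel's code.

```python
import os
import tempfile
import tokenize


def read_python_source(path):
    """Read a Python source file, honoring its BOM or coding cookie.

    Hypothetical helper for illustration; not Pyxel's actual code.
    """
    # tokenize.open() detects the encoding and opens the file in text mode.
    with tokenize.open(path) as f:
        return f.read()


# Demo: a module saved as GBK that declares its encoding in a cookie.
source = '# -*- coding: gbk -*-\nTITLE = "游戏"\n'
fd, demo_path = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "wb") as f:
    f.write(source.encode("gbk"))

text = read_python_source(demo_path)
os.remove(demo_path)

print(text == source)
```

The same call also reads a UTF-8 file with no cookie, since UTF-8 is the default that Python assumes for source files.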

EurynomeKeros commented 6 months ago

I re-analyzed the encoding error (UnicodeDecodeError: 'gbk' codec can't decode byte 0xbb in position 515: illegal multibyte sequence). The cause is that the default encoding on a Chinese-locale Windows system is gbk. Python 3 source files default to UTF-8, but when a file is read with open() and no encoding is specified, the operating system's default encoding is used, so a UTF-8 file ends up being read as gbk. The default encoding on Mac and Linux is UTF-8, so adding encoding="utf8" when reading the file may be more compatible.
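The mismatch described here can be sketched as follows. This is only an illustration: the helper name read_text_utf8 and the demo file are hypothetical, and passing encoding="gbk" explicitly stands in for what open() would pick up from a GBK locale when no encoding is given.

```python
import os
import tempfile


def read_text_utf8(path):
    """Read a text file as UTF-8 regardless of the platform default."""
    with open(path, encoding="utf-8") as f:
        return f.read()


# Demo file: UTF-8 source containing a Chinese character.
source = 'print("中")\n'
fd, demo_path = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "wb") as f:
    f.write(source.encode("utf-8"))

# Simulate a GBK default by requesting that codec explicitly.
# When encoding is omitted, open() uses the locale's preferred encoding
# (unless Python's UTF-8 mode is enabled).
try:
    with open(demo_path, encoding="gbk") as f:
        as_gbk = f.read()  # may decode to the wrong characters
except UnicodeDecodeError as exc:
    as_gbk = None  # or fail outright, as in the reported error
    print("gbk read failed:", exc)

as_utf8 = read_text_utf8(demo_path)
os.remove(demo_path)
```

Note that forcing UTF-8 only helps when the file really is UTF-8; a file saved in another encoding would need a different approach.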

EurynomeKeros commented 6 months ago

About #521: there are many possible causes. It looks like the program is trying to compute a relative path between two directories, and no such path exists because they are on different drives. This is usually due to the use of os.path.dirname and os.path.abspath in Python code, but it could also be a Windows issue or a PyInstaller issue. It is hard to tell which, so the change tries an absolute path if computing the relative path fails.

merwok commented 6 months ago

But Python modules can use many encodings, not only UTF-8. Why not use the function that is made for this?

EurynomeKeros commented 6 months ago

I've tried this suggestion, but it doesn't simply open the file with the file's actual encoding. tokenize.open(filename) uses the encoding detected by detect_encoding(). That function detects the encoding that should be used to decode a Python source file from the presence of a UTF-8 BOM or an encoding cookie; if neither is present, it returns the default of 'utf-8'. So if the file's encoding is not UTF-8, you must set the encoding cookie, otherwise a SyntaxError is raised. This is just a default-encoding conflict on Windows, so specifying UTF-8 may have less impact. Using tokenize.open requires programmers to follow good coding conventions.
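The detection rules described in this comment can be checked directly with tokenize.detect_encoding. The sketch below uses a hypothetical wrapper named detect and three made-up byte strings; only detect_encoding itself comes from the standard library.

```python
import io
import tokenize


def detect(source_bytes):
    """Return the encoding tokenize would use for these source bytes."""
    # detect_encoding() takes a readline callable that yields bytes.
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


no_cookie = detect(b"x = 1\n")                           # 'utf-8'
with_bom = detect(b"\xef\xbb\xbfx = 1\n")                # 'utf-8-sig'
with_cookie = detect(b"# -*- coding: gbk -*-\nx = 1\n")  # 'gbk'

print(no_cookie, with_bom, with_cookie)
```

In other words, the platform's locale encoding never enters into the result: it depends only on the BOM and the cookie, with UTF-8 as the fallback.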

merwok commented 6 months ago

"if the file encoding is not utf8, you must set the encoding cookie"

Absolutely: this is required for a Python module. There is no fallback to the platform default encoding; Python defines the default as UTF-8. This is not about best practices, but a base requirement.

kitao commented 6 months ago

Has this discussion been settled? I haven't had time to properly review it yet, but I would like to know if the current code is the best and most realistic solution.

EurynomeKeros commented 6 months ago

The current changes are working well in my project. The path fix just adds one more attempt. Regarding encoding, specifying UTF-8 is the most convenient change for me...

kitao commented 6 months ago

@EurynomeKeros Sorry, I checked the changes and I don't understand why abspath is used when relpath fails in order to avoid an encoding conflict. Could you explain the reason clearly again?

EurynomeKeros commented 6 months ago

@kitao This is the issue mentioned in #521, which is also caused by Windows. Some versions of Windows, such as Windows 10, generate temporary files under C:\Users\xxx\AppData\Local\Temp when the application is executed. When you run the Pyxel project, it is actually executed in this directory on the C: drive. app2exe uses relpath to find the file, which yields a relative path. If the Pyxel project is on a drive other than C:, such as E:, and the from xxx import xxx syntax is used in the .py file, this causes the error ValueError: path is on mount 'C:', start on mount 'E:', because no relative path exists between paths on different drives. At that point, the change falls back to abspath to provide an absolute path so that Pyxel can find the file.
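A minimal sketch of the fallback described in this comment, assuming a hypothetical helper named rel_or_abs and made-up example paths (this is not the PR's actual diff). It uses the standard ntpath module so the Windows drive behavior can be reproduced on any platform; on Windows, os.path is ntpath.

```python
import ntpath


def rel_or_abs(path, start, pathmod=ntpath):
    """Return path relative to start, or an absolute path if that fails.

    Hypothetical helper for illustration only.
    """
    try:
        return pathmod.relpath(path, start)
    except ValueError:
        # Raised when path and start are on different drives.
        return pathmod.abspath(path)


# Same drive: a relative path exists.
same_drive = rel_or_abs(r"E:\project\src\main.py", r"E:\project")

# Different drives: relpath raises ValueError, so the absolute path is used.
cross_drive = rel_or_abs(r"C:\Users\demo\AppData\Local\Temp\app\main.py", r"E:\project")

print(same_drive)
print(cross_drive)
```

Whether to keep the relpath attempt at all, or to use abspath everywhere, is a separate design question that depends on how the resulting paths are consumed.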

kitao commented 6 months ago

Thank you for your reply. Let me ask some questions. Q1: So the relpath/abspath modification has nothing to do with the encoding issues? Q2: Is trying relpath first a necessary step? What happens if relpath is simply replaced with abspath? Q3: relpath is used in the _list_imported_modules function several times. Is modifying one line enough? Thank you.

EurynomeKeros commented 6 months ago

R1: The relpath/abspath modification has nothing to do with the encoding issue. R2, R3: relpath is used in the _list_imported_modules function to locate the modules that a .py file pulls in through import and from ... import. I replaced the relpath used by this function with abspath to make sure that the module is found.

kitao commented 6 months ago

Thank you. I'll try to use your modification.

merwok commented 6 months ago

The encoding change will cause issues on Windows, where saving files does not default to UTF-8.