py5coding / py5generator

Meta-programming project that creates the py5 library code.
https://py5coding.org/
GNU General Public License v3.0

Some non-ASCII characters on initial docstring crashing imported mode #107

Closed villares closed 2 years ago

villares commented 2 years ago

py5.version: '0.8.0a2'

To reproduce:

"""
Á
"""

def setup():
    size(400, 400)
    rect(100, 100, 100, 100)

Resulting exception:

Traceback (most recent call last):
  File "C:\Users\alexandre.villares\AppData\Roaming\Python\Python39\site-packages\py5_tools\tools\run_sketch.py", line 52, in <module>
    main()
  File "C:\Users\alexandre.villares\AppData\Roaming\Python\Python39\site-packages\py5_tools\tools\run_sketch.py", line 44, in main
    imported.run_code(
  File "C:\Users\alexandre.villares\AppData\Roaming\Python\Python39\site-packages\py5_tools\imported.py", line 124, in run_code
    code = f.read()
  File "C:\Users\alexandre.villares\AppData\Local\Programs\Thonny\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 6: character maps to <undefined>

Surprisingly, Á will crash, but the á character won't crash...

I stumbled upon this because I have a sketch named "Árvore recursiva" (recursive tree) that crashes, while "Lousa mágica" (magic blackboard) didn't crash...
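
The difference between the two characters comes down to their UTF-8 bytes. A minimal sketch using only the standard library, assuming the file was saved as UTF-8 and read back with the cp1252 codec (the helper name `decode_as_cp1252` is made up for illustration):

```python
# UTF-8 bytes of the two characters from the report.
upper = "Á".encode("utf-8")   # two bytes: 0xC3 0x81
lower = "á".encode("utf-8")   # two bytes: 0xC3 0xA1


def decode_as_cp1252(data):
    """Decode bytes with cp1252, as a default open() does in a cp1252 locale."""
    try:
        return data.decode("cp1252")
    except UnicodeDecodeError as exc:
        return f"error: {exc.reason}"


# 0x81 has no mapping in Python's cp1252 codec, so decoding fails.
upper_result = decode_as_cp1252(upper)

# 0xC3 and 0xA1 are both mapped, so decoding "succeeds" but yields
# two wrong characters instead of raising.
lower_result = decode_as_cp1252(lower)

print(upper, upper_result)
print(lower, ascii(lower_result))
```

So á does not crash, but it is still decoded incorrectly. With Windows line endings, the 0x81 byte in the reproduction file sits at byte offset 6, which is consistent with the position in the traceback.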

hx2A commented 2 years ago

Hmmm, interesting. Great find! This does need to work for folks who write comments with non-ASCII characters.

I can't reproduce it on my Mac though. Is this a Windows thing? What happens if you put the code in a file and run it with run_sketch and not through Thonny? I don't understand that last line of the stack trace. Why does f.read() go to cp1252.py in the Thonny source code?

villares commented 2 years ago

Yeah! I was on Windows (and at home on Linux I don't think it happened). Strange stuff... Could it be a Thonny issue? I'll try to test more.

hx2A commented 2 years ago

Maybe a Thonny issue on Windows? Let me know what you figure out.

hx2A commented 2 years ago

I did some Googling because I was curious and I found a few things. This is an encoding issue, and maybe py5 needs to be explicit about reading and writing files with a particular encoding on Windows.

https://github.com/thonny/thonny/issues/2022

https://stackoverflow.com/questions/26324622/what-characters-do-not-directly-map-from-cp1252-to-utf-8

hx2A commented 2 years ago

I can now reproduce this, and I have a potential solution.

hx2A commented 2 years ago

My fix seems to work. I just needed to add encoding='utf8' to two locations.

I need to test it more and do some investigation to make sure there isn't a third place that also needs the same adjustment.
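
A sketch of what passing an explicit encoding looks like, assuming a plain `open()` call like the one in the traceback. This is not the actual py5 patch; `read_source` and `sketch.py` are hypothetical names used only for illustration:

```python
import tempfile
from pathlib import Path


def read_source(path):
    """Read a sketch file as UTF-8, ignoring the platform's default encoding."""
    with open(path, "r", encoding="utf8") as f:
        return f.read()


# Write a small sketch containing "Á" as UTF-8, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    sketch = Path(tmp) / "sketch.py"
    sketch.write_text('"""\nÁ\n"""\n', encoding="utf8")
    code = read_source(sketch)
```

Without the `encoding` argument, `open()` uses the locale's preferred encoding, which is how cp1252 ended up decoding a UTF-8 file on Windows.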

You have found a lot of bugs lately! This is appreciated. This latest bug is particularly important because it hinders non-English speakers from fully using Thonny in their own language. BTW, this bug had nothing to do with comments or triple-quoted strings. If Á appeared anywhere in the file, it wouldn't work. You can create variables or functions with Á in the name if you like.

hx2A commented 2 years ago

Hmmm, I have found a third place. If a json file contained an Á character and you tried to read it with py5.load_json(), you would get the same error.
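
The same idea applies to JSON. The following is a sketch built on the standard `json` module, not on py5's own `load_json()` implementation; the helper names and `data.json` are hypothetical:

```python
import json
import tempfile
from pathlib import Path


def save_json_utf8(data, path):
    """Write JSON as UTF-8, keeping non-ASCII characters unescaped."""
    with open(path, "w", encoding="utf8") as f:
        json.dump(data, f, ensure_ascii=False)


def load_json_utf8(path):
    """Read JSON as UTF-8 regardless of the platform's default encoding."""
    with open(path, "r", encoding="utf8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.json"
    save_json_utf8({"title": "Árvore recursiva"}, path)
    loaded = load_json_utf8(path)
```

With `ensure_ascii=False` the file really contains the UTF-8 bytes for Á, so reading it back only works reliably when the reader also specifies UTF-8.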

This is going to require a closer look. I thought my Python encoding problems were over since I stopped using Python 2.7. Guess I was wrong!

hx2A commented 2 years ago

Python plans to make UTF-8 mode the default in v3.15: https://peps.python.org/pep-0686/

We can't wait that long...working on a fix now.
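
Until then, it can help to check what a given interpreter will use when no encoding is passed. A small diagnostic sketch using the standard library (the function name `describe_text_defaults` is made up for illustration):

```python
import locale
import sys


def describe_text_defaults():
    """Report the settings that decide open()'s default text encoding."""
    return {
        # True when Python runs in UTF-8 mode (-X utf8 or PYTHONUTF8=1).
        "utf8_mode": bool(sys.flags.utf8_mode),
        # Encoding used by open() when no encoding argument is given.
        "preferred_encoding": locale.getpreferredencoding(False),
    }


defaults = describe_text_defaults()
print(defaults)
```

On an affected Windows setup this would presumably report cp1252 with UTF-8 mode off. Passing `encoding` explicitly avoids depending on either setting.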

hx2A commented 2 years ago

The translator utilities also suffered from this problem.

This seems to be fixed. I tested everything I can think of and can't find a way to break py5 with an 'Á' character.