grammalecte fails on Windows with Emacs using UTF-8 files

tmpUser2022 commented 2 years ago

grammalecte.py does not work by default with non-ASCII UTF-8 files on Windows, because Python uses the default encoding of the OS.

The problem with flycheck-grammalecte is that it uses a non-ASCII character to mark errors (⇨, U+21E8 RIGHTWARDS WHITE ARROW), so grammalecte.py crashes while processing the file.

It should probably either force UTF-8 encoding or avoid non-ASCII characters, so that it works on every platform by default.

Fix:

set PYTHONIOENCODING=utf-8
set PYTHONLEGACYWINDOWSSTDIO=utf-8

or in Python:

import sys
sys.stdin.reconfigure(encoding='utf-8')
sys.stdout.reconfigure(encoding='utf-8')
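
A slightly more defensive version of the in-Python variant could look like the sketch below. It assumes Python 3.7 or later, where `reconfigure()` is available on text streams. The helper name `force_utf8` is hypothetical and not part of grammalecte, and the demo runs on an in-memory cp1252 stream rather than the real console.

```python
import io
import sys


def force_utf8(streams=None):
    """Switch the given text streams (default: stdin/stdout/stderr) to UTF-8."""
    if streams is None:
        streams = (sys.stdin, sys.stdout, sys.stderr)
    for stream in streams:
        # reconfigure() exists on io.TextIOWrapper since Python 3.7;
        # replaced streams (e.g. under a test runner) may not have it.
        reconfigure = getattr(stream, "reconfigure", None)
        if reconfigure is not None:
            reconfigure(encoding="utf-8")


# Demo on an in-memory stream that starts out as cp1252.
demo = io.TextIOWrapper(io.BytesIO(), encoding="cp1252")
force_utf8([demo])
demo.write("\u21e8")  # would raise UnicodeEncodeError under cp1252
demo.flush()
```

Calling `force_utf8()` with no argument early in a script, before anything is read or printed, would apply the same change to the real standard streams.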

To reproduce the bug:

set PYTHONIOENCODING=windows-1252
set PYTHONLEGACYWINDOWSSTDIO=windows-1252

Error message:

...
   print(msg)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.1264.0_x64__qbz5n2kfra8p0\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u21e8' in position 45: character maps to <undefined>
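
The root cause in this traceback can be checked without touching any environment variable: cp1252 simply has no mapping for U+21E8. This is a minimal sketch, and the helper name `can_encode` is hypothetical.

```python
ARROW = "\u21e8"  # the error marker character named in this report


def can_encode(text, encoding):
    """Return True if `text` can be represented in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


print(can_encode(ARROW, "cp1252"))  # False
print(can_encode(ARROW, "utf-8"))   # True
```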

tmpUser2022 commented 2 years ago

Setting PYTHONIOENCODING to UTF-8 fixes the crash and lets flycheck-grammalecte start up, but UTF-8 characters are still not read as UTF-8 (I don't know whether the problem comes from flycheck-grammalecte or grammalecte.py).

[Screenshots: mouse over (gramma-1, gramma-2); right mouse click (gramma-3). Both "é" and its rendering by grammalecte as "Ã" are present.]
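
The "é" turning into "Ã" is consistent with UTF-8 bytes being decoded as cp1252. The sketch below reproduces that effect in isolation; it assumes cp1252 is the encoding Windows falls back to here, as the traceback in the first comment suggests.

```python
original = "\u00e9"                      # é
utf8_bytes = original.encode("utf-8")    # b'\xc3\xa9'
misread = utf8_bytes.decode("cp1252")    # two characters: Ã and ©

# ascii() keeps the output printable on any console encoding.
print(ascii(misread))  # '\xc3\xa9'

# Decoding the same bytes as UTF-8 gives the original character back.
restored = utf8_bytes.decode("utf-8")
```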

tmpUser2022 commented 2 years ago

On Windows, with a UTF-8 file and the Python variables set to UTF-8, adding encoding='utf8' to the open() call in find_errors (around line 140 of flycheck_grammalecte.py) completely solves this issue.

def find_errors(input_file, opts={}):
    """Read the file and run grammalecte on it"""

    with open(input_file, "r", encoding='utf8') as f:
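
The effect of the explicit `encoding` argument can be checked in isolation with a temporary file. This is only a sketch: `read_utf8` is a hypothetical helper, not the actual body of `find_errors`.

```python
import os
import tempfile


def read_utf8(path):
    """Read a text file as UTF-8, whatever the OS default encoding is."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


# Write a small UTF-8 file, read it back, then clean up.
fd, path = tempfile.mkstemp(suffix=".txt")
try:
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write("caf\u00e9 \u21e8")
    text = read_utf8(path)
finally:
    os.remove(path)
```

Without the `encoding` argument, `open()` falls back to the locale's preferred encoding, which is typically cp1252 on a Western-European Windows setup.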

milouse commented 2 years ago

Thank you for debugging this. I must admit I don’t have a Windows machine at hand to test this behavior. Would you mind opening a merge request with your proposal?

milouse commented 2 years ago

Sorry for being very late on that. Now that I think about it again, I wonder whether it would be better to just use a specific ASCII character instead of that UTF-8 one, like #. I’ll run some tests with that approach.

milouse commented 2 years ago

In fact, no, I won’t do that, as I use other UTF-8 characters to display nice arrows or non-breaking spaces. So let’s go with your proposal.

milouse commented 2 years ago

Fixed by 5c96daa3d3ddb23a0d7576f64385ec33933eb3e5

milouse / flycheck-grammalecte

grammalecte fails on Windows with Emacs using UTF-8 files #18