mlforcada closed this issue 5 years ago
Yes, I also got that from my students using Windows. It seems to be fixed in Python 3.6 https://stackoverflow.com/a/32176732/610569 but from what I see there are still some issues. I would suggest adding the encoding as a parameter to the different functions, with utf-8 as the default.
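A minimal sketch of that suggestion, assuming hypothetical helper names (write_tokens and read_tokens are illustrations only, not part of the sacremoses API): every function that touches a file takes an encoding parameter that defaults to utf-8, so the platform default is never used.

```python
import os
import tempfile


def write_tokens(path, lines, encoding="utf-8"):
    # Hypothetical helper: the encoding is explicit, so the Windows
    # default code page is never consulted.
    with open(path, "w", encoding=encoding) as fout:
        for line in lines:
            print(line, file=fout)


def read_tokens(path, encoding="utf-8"):
    # Hypothetical counterpart that reads with the same explicit encoding.
    with open(path, encoding=encoding) as fin:
        return [line.rstrip("\n") for line in fin]


# Round-trip a word containing a non-ASCII character.
tmp_path = os.path.join(tempfile.mkdtemp(), "truecased.txt")
write_tokens(tmp_path, ["Erdoğan"])
round_trip = read_tokens(tmp_path)
```

Callers who really need another encoding can still pass it, but the default behaves the same on every platform.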
Patching in a while...
@mlforcada Please try upgrading with pip install -U "sacremoses>=0.0.8". The Windows version should work fine too; I've added an .appveyor.yml to test on Windows systems (just in case).
Thanks a million, @alvations! It works like a charm. I'll check what you did and learn from it!
Great that it works now! Thanks @mlforcada for raising the issue =)
Hi, @alvations. A student of mine and I are using the truecaser from a Python 3 script on Windows. The script makes sure that all files are opened in UTF-8 by redefining open as follows:
However, when the script executes this excerpt of code:
The error occurs in the line
And the traceback is
We found that '\u011f' is the character 'ğ' in 'Erdoğan', position 4 in the word. Clearly, the error occurs when printing to a file.
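The failure can be reproduced on any platform with a small sketch. It assumes the file ends up with a legacy code page such as cp1252, which is only a guess at the codec involved, and try_write is a hypothetical helper. An in-memory stream stands in for the output file.

```python
import io

word = "Erdo\u011fan"  # 'Erdoğan'


def try_write(text, encoding):
    # Wrap an in-memory byte buffer in a text stream with the given
    # encoding, mimicking a file opened in text mode.
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding)
    try:
        print(text, file=stream)
        stream.flush()
    except UnicodeEncodeError as err:
        # err.start is the index of the first character that cannot be encoded.
        return "failed at position %d" % err.start
    return "ok"


cp1252_result = try_write(word, "cp1252")  # assumed legacy code page
utf8_result = try_write(word, "utf-8")
```

With an encoding that has no mapping for 'ğ', the write raises UnicodeEncodeError, while the same write to a UTF-8 stream succeeds.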
The file fout is opened as follows in truecase.py:
We have tried setting PYTHONIOENCODING before calling the script, to no avail.
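One likely reason the variable has no effect: PYTHONIOENCODING only overrides the encoding of stdin, stdout and stderr, while open() without an explicit encoding uses locale.getpreferredencoding(False). A short sketch for comparing the two on a given machine (describe_encodings is a hypothetical helper):

```python
import locale
import sys


def describe_encodings():
    # "stdout" is what PYTHONIOENCODING can change; "open_default" is what
    # open() uses when no encoding argument is passed.
    return {
        "stdout": sys.stdout.encoding,
        "open_default": locale.getpreferredencoding(False),
    }


encodings = describe_encodings()
```

If open_default reports a legacy code page, any file opened without an explicit encoding will use it regardless of PYTHONIOENCODING.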
Our Python is:
and it was installed from this page. It also fails with 3.6.3 on a different machine. We are using it inside a virtualenv. We can't seem to find a way to solve this without modifying your code. Thanks a million for your help.