Open daniel-kukiela opened 6 years ago
Thanks @daniel-kukiela, your temporary fix worked.
So we have another encoding-related issue that you were attempting to fix.
The attempted fix for the original issue broke the trainer, as I described above. I misunderstood that (encoding-related) issue the first time @sentdex showed it to me. The issue we are facing here is that some characters can't be encoded with the current stdout encoding (this is true for Python versions < 3.6 on Windows, and maybe for some installations on other OSes).
The stdout console encoding on Windows changed with Python 3.6 (https://www.python.org/dev/peps/pep-0528/). The issue on Python 3.5 on Windows 10: https://i.gyazo.com/907abdea295477595fa97bd0e56f220d.png
So I think a better way to fix the original issue is to change:
out_s = s.encode("utf-8")
if not isinstance(out_s, str):
    out_s = out_s.decode("utf-8")
(lines 64-66 in utils/misc_utils.py, function name: print_out) to:
out_s = s.encode(sys.stdout.encoding, "backslashreplace")
if not isinstance(out_s, str):
    out_s = out_s.decode(sys.stdout.encoding, "backslashreplace")
and stop assuming utf-8 as the stdout encoding. That will also ensure that every string is printed without an encoding error, because characters the console can't represent are written as backslash escapes.
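To illustrate the idea, here is a minimal Python 3 sketch of the encode/decode round trip. The helper name `to_printable` and its `encoding` parameter are hypothetical and not part of utils/misc_utils.py; the fallback to "utf-8" when the stream reports no encoding is also an assumption.

```python
import sys


def to_printable(s, encoding=None):
    """Return a str that can be written to a stream using `encoding`.

    Hypothetical helper, not the project's print_out. Characters that
    `encoding` cannot represent are replaced with backslash escapes.
    """
    # Assumption: fall back to UTF-8 if stdout reports no encoding.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    if isinstance(s, bytes):
        # Assumption: incoming bytes are UTF-8 text.
        s = s.decode("utf-8", "replace")
    # Encoding with "backslashreplace" never raises UnicodeEncodeError.
    out_b = s.encode(encoding, "backslashreplace")
    # The result contains only bytes valid in `encoding`, so this is safe.
    return out_b.decode(encoding)


if __name__ == "__main__":
    # On an ASCII-only console both non-ASCII characters become escapes.
    print(to_printable("café ł", "ascii"))
```

With a cp1252 console, "é" would survive unchanged and only "ł" would be escaped, since cp1252 has no code point for it.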
Also, to fix the issue caused by the last commit (which attempted to fix the original issue), change:
utils.print_out(b" src: " + src_data[decode_id])
utils.print_out(b" ref: " + tgt_data[decode_id])
(lines 449-450 in train.py, function name: _sample_decode) to:
utils.print_out(b" src: " + src_data[decode_id].encode("utf-8"))
utils.print_out(b" ref: " + tgt_data[decode_id].encode("utf-8"))
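The reason the first version fails can be sketched as follows, assuming the entries of src_data and tgt_data are `str` under Python 3. The helper name `with_prefix` is hypothetical and only used for illustration.

```python
def with_prefix(prefix, text):
    """Join a bytes prefix with text that may be str or bytes.

    Hypothetical helper; it mirrors the .encode("utf-8") fix above.
    """
    if isinstance(text, str):
        # Python 3 refuses to concatenate bytes and str, so encode first.
        text = text.encode("utf-8")
    return prefix + text


def concat_fails():
    """Return True if bytes + str raises TypeError (it does on Python 3)."""
    try:
        b" src: " + "hello"
    except TypeError:
        return True
    return False


if __name__ == "__main__":
    print(concat_fails())
    print(with_prefix(b" src: ", "żółw"))
```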
Regards, Daniel
@daniel-kukiela Thanks for the comments. I reverted the previous attempt.
I think we may only need to apply this one change from your suggestion on top of the current head to fix the issue:
(lines 64-66 in utils/misc_utils.py, function name: print_out)
out_s = s.encode(sys.stdout.encoding, "backslashreplace")
if not isinstance(out_s, str):
    out_s = out_s.decode(sys.stdout.encoding, "backslashreplace")
Is this bug Windows-specific? Have you tried it on other OSes?
We experienced that issue only on Windows. The bug will occur on any supported OS and Python version where Python uses a non-UTF-8 console encoding. I'm not sure whether that applies to any combination other than Windows + Python 3.5 (the console encoding was changed to UTF-8 for Windows in Python 3.6: https://www.python.org/dev/peps/pep-0528/).
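A quick way to check whether a given setup is affected might look like the sketch below. The helper name `encodable` is hypothetical; it only tests whether a string can be represented in a given encoding, which is the condition described above.

```python
import sys


def encodable(text, encoding):
    """Return True if `text` can be encoded with `encoding` without errors."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    # The encoding Python uses for the console on this machine (may be None).
    console_encoding = getattr(sys.stdout, "encoding", None)
    print(console_encoding)
    if console_encoding:
        # False here means a plain print() of this character would fail.
        print(encodable("ł", console_encoding))
```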
And yes - that one change will be sufficient now.
Regards, Daniel
Hi, after commit 58abf51aaf637962b0a5342afcd480af5cda7227 I'm unable to run training with error:
So... the fix probably doesn't work? :)
Regards, Daniel