yandexdataschool / nlp_course

YSDA course in Natural Language Processing
https://lena-voita.github.io/nlp_course.html
MIT License

UnicodeDecodeError in ./week01_embeddings/seminar.ipynb #33

Closed Muhamob closed 5 years ago

Muhamob commented 5 years ago

Fix a UnicodeDecodeError when opening files in a Docker container.

The problem occurs when running ./week01_embeddings/seminar.ipynb in a Docker container. According to the documentation, Python 3's built-in open(filename) defaults its encoding parameter to the value returned by locale.getpreferredencoding(False), which in the Docker container is 'ANSI_X3.4-1968'.
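A minimal sketch of this failure mode, not the notebook's actual code. The file name and contents are hypothetical, and `encoding="ascii"` stands in for the container default, since 'ANSI_X3.4-1968' is an alias for ASCII:

```python
import locale
import os
import tempfile

# What open() falls back to when no encoding argument is given.
# The value depends on the environment, so no particular output is assumed.
print("default encoding:", locale.getpreferredencoding(False))

# Hypothetical sample file with non-ASCII text, written as UTF-8.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("привет, мир\n")

# Simulate the container's default by forcing ASCII decoding.
try:
    with open(path, encoding="ascii") as f:
        f.read()
    failed = False
except UnicodeDecodeError as err:
    failed = True
    print("decode failed:", err.reason)
```

On a machine whose default encoding is already UTF-8, the first line prints a UTF-8 name, but the forced ASCII read still fails in the same way.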

I know of two solutions. The first is to pass encoding="utf-8" to open(). The other is to set a proper locale in the Dockerfile.
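The first solution could look roughly like the sketch below. The file name and contents are hypothetical placeholders and are not taken from the seminar notebook:

```python
import os
import tempfile

# Hypothetical sample file with non-ASCII text, written as UTF-8.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
text = "привет, мир\n"
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Passing encoding explicitly makes the read independent of the locale,
# so it behaves the same inside and outside the container.
with open(path, encoding="utf-8") as f:
    loaded = f.read()

print(loaded == text)
```

For the Dockerfile route, one common approach is to set the LANG environment variable to a UTF-8 locale such as C.UTF-8, assuming the base image provides that locale.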

Output of locale in bash inside the container:

root@someid:~# locale
LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

review-notebook-app[bot] commented 5 years ago

Check out this pull request on ReviewNB: https://app.reviewnb.com/yandexdataschool/nlp_course/pull/33

You'll be able to see visual diffs and write comments on notebook cells. Powered by ReviewNB.

justheuristic commented 5 years ago

Thank you! We'll set the correct locale in Docker by the next on-campus iteration.