Open Manfed opened 6 years ago
It works for me on Python 3, could you try that?
I've modified the process_wiki.py file. I changed line 43 to output.write(" ".join(unicode(text)) + "\n")
After that the processing started without errors.
Good to hear! Just to check: are you running Python 2 or 3?
I didn't change anything in my config or in the project files. My default Python version is 2.7; I didn't notice that earlier :) It will probably run without problems on Python 3.
Cool, I've added a note to the code for future reference.
Processing with my change has finished, but the results are strange. The model_pl.word2vec.model.txt file has only 42 vectors, and most of them contain only 1 character. I'll try to run make with python3.
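A possible explanation for the one-character vectors, offered as a sketch rather than a confirmed diagnosis: if text is a list of tokens (which the original " ".join(text) suggests), then wrapping it in unicode() turns the whole list into a single string, and join() then iterates over that string's characters. The snippet below reproduces the effect on Python 3, using str() in place of Python 2's unicode(); the token list is hypothetical.

```python
# Hypothetical tokens standing in for one article (not taken from the thread).
tokens = ["zażółć", "gęślą", "jaźń"]

# Converting the list to a string first means join() sees individual
# characters. str() stands in here for Python 2's unicode().
broken = " ".join(str(tokens))

# Joining the tokens themselves keeps every word intact.
intended = " ".join(tokens)

print(broken.split()[:6])   # single characters, including brackets and quotes
print(intended.split())     # the original words
```

With every "word" reduced to one character, the vocabulary would shrink to roughly the set of distinct characters, which would be consistent with a model of only a few dozen vectors.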
BTW, I'm doing this on a Mac, in case that makes a difference :)
EDIT: I switched the Python version with the command alias python='python3', but now I'm getting the first error message again.
Hm, that's weird: it does seem to work in Polish for me.
What's the result of:
$ python --version
Result is Python 3.6.3
Maybe it's something with my Mac config?
EDIT: The same issue occurs on Ubuntu.
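One thing worth ruling out: a shell alias such as alias python='python3' only applies to the interactive shell, so commands started by make may still pick up the default interpreter. The error text with u'\u0119' also looks like Python 2 output. A minimal sketch of a check that could be placed at the top of a script to show which interpreter is actually running it; the helper name describe_interpreter is hypothetical and not part of the project.

```python
import sys


def describe_interpreter():
    """Return the path and major.minor version of the running interpreter."""
    return "%s (Python %d.%d)" % (
        sys.executable,
        sys.version_info[0],
        sys.version_info[1],
    )


# Printing this from inside the script shows what make really invoked,
# independent of any alias defined in the interactive shell.
print(describe_interpreter())
```

If this prints a 2.x version when run through make, the alias is not being used for the build.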
Hi, I tried to run the code for the Polish language, but after downloading the data from Wikipedia I got an error:
python process_wiki.py ./data/pl/plwiki-latest-pages-articles.xml.bz2 ./data/pl/wiki.pl.text
2018-01-16 09:51:36,820: INFO: Running process_wiki.py ./data/pl/plwiki-latest-pages-articles.xml.bz2 ./data/pl/wiki.pl.text
Traceback (most recent call last):
  File "process_wiki.py", line 43, in <module>
    output.write(" ".join(text) + "\n")
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0119' in position 20: ordinal not in range(128)
make: *** [data/pl/wiki.pl.text] Error 1
Is there any way to train the model on Unicode characters, not just ASCII?
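A common way to avoid this kind of UnicodeEncodeError on both Python 2 and Python 3 is to open the output file with an explicit UTF-8 encoding, so the file object does the encoding instead of the default ASCII codec. The following is a minimal sketch under assumptions: the token list and the temporary output path are hypothetical, and the real process_wiki.py may open its file differently.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical tokenized article; the real script reads text from the wiki dump.
text = [u"zażółć", u"gęślą", u"jaźń"]

# Temporary location used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "wiki.pl.text")

# io.open takes an encoding argument on both Python 2 and 3, so non-ASCII
# characters such as u'\u0119' are written as UTF-8 bytes.
with io.open(path, "w", encoding="utf-8") as output:
    output.write(u" ".join(text) + u"\n")

# Read the line back with the same encoding to confirm the round trip.
with io.open(path, "r", encoding="utf-8") as handle:
    line = handle.read()

print(line)
```

Joining the tokens directly (rather than a string conversion of the list) keeps whole words in the output file.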