leoaday / word2vec

Automatically exported from code.google.com/p/word2vec

Apache License 2.0

Charset encoding #19

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago

Compiled and ran on Ubuntu 14.04.

The input file is encoded in UTF-8, but the output files (the vocabulary file and the text vectors file) are encoded in ISO-8859-1.

As a result, all accented characters come out wrong.
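To narrow this down, it may help to check which encoding the output bytes are really in. The following is a minimal Python 3 sketch, not part of word2vec: the helper names `looks_like_utf8` and `repair_mojibake` and the sample word are hypothetical, chosen only for illustration. It shows that a strict UTF-8 decode rejects ISO-8859-1 bytes for an accented word, and how UTF-8 bytes that were merely read as ISO-8859-1 can be recovered.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def repair_mojibake(text: str) -> str:
    """Undo the damage when UTF-8 bytes were decoded as ISO-8859-1.

    Assumption: the text was garbled only by that one wrong decode step.
    """
    return text.encode("iso-8859-1").decode("utf-8")


# Sample accented word, used here only as an illustration.
word = "caffè"
utf8_bytes = word.encode("utf-8")          # two bytes for the accented letter
latin1_bytes = word.encode("iso-8859-1")   # one byte for the accented letter

# What a reader sees when UTF-8 output is interpreted as ISO-8859-1.
garbled = utf8_bytes.decode("iso-8859-1")

print(looks_like_utf8(utf8_bytes))    # genuine UTF-8 passes the check
print(looks_like_utf8(latin1_bytes))  # ISO-8859-1 accent bytes fail it
print(repair_mojibake(garbled) == word)
```

If the check passes on the real output files, the bytes are still UTF-8 and the problem is more likely in the tool used to view or load them. If it fails, the files would need to be re-encoded, for example by reading them as ISO-8859-1 and writing them back as UTF-8.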

Original issue reported on code.google.com by pierpaol...@gmail.com on 10 Sep 2014 at 5:33

GoogleCodeExporter commented 8 years ago

Confirmed on the same OS with Python 3.

Original comment by jonatha...@gmail.com on 13 Mar 2015 at 1:13