lvapeab / nmt-keras

Neural Machine Translation with Keras
http://nmt-keras.readthedocs.io
MIT License

Error in preprocessing embeddings #108

Closed by VP007-py 4 years ago

VP007-py commented 5 years ago

After running

python preprocess_binary_word_vectors.py  -v /scratch/cc.hi.300.bin  -d word2vec.hi

I get the following traceback:

Loading vectors from /scratch/cc.hi.300.bin
Traceback (most recent call last):
  File "preprocess_binary_word_vectors.py", line 61, in <module>
    word2vec2npy(args.vectors, base_path, dest_file)
  File "preprocess_binary_word_vectors.py", line 22, in word2vec2npy
    vocab_size, layer1_size = map(int, header.split())
ValueError: invalid literal for int() with base 10: '\xba\x16O/'

The text format of the embeddings works fine, as fixed in #60.

lvapeab commented 4 years ago

You'll need to load them with the fastText tools (see, e.g., https://github.com/facebookresearch/fastText/issues/52) and then process them as in preprocess_binary_word_vectors.py. Cheers.
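
For anyone hitting the same error, here is a rough sketch of that suggestion. The bytes in the ValueError, '\xba\x16O/', are the little-endian encoding of 793712314, the magic number fastText writes at the start of its native binary models, so the file is most likely a fastText model rather than a word2vec binary with a text header. The function names below are made up for illustration, and the output layout (a {word: vector} dict written with np.save) is an assumption about what the rest of the pipeline expects; check it against preprocess_binary_word_vectors.py before relying on it.

```python
import struct

# Magic number at the start of fastText binary models (little-endian int32).
# Its four bytes are b'\xba\x16O/', the same bytes reported in the ValueError.
FASTTEXT_MAGIC = 793712314


def is_fasttext_binary(path):
    """Return True if the file starts with the fastText model magic number."""
    with open(path, "rb") as f:
        head = f.read(4)
    return len(head) == 4 and struct.unpack("<i", head)[0] == FASTTEXT_MAGIC


def fasttext_bin_to_npy(model_path, dest_path):
    """Write every in-vocabulary word vector to a pickled {word: vector} file.

    Hypothetical helper: the dict-in-.npy layout is an assumption, not a
    confirmed nmt-keras requirement.
    """
    # Imported lazily so the format check above runs without these packages.
    import numpy as np
    import fasttext  # official Python bindings, e.g. `pip install fasttext`

    model = fasttext.load_model(model_path)
    word_vecs = {word: model.get_word_vector(word) for word in model.get_words()}
    np.save(dest_path, word_vecs)  # np.save appends '.npy' if it is missing
    return len(word_vecs)
```

With the paths from the report, this would be called as fasttext_bin_to_npy('/scratch/cc.hi.300.bin', 'word2vec.hi') once is_fasttext_binary confirms the format. Reading the result back requires np.load(..., allow_pickle=True).item(), because the dict is stored as a pickled object.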