Open lrt366 opened 5 years ago
I'm sorry, but the pre-trained word embeddings were very large, so I have deleted them. You can pre-train word embeddings on your own machine with the word2vec tool (using the default settings) on this corpus: https://www.sogou.com/labs/resource/ftp.php?dir=/Data/SogouCA/SogouCA.tar.gz. For Chinese word segmentation, you can use pyltp, jieba, or ansj.
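For anyone following the same steps, here is a minimal sketch of the corpus preparation, assuming the word2vec tool is given a plain-text file with one sentence per line and tokens separated by spaces. The function names and the `segment` parameter are hypothetical, and `char_segment` is only a stand-in that splits text into single characters; it is not a real word segmenter.

```python
def write_training_corpus(sentences, path, segment):
    """Write one sentence per line, with tokens separated by single spaces.

    `segment` is any callable that maps a string to a list of tokens
    (hypothetical parameter, not part of the original code).
    """
    with open(path, "w", encoding="utf-8") as out:
        for sentence in sentences:
            # Drop empty or whitespace-only tokens produced by the segmenter.
            tokens = [tok for tok in segment(sentence) if tok.strip()]
            if tokens:
                out.write(" ".join(tokens) + "\n")


def char_segment(text):
    # Stand-in segmenter for illustration only: one token per character.
    return list(text)
```

If jieba is installed, `jieba.lcut` can be passed as `segment` in place of `char_segment`. The resulting file can then be given to the C tool, for example `./word2vec -train corpus.txt -output vectors.bin -binary 1`; `-binary 1` is what makes it write the binary format, since the tool writes text by default. The file names here are placeholders.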
Thanks a lot! I'm going to try it.
Hello, I'm very sorry to bother you again. Is it wrong to train with the Word2vec implementation in the Gensim library? I got this error when loading the trained model.
ValueError Traceback (most recent call last)
<ipython-input-19-c414348a3193> in <module>()
5 with open('model','rb')as f:
6 header = f.readline()
----> 7 vocab_size, layer1_size = map(int, header.split())
8 binary_len = np.dtype('float32').itemsize * layer1_size
9 while True:
ValueError: invalid literal for int() with base 10: b'\x80\x02cgensim.models.word2vec'
I guess it may be because my code can only read binary files. Did you save the model as a binary file? I trained with the original word2vec tool (implemented in C) and saved the model as a binary file.
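To make the format difference concrete: the bytes in the error, `b'\x80\x02cgensim.models.word2vec'`, are the start of a Python pickle (protocol 2), which is what Gensim's `model.save()` writes. The word2vec binary format instead starts with a text header `<vocab_size> <vector_size>` followed by one `word` plus raw float32 values per entry. In Gensim, `model.wv.save_word2vec_format(path, binary=True)` writes that layout. The sketch below is not the repository's loader; it is a standard-library illustration of the layout, with hypothetical function names and an assumed little-endian byte order.

```python
import struct


def write_word2vec_binary(path, vectors):
    """Write {word: [float, ...]} in the word2vec binary layout."""
    dim = len(next(iter(vectors.values())))
    with open(path, "wb") as f:
        # Header line: "<vocab_size> <vector_size>\n" as plain text.
        f.write(f"{len(vectors)} {dim}\n".encode("utf-8"))
        for word, vec in vectors.items():
            f.write(word.encode("utf-8") + b" ")
            # Assumption: little-endian float32, as on x86 machines.
            f.write(struct.pack(f"<{dim}f", *vec))
            f.write(b"\n")


def read_word2vec_binary(path):
    """Read a word2vec binary file into {word: [float, ...]}."""
    with open(path, "rb") as f:
        header = f.readline()
        if header.startswith(b"\x80"):
            # A leading 0x80 byte marks a pickle, e.g. a model.save() file.
            raise ValueError("file looks like a pickle, not word2vec binary")
        vocab_size, dim = map(int, header.split())
        vectors = {}
        for _ in range(vocab_size):
            word = bytearray()
            while True:
                ch = f.read(1)
                if ch == b"":
                    raise EOFError("unexpected end of file")
                if ch == b" ":
                    break
                if ch != b"\n":  # skip the newline left after the last vector
                    word.extend(ch)
            values = struct.unpack(f"<{dim}f", f.read(4 * dim))
            vectors[word.decode("utf-8")] = list(values)
    return vectors
```

Calling `read_word2vec_binary` on a file produced by `model.save()` raises the explicit `ValueError` above instead of the `invalid literal for int()` message.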
Thanks a lot! I will try again.
Hello, I'm sorry to bother you again. After trying Python's Word2vec, I trained with the original word2vec tool (implemented in C) and saved the model as a binary file, but it still doesn't run successfully. The error is as follows:
I'm having a problem running the code: No such file or directory: '/home/wuch/chinese_word_new.bin'. May I ask if this data is available? Thanks!