about chinese_word_new.bin

wuch15 / SRV-DSA

source for Semi-supervised Dimensional Sentiment Analysis with Variational Autoencoder


about chinese_word_new.bin #1

Open lrt366 opened 5 years ago

lrt366 commented 5 years ago

I'm having a problem running the code: No such file or directory: '/home/wuch/chinese_word_new.bin'. May I ask if this data is available? Thanks!

wuch15 commented 5 years ago

I'm sorry, but the pre-trained word embeddings were very large, so I have deleted them. You can use the word2vec tool (with the default settings) to pre-train word embeddings on your own machine using this corpus: https://www.sogou.com/labs/resource/ftp.php?dir=/Data/SogouCA/SogouCA.tar.gz. For Chinese word segmentation, you can use pyltp, jieba, or ansj.
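As a rough sketch of the preparation step described above: word2vec trains on plain text with tokens separated by spaces, so the corpus has to be segmented first. The helper below is hypothetical (it is not part of this repository) and assumes the corpus has already been extracted to UTF-8 text with one sentence per line. The `segment` argument is any callable that turns a string into a list of tokens, for example `jieba.lcut`.

```python
def write_segmented_corpus(lines, segment, out_path):
    """Write one space-separated, segmented sentence per line.

    lines    -- iterable of raw sentences (assumed UTF-8 text, one per line)
    segment  -- callable mapping a string to a list of tokens, e.g. jieba.lcut
    out_path -- output file to pass to the word2vec tool for training
    Returns the number of non-empty sentences written.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for line in lines:
            # Drop whitespace-only tokens so the output stays cleanly space-separated.
            tokens = [t for t in segment(line.strip()) if t.strip()]
            if tokens:
                out.write(" ".join(tokens) + "\n")
                count += 1
    return count
```

With jieba installed, this could be called as `write_segmented_corpus(open("corpus.txt", encoding="utf-8"), jieba.lcut, "corpus_seg.txt")`, where both file names are placeholders. The segmented file can then be given to the C word2vec tool, for example `./word2vec -train corpus_seg.txt -output chinese_word_new.bin -binary 1`.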

lrt366 commented 5 years ago

Thanks a lot! I'm going to try it.

lrt366 commented 5 years ago

Hello, I'm very sorry to bother you again. Is it wrong to train with Word2vec from the Gensim library? I got this error when loading the trained model:

ValueError                                Traceback (most recent call last)
<ipython-input-19-c414348a3193> in <module>()
      5 with open('model','rb')as f:
      6     header = f.readline()
----> 7     vocab_size, layer1_size = map(int, header.split())
      8     binary_len = np.dtype('float32').itemsize * layer1_size
      9     while True:

ValueError: invalid literal for int() with base 10: b'\x80\x02cgensim.models.word2vec'

wuch15 commented 5 years ago

I guess it may be because my code can only read files in the word2vec binary format. Did you save the model as a binary file? I trained with the original word2vec tool (implemented in C) and saved the model as a binary file.
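For reference, the header in the traceback (`b'\x80\x02cgensim.models.word2vec'`) is the start of a Python pickle, which is what gensim's `model.save(...)` writes. With gensim, the word2vec binary layout is normally produced with `model.wv.save_word2vec_format(path, binary=True)` instead. The sketch below is not the repository's loader. It is a minimal, standard-library illustration of that layout (a text header `vocab_size dim`, then each word followed by a space and `dim` float32 values), assuming little-endian floats and UTF-8 words. Both function names are made up for this example.

```python
import struct


def write_word2vec_binary(path, vectors):
    """Write {word: [floats]} in the layout produced by word2vec with -binary 1."""
    dim = len(next(iter(vectors.values())))
    with open(path, "wb") as f:
        # Text header: vocabulary size and vector dimension.
        f.write(f"{len(vectors)} {dim}\n".encode("utf-8"))
        for word, vec in vectors.items():
            f.write(word.encode("utf-8") + b" ")
            f.write(struct.pack(f"<{dim}f", *vec))  # assumption: little-endian float32
            f.write(b"\n")


def read_word2vec_binary(path):
    """Read a word2vec binary file into {word: [floats]}."""
    vectors = {}
    with open(path, "rb") as f:
        header = f.readline()
        if header.startswith(b"\x80"):
            # A leading 0x80 byte marks a pickle (protocol 2+), not a text header.
            raise ValueError("file looks like a Python pickle, not word2vec binary format")
        vocab_size, dim = map(int, header.split())
        for _ in range(vocab_size):
            word = bytearray()
            while True:
                ch = f.read(1)
                if ch == b" " or ch == b"":
                    break
                if ch != b"\n":  # skip the newline left after the previous vector
                    word.extend(ch)
            vec = struct.unpack(f"<{dim}f", f.read(4 * dim))
            vectors[word.decode("utf-8")] = list(vec)
    return vectors
```

Calling `read_word2vec_binary` on a file written by `model.save(...)` raises the explicit `ValueError` above rather than the less obvious `invalid literal for int()` message.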

lrt366 commented 5 years ago

Thanks a lot! I will try again.

lrt366 commented 5 years ago

Hello, I'm sorry to bother you again. After trying Python's Word2vec, I used the original word2vec tool (implemented in C) to train and saved the model as a binary file, but it still doesn't run successfully. The error is as follows: