minimalparts / nonce2vec

Incremental learning of word embeddings with context informativeness.

Some issues when training the model from scratch and testing with the provided code #3

Closed willanxywc closed 5 years ago

willanxywc commented 6 years ago

Hi, I ran into some issues while training the model from scratch:

  1. I trained with the latest gensim but got a model that is incompatible with the gensim version provided here. When I run the test code, the following error occurs:

ValueError: You must specify either total_examples or total_words, for proper alpha and progress calculations. The usual value is total_examples=model.corpus_count.

So which version of gensim do you use?

  2. Then I used the provided gensim to train the model from scratch, and got another error:

File "/home/disk2/jysun/gensim_vec/gensim/models/word2vec.py", line 572, in build_vocab
    report_values, pre_exist_words = self.scale_vocab(keep_raw_vocab=keep_raw_vocab, trim_rule=trim_rule, update=update)  # trim by min_count & precalculate downsampling
File "/home/disk2/jysun/gensim_vec/gensim/models/word2vec.py", line 731, in scale_vocab
    return report_values, pre_exist_words
UnboundLocalError: local variable 'pre_exist_words' referenced before assignment

What should I do with these errors?

minimalparts commented 6 years ago

Hm. So we submitted to EMNLP in April 2017 and used the early 2017 gensim code, which was only at version 0.13 at the time. I'm afraid the gensim people then released several new versions very quickly. It was bad luck.

We're working on having a new version work with gensim 3.x, but until then I'm afraid there is not much I can suggest, short of using the older gensim or the pre-trained model. Sorry about that. I'll add a note to that effect on the README.

willanxywc commented 6 years ago

Thanks! Then I will try to train with gensim 0.13. May I ask which exact version of gensim you used? There are several 0.13 releases, from 0.13.0 to 0.13.4.

minimalparts commented 6 years ago

I hear from others that any 0.13.x will work. I believe we were using 0.13.3.
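For readers who want to guard against the version mismatch discussed above, here is a minimal sketch of a check for the 0.13.x series. The helper name `is_supported_gensim` is hypothetical and not part of nonce2vec or gensim; it only parses a version string, so it runs without gensim installed.

```python
import re


def is_supported_gensim(version: str) -> bool:
    """Return True when the version string belongs to the 0.13.x series.

    Hypothetical helper: it encodes the thread's advice that any 0.13.x
    release should work with the original code.
    """
    # Capture the major and minor components, e.g. "0" and "13" from "0.13.3".
    match = re.match(r"^(\d+)\.(\d+)(?:\.|$)", version.strip())
    if match is None:
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) == (0, 13)
```

With gensim installed, one would presumably pass `gensim.__version__` to this function before training and stop early if it returns False.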

un-lock-me commented 6 years ago

I got this error: AttributeError: 'Model' object has no attribute 'id2word'. I had assumed it would not depend on how the model is created. Do you have any idea what causes this?

Thanks,

minimalparts commented 6 years ago

Sorry for the delayed reply... When does the error occur? This sounds like a gensim problem... Are you using the 0.13.3 version?

ghost commented 5 years ago

@willanxywc

I think you need to specify "total_examples" and "epochs" with the current version of gensim.

model.train([sentence], total_examples=model.corpus_count, epochs=model.iter)
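The call above can be wrapped in a small helper so that it also tolerates models that expose the epoch count under a different attribute name. This is only a sketch under stated assumptions: `train_on_sentence` is a hypothetical name, and the fallback assumes the model exposes either `epochs` (newer gensim releases) or `iter` (as used in the line above). It is duck-typed, so it does not import gensim itself.

```python
def train_on_sentence(model, sentence):
    """Train on a single tokenized sentence with explicit training arguments.

    Hypothetical wrapper: `model` is assumed to be a Word2Vec-like object
    with `train`, `corpus_count`, and either `epochs` or `iter`.
    """
    # Prefer `epochs` when present; fall back to `iter` otherwise.
    epochs = getattr(model, "epochs", None)
    if epochs is None:
        epochs = model.iter
    # Pass both arguments explicitly, which is what the ValueError asks for.
    return model.train(
        [sentence],
        total_examples=model.corpus_count,
        epochs=epochs,
    )
```

Whether `total_examples=model.corpus_count` is the right value for a single new sentence depends on the use case; the sketch simply mirrors the suggestion in this comment.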

Similar issue: https://github.com/linanqiu/word2vec-sentiments/issues/16

akb89 commented 5 years ago

You can also use the v2.0 release branch. We significantly refactored the code and it now works with gensim v3.4.x.