datistiquo opened this issue 4 years ago
You probably don't want to have both hs=1 and negative=10 – both modes enabled – at once. (Typically one mode or the other works better for a given corpus/goal, with negative tending to perform better on larger corpora.) Does using hs=1, negative=0 trigger the error?
Also, growing the vocabulary (thus changing the HS-mode word-encodings) while retaining some HS-layer trained values may be nonsensical – so I'd suspect use of build_vocab(..., update=True) is likely to work better with negative-sampling than with HS mode.
Still, I wouldn't expect this error. What gensim version are you using, and can you make a completely self-contained test, with a tiny amount of dummy data, that reproduces the same error?
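For reference, a minimal sketch of configuring one mode at a time, on tiny dummy data (gensim 4.x parameter names assumed; the same hs/negative options exist on Doc2Vec and FastText):

```python
# Minimal sketch, gensim 4.x parameter names assumed; tiny dummy corpus.
from gensim.models import Word2Vec

sentences = [["human", "interface", "computer"],
             ["survey", "user", "computer", "system"],
             ["graph", "trees", "minors"]]

# Hierarchical-softmax mode only:
model_hs = Word2Vec(sentences, vector_size=50, min_count=1, hs=1, negative=0)

# Negative-sampling mode only (often the better choice on larger corpora):
model_ns = Word2Vec(sentences, vector_size=50, min_count=1, hs=0, negative=10)
```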
> You probably don't want to have both hs=1 and negative=10 – both modes enabled – at once.

Why not? Isn't it hs with negative sampling?
> Also, growing the vocabulary (thus changing the HS-mode word-encodings) while retaining some HS-layer trained values may be nonsensical – so I'd suspect use of the build_vocab(..., update=True) is likely to work better.

This I do not understand. What do you mean? Isn't that what I am trying to do? It is the structure for continuing training from your docs.
> You probably don't want to have both hs=1 and negative=10 – both modes enabled – at once.
>
> Why not? Isn't it hs with negative sampling?
No; the hs ('hierarchical softmax') and negative ('negative sampling') options are distinct methods, and the hs mode has no use for a negative parameter. If both are non-zero, two separate output networks are assembled, sharing the same input vectors, each being trained on (& being a source of backpropagated adjustments from) each text.
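To make that concrete, a small sketch (purely illustrative; attribute names as in recent gensim versions) showing that enabling both modes creates two separate output-weight arrays on the same model:

```python
# Illustrative sketch only; attribute names as in recent gensim versions.
from gensim.models import Word2Vec

sentences = [["a", "b", "c"], ["b", "c", "d"]]
both = Word2Vec(sentences, vector_size=10, min_count=1, hs=1, negative=5)

print(hasattr(both, "syn1"))     # hierarchical-softmax output weights
print(hasattr(both, "syn1neg"))  # negative-sampling output weights
```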
> Also, growing the vocabulary (thus changing the HS-mode word-encodings) while retaining some HS-layer trained values may be nonsensical – so I'd suspect use of the build_vocab(..., update=True) is likely to work better.
>
> This I do not understand. What do you mean? Isn't that what I am trying to do? It is the structure for continuing training from your docs.
Sorry, that should have read "likely to work better with negative-sampling." Yes, the library allows vocab-expansion with HS mode, but it might not be a good idea compared to alternatives (like many other allowed operations/parameters).
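For illustration, a minimal sketch of the vocabulary-expansion flow with negative sampling (gensim 4.x API assumed; old_sentences/new_sentences are dummy stand-ins):

```python
# Sketch of vocabulary expansion + continued training with negative sampling
# (gensim 4.x API assumed; old_sentences/new_sentences are dummy stand-ins).
from gensim.models import Word2Vec

old_sentences = [["human", "interface", "computer"], ["survey", "user", "system"]]
new_sentences = [["graph", "trees", "minors"], ["graph", "survey"]]

model = Word2Vec(old_sentences, vector_size=50, min_count=1, hs=0, negative=10)

model.build_vocab(new_sentences, update=True)       # grow the vocabulary
model.train(new_sentences,
            total_examples=model.corpus_count,      # count of the new corpus
            epochs=model.epochs)
```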
@gojomo I get the error reported by @datistiquo in this minimal Colab notebook, where I'm trying to retrain a pretrained model...
Looking at that notebook, I see no such error - training appears to occur without error.
(Separately: it'd be good to correct the deprecation-warning that notebook is getting, as it suggests the exact new method that should be called instead. And, I personally would not assume such incremental-training of another model with a relatively-small amount of new data is a good idea. Any words/fragments well-represented in the new data might move arbitrarily-far to new coordinates, checked only by the offsetting influence of other words/fragments in the new data. A potentially much-larger number of words/fragments in the older data will stay put, and thus gradually lose meaningful comparability with your moved vectors. Unless doing some before-and-after quality checks on the whole model, over a larger range of probes than just your new data, there's no telling how large such an effect could be, or whether starting from someone else's model is a benefit or hindrance to your ultimate goals.)
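As a rough illustration of the kind of before-and-after check described above (the probe words, corpora, and model here are all made up):

```python
# Rough sketch of a before-and-after probe; everything here is made up.
from gensim.models import Word2Vec

old_sentences = [["human", "interface", "computer"], ["survey", "user", "system", "computer"]]
new_sentences = [["graph", "trees", "minors"], ["graph", "survey", "computer"]]
probes = ["computer", "survey"]

model = Word2Vec(old_sentences, vector_size=50, min_count=1, hs=0, negative=5)

before = {w: model.wv.most_similar(w, topn=3) for w in probes if w in model.wv}

model.build_vocab(new_sentences, update=True)
model.train(new_sentences, total_examples=model.corpus_count, epochs=model.epochs)

after = {w: model.wv.most_similar(w, topn=3) for w in probes if w in model.wv}
# Compare `before` and `after` to gauge how far the probe words' neighbours moved.
```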
Ah sorry, I ended up correcting the syntax in the notebook!
The data here is just a toy example to show the syntax to a student. That said, I appreciate your words of wisdom on the deprecation warning @gojomo!
Thanks for your explanation, I was indeed confused when I first noticed that both keywords can be set to non-zero values. Maybe a note on this could be added to the docs, to inform users that the two methods actually train separate output layers rather than a single combined model.
Hi, I have now come across this error multiple times... so I want to post it.
It seems it is not possible to continue training if hs=1.
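For reference, a sketch of the kind of self-contained reproduction asked for above – HS mode, then a vocabulary update and continued training; whether this actually errors will depend on the gensim version in use:

```python
# Only a sketch of a minimal reproduction; whether it errors depends on the
# gensim version in use.
from gensim.models import Word2Vec

old_sentences = [["human", "interface", "computer"], ["survey", "user", "system"]]
new_sentences = [["graph", "trees", "minors"], ["graph", "survey"]]

m = Word2Vec(old_sentences, vector_size=20, min_count=1, hs=1, negative=0)
m.build_vocab(new_sentences, update=True)
m.train(new_sentences, total_examples=m.corpus_count, epochs=m.epochs)
```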