datistiquo opened this issue 4 years ago
You probably don't want to have both hs=1 and negative=10 – both modes enabled – at once. (Typically one mode or the other works better for a given corpus/goal, with negative tending to perform better on larger corpora.) Does using hs=1, negative=0 trigger the error?
Also, growing the vocabulary (thus changing the HS-mode word-encodings) while retaining some HS-layer trained values may be nonsensical – so I'd suspect use of build_vocab(..., update=True) is likely to work better with negative-sampling than with HS mode.
Still, I wouldn't expect this error. What gensim version are you using, and can you make a completely self-contained test, with a tiny amount of dummy data, that reproduces the same error?
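For reference, a minimal sketch of configuring one mode at a time, on tiny dummy data (gensim 4.x parameter names assumed; the same hs/negative options exist on Doc2Vec and FastText):

```python
# Minimal sketch, gensim 4.x parameter names assumed; tiny dummy corpus.
from gensim.models import Word2Vec

sentences = [["human", "interface", "computer"],
             ["survey", "user", "computer", "system"],
             ["graph", "trees", "minors"]]

# Hierarchical-softmax mode only:
model_hs = Word2Vec(sentences, vector_size=50, min_count=1, hs=1, negative=0)

# Negative-sampling mode only (often the better choice on larger corpora):
model_ns = Word2Vec(sentences, vector_size=50, min_count=1, hs=0, negative=10)
```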
> You probably don't want to have both hs=1 and negative=10 – both modes enabled – at once.

Why not? Isn't it hs with negative sampling?
> Also, growing the vocabulary (thus changing the HS-mode word-encodings) while retaining some HS-layer trained values may be nonsensical – so I'd suspect use of the build_vocab(..., update=True) is likely to work better.

This I do not understand. What do you mean? Isn't that what I am trying to do? It is the structure for continuing training from your docs.
> You probably don't want to have both hs=1 and negative=10 – both modes enabled – at once.
>
> Why not? Isn't it hs with negative sampling?
No; the hs ('hierarchical softmax') and negative ('negative sampling') options are distinct methods, and the hs mode has no use for a negative parameter. If both are non-zero, two separate output networks are assembled, sharing the same input vectors, each being trained on (& being a source of backpropagated adjustments from) each text.
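To make that concrete, a small sketch (purely illustrative; attribute names as in recent gensim versions) showing that enabling both modes creates two separate output-weight arrays on the same model:

```python
# Illustrative sketch only; attribute names as in recent gensim versions.
from gensim.models import Word2Vec

sentences = [["a", "b", "c"], ["b", "c", "d"]]
both = Word2Vec(sentences, vector_size=10, min_count=1, hs=1, negative=5)

print(hasattr(both, "syn1"))     # hierarchical-softmax output weights
print(hasattr(both, "syn1neg"))  # negative-sampling output weights
```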
> Also, growing the vocabulary (thus changing the HS-mode word-encodings) while retaining some HS-layer trained values may be nonsensical – so I'd suspect use of the build_vocab(..., update=True) is likely to work better.
>
> This I do not understand. What do you mean? Isn't that what I am trying to do? It is the structure for continuing training from your docs.
Sorry, that should have read "likely to work better with negative-sampling." Yes, the library allows vocab-expansion with HS mode, but it might not be a good idea compared to alternatives (like many other allowed operations/parameters).
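For illustration, a minimal sketch of the vocabulary-expansion flow with negative sampling (gensim 4.x API assumed; old_sentences/new_sentences are dummy stand-ins):

```python
# Sketch of vocabulary expansion + continued training with negative sampling
# (gensim 4.x API assumed; old_sentences/new_sentences are dummy stand-ins).
from gensim.models import Word2Vec

old_sentences = [["human", "interface", "computer"], ["survey", "user", "system"]]
new_sentences = [["graph", "trees", "minors"], ["graph", "survey"]]

model = Word2Vec(old_sentences, vector_size=50, min_count=1, hs=0, negative=10)

model.build_vocab(new_sentences, update=True)       # grow the vocabulary
model.train(new_sentences,
            total_examples=model.corpus_count,      # count of the new corpus
            epochs=model.epochs)
```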
@gojomo I get the error reported by @datistiquo in this minimal Colab notebook, where I'm trying to retrain a pretrained model...
Looking at that notebook, I see no such error - training appears to occur without error.
(Separately: it'd be good to correct the deprecation-warning that notebook is getting, as it suggests the exact new method that should be called instead. And, I personally would not assume such incremental-training of another model with a relatively-small amount of new data is a good idea. Any words/fragments well-represented in the new data might move arbitrarily-far to new coordinates, checked only by the offsetting influence of other words/fragments in the new data. A potentially much-larger number of words/fragments in the older data will stay put, and thus gradually lose meaningful comparability with your moved vectors. Unless doing some before-and-after quality checks on the whole model, over a larger range of probes than just your new data, there's no telling how large such an effect could be, or whether starting from someone else's model is a benefit or hindrance to your ultimate goals.)
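As a rough illustration of the kind of before-and-after check described above (the probe words, corpora, and model here are all made up):

```python
# Rough sketch of a before-and-after probe; everything here is made up.
from gensim.models import Word2Vec

old_sentences = [["human", "interface", "computer"], ["survey", "user", "system", "computer"]]
new_sentences = [["graph", "trees", "minors"], ["graph", "survey", "computer"]]
probes = ["computer", "survey"]

model = Word2Vec(old_sentences, vector_size=50, min_count=1, hs=0, negative=5)

before = {w: model.wv.most_similar(w, topn=3) for w in probes if w in model.wv}

model.build_vocab(new_sentences, update=True)
model.train(new_sentences, total_examples=model.corpus_count, epochs=model.epochs)

after = {w: model.wv.most_similar(w, topn=3) for w in probes if w in model.wv}
# Compare `before` and `after` to gauge how far the probe words' neighbours moved.
```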
Ah sorry, I ended up correcting the syntax in the notebook!
The data here is just a toy example to show the syntax to a student. That said, I appreciate your words of wisdom on the deprecation warning @gojomo!
Thanks for your explanation, I was indeed confused when I first noticed that both keywords can be set to non-zero values. Maybe a note on this could be added to the docs, to inform users that the two methods actually train separate output layers rather than a single combined model.
Hi, I have now come across this error multiple times... so I want to post it.
It seems it is not possible to continue training if hs=1.
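For reference, a sketch of the kind of self-contained reproduction asked for above – HS mode, then a vocabulary update and continued training; whether this actually errors will depend on the gensim version in use:

```python
# Only a sketch of a minimal reproduction; whether it errors depends on the
# gensim version in use.
from gensim.models import Word2Vec

old_sentences = [["human", "interface", "computer"], ["survey", "user", "system"]]
new_sentences = [["graph", "trees", "minors"], ["graph", "survey"]]

m = Word2Vec(old_sentences, vector_size=20, min_count=1, hs=1, negative=0)
m.build_vocab(new_sentences, update=True)
m.train(new_sentences, total_examples=m.corpus_count, epochs=m.epochs)
```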