chinmayapancholi13 opened 7 years ago
Related: #900 (also #700, #775, #435).
In my opinion, making this non-experimental would require some significant research into the kinds of datasets & specific settings where it offers an advantage, and where it just spends time with little or negative benefit. Whether incremental training improves this kind of model is inherently very context-dependent.
(Personally I'd expect a system where all existing words/weights are frozen, and new word-vectors inferred in a process a bit like Doc2Vec inference, to be a more stable/defensible/error-resistant approach.)
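The frozen-weights idea can be sketched with a per-row "lock factor" mask, where a zeroed factor suppresses all gradient updates to existing rows while newly appended word-vectors remain trainable. This is a conceptual numpy illustration only, not gensim's actual implementation; the function names and the 4-dimensional vectors are made up for the example.

```python
import numpy as np

# Existing (pretrained) vocabulary and vectors.
vocab = ["king", "queen"]
vectors = np.random.rand(2, 4)
lockf = np.ones(2)                     # 1.0 = trainable, 0.0 = frozen

def add_words(new_words, freeze_old=True):
    """Append rows for new words; optionally freeze all prior rows."""
    global vectors, lockf
    if freeze_old:
        lockf[:] = 0.0                 # freeze every existing row
    vectors = np.vstack([vectors, np.random.rand(len(new_words), 4)])
    lockf = np.concatenate([lockf, np.ones(len(new_words))])
    vocab.extend(new_words)

def apply_gradient(idx, grad, lr=0.025):
    # The update is scaled by the row's lock factor,
    # so frozen rows never move.
    vectors[idx] += lr * grad * lockf[idx]

add_words(["jester"])
before = vectors[0].copy()
apply_gradient(0, np.ones(4))          # frozen old row: no change
apply_gradient(2, np.ones(4))          # new row: gets updated
```

Only the new word's row moves under training, which is roughly what Doc2Vec-style inference does for document vectors while word weights stay fixed.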
Yes, two directions here -- 1) making it possible 2) determining whether it makes sense.
If we have 1), we could outsource 2) to all the people who are asking (perhaps mistakenly) for this feature. It's one of the most requested properties of 2vec, which probably reflects some common underlying need across many applications of 2vec.
We have (1), that's why my focus is on (2). And (2) is only possible after we either get a bunch of research/experimentation done, or manage to collect such results from other people. Until then, I believe the existing (1) "it's possible" feature needs lots of caveats/disclaimers that effectively discourage beginners from relying upon it.
Updating the vocabulary of a `Word2Vec` model is experimental right now. Addressing this issue based on the discussion here would also be useful elsewhere, e.g. for adding a `partial_fit()` function to the sklearn-API class for Word2Vec.