Closed yfan000 closed 7 years ago
Only include the word2vec features for words that are in the word2vec vocabulary.
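For anyone hitting the same KeyError, here is a minimal sketch of that check. The helper name `w2v_features` and the `w2v_` key prefix are made up for illustration and are not part of the assignment starter code. A plain dict stands in for `model.wv` so the snippet runs without gensim. As far as I know, gensim's `model.wv` supports the same `in` test and `[]` lookup, so it could be passed in the same way.

```python
def w2v_features(word, vectors, prefix="w2v_"):
    """Return one feature per embedding dimension, or {} if the word is unknown.

    `vectors` is anything that supports `word in vectors` and `vectors[word]`
    (assumption: a gensim `model.wv` object or, as here, a plain dict).
    """
    if word not in vectors:
        # Out-of-vocabulary word: contribute no word2vec features at all.
        return {}
    return {f"{prefix}{i}": float(v) for i, v in enumerate(vectors[word])}


# Toy stand-in for model.wv; 'EU' is deliberately missing.
toy_vectors = {"the": [0.1, -0.2, 0.3]}

print(w2v_features("the", toy_vectors))
print(w2v_features("EU", toy_vectors))
```

Returning an empty dict means an out-of-vocabulary token simply keeps its other features (token, caps, pos, and so on) and gains nothing from word2vec.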
On Mon, Apr 10, 2017 at 9:23 PM, yfan000 notifications@github.com wrote:
In part 1 of a4, we need to train a word2vec model on the sentences from the Brown corpus. For example: model = Word2Vec(sentences, size=100, window=5, min_count=5, workers=4)
Then, we can get 100 new features from the word2vec model by: model.wv['EU']
But I got an error, because the word 'EU' is not in the vocabulary. ('EU' is the first word in the training data.) KeyError: "word 'EU' not in vocabulary"
Am I implementing this the right way? If so, we can't get the new features when a word does not exist in the vocabulary. How can we deal with this problem?
I have two more questions about applying the word2vec model.
When we define the make_feature_dicts function, is w2v true by default? def make_feature_dicts(data, token=True, caps=True, pos=True, chunk=True, context=True, w2v=True)
Should we lowercase the word when we look up its new features in the word2vec model? model.wv['EU'] or model.wv['eu']
When we define the make_feature_dicts function, is w2v true by default?
Yes.
Should we use the lower case when we look for new features from the word2vec?
No. Keep the original case: model.wv['EU']
-Aron
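A small sketch of why the casing matters, assuming the model was trained on tokens with their original case: 'EU' and 'eu' are then separate vocabulary entries, and lowercasing before the lookup can turn a hit into a miss. The dict below is a toy stand-in for `model.wv`, and `lookup_as_is` is a hypothetical helper name.

```python
def lookup_as_is(word, vectors):
    """Look the token up exactly as it appears in the data (no lowercasing)."""
    return vectors[word] if word in vectors else None


# Toy stand-in for model.wv: only the upper-case form was seen in training.
toy_vectors = {"EU": [0.5, 0.1], "the": [0.2, 0.4]}

print(lookup_as_is("EU", toy_vectors))  # found: stored under its original case
print(lookup_as_is("eu", toy_vectors))  # not found: a different key
```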
Thanks!