iit-cs585 / assignments

Assignments for IIT CS585

a4: word not in vocabulary #31

Closed yfan000 closed 7 years ago

yfan000 commented 7 years ago

In part 1 of a4, we need to train a word2vec model on the sentences from the Brown corpus. For example: model = Word2Vec(sentences, size=100, window=5, min_count=5, workers=4)

Then, we can get 100 new features from the word2vec model by: model.wv['EU']

But I got an error because the word 'EU' is not in the vocabulary ('EU' is the first word in the training data): KeyError: "word 'EU' not in vocabulary"

Am I implementing this the right way? If so, we can't get the new features for a word that is not in the vocabulary. How should we deal with this problem?

aronwc commented 7 years ago

Only include the word2vec features for words that are in the word2vec vocabulary.
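A minimal sketch of that advice, assuming the trained vectors support `in` and `[]` the way gensim's `model.wv` does. A word has no vector if it never appears in the training sentences or appears fewer than `min_count` times. The helper name `w2v_features`, the `w2v_` feature prefix, and the toy dictionary standing in for `model.wv` are illustrative assumptions, not part of the assignment.

```python
def w2v_features(word, vectors, prefix="w2v_"):
    """Return {feature_name: value} for `word`, or {} if it has no vector.

    `vectors` is any mapping from word to a sequence of floats; with gensim
    you would pass `model.wv` (assumption: it supports `in` and `[]`).
    """
    if word not in vectors:
        # Out-of-vocabulary word: contribute no word2vec features at all.
        return {}
    return {f"{prefix}{i}": float(v) for i, v in enumerate(vectors[word])}


# Toy stand-in for model.wv so the sketch runs without gensim installed.
toy_vectors = {"the": [0.1, -0.2, 0.3], "Europe": [0.5, 0.0, -0.4]}

print(w2v_features("the", toy_vectors))  # three w2v_* features
print(w2v_features("EU", toy_vectors))   # empty dict, no KeyError
```

Because the function returns an empty dict for unknown words, its result can be merged into a token's other features with `dict.update` without any special casing.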


yfan000 commented 7 years ago

I have two more questions about applying the word2vec model.

  1. When we define the make_feature_dicts function, is w2v true by default? def make_feature_dicts(data, token=True, caps=True, pos=True, chunk=True, context=True, w2v=True)

  2. Should we use the lower case when we look for new features from the word2vec? model.wv['EU'] or model.wv['eu']

aronwc commented 7 years ago

When we define the make_feature_dicts function, is w2v true by default?

Yes.

Should we use the lower case when we look for new features from the word2vec?

No. model.wv['EU']

-Aron
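The two answers above can be sketched together as follows: a `w2v` flag that defaults to `True`, and a lookup that uses the token exactly as written rather than lowercased. The function `token_features`, the `tok=` feature name, and the toy vectors are hypothetical stand-ins, not the assignment's actual `make_feature_dicts`.

```python
def token_features(token, vectors, w2v=True):
    """Build a feature dict for one token (illustrative only).

    `vectors` stands in for model.wv: a mapping from word to a vector.
    """
    feats = {"tok=" + token: 1}
    # Look the token up with its original casing; skip it if it has no vector.
    if w2v and token in vectors:
        for i, v in enumerate(vectors[token]):
            feats["w2v_%d" % i] = float(v)
    return feats


# Toy stand-in for model.wv; keys are case-sensitive.
toy_vectors = {"EU": [0.2, 0.7], "german": [0.1, 0.3]}

print(token_features("EU", toy_vectors))             # token plus two w2v features
print(token_features("German", toy_vectors))         # "German" != "german": token only
print(token_features("EU", toy_vectors, w2v=False))  # w2v features switched off
```

Since the keys are case-sensitive, 'EU' and 'eu' are different entries, and a token whose exact form is missing simply gets no word2vec features.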


yfan000 commented 7 years ago

Thanks!