s4sarath / word2vec_supervised

A supervised approach for word2vec

KeyError #4

Open VuceWillis opened 8 years ago

VuceWillis commented 8 years ago

I was trying out my own implementation with the following label_dict:

label_dict = {"good": ["pos_word"], "awesome": ["pos_word"], "great": ["pos_word"], "bad": ["neg_word"], "horrible": ["neg_word"], "terrible": ["neg_word"]}

Running:

# Set values for various parameters
num_features = 300    # Word vector dimensionality
min_word_count = 50   # Minimum word count
num_workers = 4       # Number of threads to run in parallel
context = 8           # Context window size
downsampling = 1e-3   # Downsample setting for frequent words

# Initialize and train the model
model = Word2Vec_Supervised(sentences, hs=0, workers=num_workers, size=num_features, min_count=min_word_count, window=context, sample=downsampling, label_dict=label_dict)

This gave me the following KeyError:

Exception in thread Thread-4:
Traceback (most recent call last):
  File "/home/vuk/anaconda2/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/home/vuk/anaconda2/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/vuk/anaconda2/lib/python2.7/site-packages/gensim-0.12.4-py2.7-linux-x86_64.egg/gensim/models/word2vec_supervised.py", line 529, in worker_train
    job_words = self._get_job_words(alpha, work, job, neu1)
  File "/home/vuk/anaconda2/lib/python2.7/site-packages/gensim-0.12.4-py2.7-linux-x86_64.egg/gensim/models/word2vec_supervised.py", line 488, in _get_job_words
    a1 = sum(train_sentence_sg_categ_nogil(self, sentence, alpha, work) for sentence in job)
  File "/home/vuk/anaconda2/lib/python2.7/site-packages/gensim-0.12.4-py2.7-linux-x86_64.egg/gensim/models/word2vec_supervised.py", line 488, in <genexpr>
    a1 = sum(train_sentence_sg_categ_nogil(self, sentence, alpha, work) for sentence in job)
  File "word2vec_inner_supervised.pyx", line 903, in word2vec_inner_supervised.train_sentence_sg_categ_nogil (./gensim/models/word2vec_inner_supervised.c:8797)
KeyError: 'pos_word'

Any idea why this would happen?

s4sarath commented 8 years ago

Set min_count=1 (that is, min_word_count = 1) and try again:

model = Word2Vec_Supervised(sentences, hs=0, workers=num_workers, size=num_features, min_count=min_word_count, window=context, sample=downsampling, label_dict=label_dict)
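A plausible reading of this fix, not confirmed anywhere in the thread: if the vocabulary is pruned by min_count and label tokens such as 'pos_word' are then looked up in that pruned vocabulary, a high threshold on a small corpus would drop the labels and the lookup would raise a KeyError. The sketch below is a toy illustration of that failure mode in plain Python. The helpers build_vocab and label_indices and the two-sentence corpus are hypothetical and are not taken from word2vec_supervised.

```python
from collections import Counter


def build_vocab(sentences, label_dict, min_count):
    """Hypothetical helper: count words and their labels, then prune by min_count."""
    counts = Counter()
    for sentence in sentences:
        for word in sentence:
            counts[word] += 1
            # Assumption: each label is counted once per occurrence of its word.
            for label in label_dict.get(word, []):
                counts[label] += 1
    kept = sorted(token for token, count in counts.items() if count >= min_count)
    return {token: index for index, token in enumerate(kept)}


def label_indices(sentence, vocab, label_dict):
    """Hypothetical helper: look up each label without checking it survived pruning."""
    return [vocab[label] for word in sentence for label in label_dict.get(word, [])]


# Made-up toy corpus, far smaller than a threshold of 50 occurrences.
sentences = [["the", "movie", "was", "good"], ["the", "food", "was", "bad"]]
label_dict = {"good": ["pos_word"], "bad": ["neg_word"]}

pruned_vocab = build_vocab(sentences, label_dict, min_count=50)
try:
    label_indices(sentences[0], pruned_vocab, label_dict)
except KeyError as error:
    print("KeyError:", error)  # the label was pruned along with every other token

full_vocab = build_vocab(sentences, label_dict, min_count=1)
print(label_indices(sentences[0], full_vocab, label_dict))  # lookup now succeeds
```

Under that assumption, min_count=1 is a workaround rather than a tuning recommendation, since it also keeps every rare word in the vocabulary.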

VuceWillis commented 8 years ago

That indeed solves the KeyError. However, it now prints "Inside the Cython categ Function" so often that my notebook nearly crashes (I can hardly interrupt the kernel). Maybe that print statement could be left out?

s4sarath commented 8 years ago

Sorry for the late reply. I will comment out that print statement.
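As an alternative to commenting the statement out, a diagnostic like this could be routed through Python's standard logging module at DEBUG level, so it stays silent unless someone asks for it. The sketch below is plain Python, not the Cython code from the repository. The logger name and the functions train_sentence and train are hypothetical stand-ins.

```python
import logging

# Hypothetical logger name, chosen for this example only.
logger = logging.getLogger("word2vec_supervised_demo")


def train_sentence(sentence):
    """Stand-in for a per-sentence training step; returns the number of tokens seen."""
    # Emitted only when the effective log level is DEBUG or lower.
    logger.debug("Inside the categ function: %d tokens", len(sentence))
    return len(sentence)


def train(sentences):
    """Run the stand-in step over all sentences and return the total token count."""
    return sum(train_sentence(sentence) for sentence in sentences)


# With the root level at WARNING, the per-sentence message is suppressed.
logging.basicConfig(level=logging.WARNING)
total = train([["good", "movie"], ["bad", "food"]])
print(total)
```

Calling logger.setLevel(logging.DEBUG) would turn the message back on for a debugging session without editing the training code.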