koaning / whatlies

Toolkit to help understand "what lies" in word embeddings. Also benchmarking!
https://koaning.github.io/whatlies/
Apache License 2.0
469 stars 50 forks

AttributeError: 'Word2VecKeyedVectors' object has no attribute 'vocab' on Mac not windows #315

Open sinievanderben opened 3 years ago

sinievanderben commented 3 years ago

Hi, I have a question. I ran the same file with the same code on Windows and on a Mac, changing only the path names, but on the Mac I get an error. Is this because it is a Mac, or is there something else I am overlooking?

In both cases I run the code in Jupyter Notebook in VS code.

Thanks in advance!

Windows: [screenshot: WhatsApp Image 2021-09-19 at 10 07 10]

Mac: [screenshot]

full error on Mac:

AttributeError: 'Word2VecKeyedVectors' object has no attribute 'vocab'
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/var/folders/yv/1l5fy71n0sb4gr_jh8p2yj3r0000gn/T/ipykernel_58308/781142390.py in <module>
----> 1 embeddings = lang[metaDataWords[:100]]

/opt/miniconda3/envs/wl/lib/python3.9/site-packages/whatlies/language/_gensim_lang.py in __getitem__(self, query)
     94                 vec = np.zeros(self.kv.vector_size)
     95             return Embedding(query, vec)
---> 96         return EmbeddingSet(*[self[tok] for tok in query])
     97 
     98     def _prepare_queries(self, lower):

/opt/miniconda3/envs/wl/lib/python3.9/site-packages/whatlies/language/_gensim_lang.py in <listcomp>(.0)
     94                 vec = np.zeros(self.kv.vector_size)
     95             return Embedding(query, vec)
---> 96         return EmbeddingSet(*[self[tok] for tok in query])
     97 
     98     def _prepare_queries(self, lower):

/opt/miniconda3/envs/wl/lib/python3.9/site-packages/whatlies/language/_gensim_lang.py in __getitem__(self, query)
     90                 )
     91             try:
---> 92                 vec = np.sum([self.kv[q] for q in query.split(" ")], axis=0)
     93             except KeyError:
     94                 vec = np.zeros(self.kv.vector_size)

/opt/miniconda3/envs/wl/lib/python3.9/site-packages/whatlies/language/_gensim_lang.py in <listcomp>(.0)
     90                 )
     91             try:
---> 92                 vec = np.sum([self.kv[q] for q in query.split(" ")], axis=0)
     93             except KeyError:
     94                 vec = np.zeros(self.kv.vector_size)

/opt/miniconda3/envs/wl/lib/python3.9/site-packages/gensim/models/keyedvectors.py in __getitem__(self, entities)
    351         if isinstance(entities, string_types):
    352             # allow calls like trained_model['office'], as a shorthand for trained_model[['office']]
--> 353             return self.get_vector(entities)
    354 
    355         return vstack([self.get_vector(entity) for entity in entities])

/opt/miniconda3/envs/wl/lib/python3.9/site-packages/gensim/models/keyedvectors.py in get_vector(self, word)
    469 
    470     def get_vector(self, word):
--> 471         return self.word_vec(word)
    472 
    473     def words_closer_than(self, w1, w2):

/opt/miniconda3/envs/wl/lib/python3.9/site-packages/gensim/models/keyedvectors.py in word_vec(self, word, use_norm)
    457 
    458         """
--> 459         if word in self.vocab:
    460             if use_norm:
    461                 result = self.vectors_norm[self.vocab[word].index]

AttributeError: 'Word2VecKeyedVectors' object has no attribute 'vocab'

koaning commented 3 years ago

I could have a look but it'd help if you shared a few things.

  1. Screenshots are a poor method of sharing code because it's impossible for me to copy/paste. Could you share the code in a codeblock?
  2. What version of python/whatlies are you running in both environments?
  3. What version of gensim are you running in both environments?
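The version details asked for above can be collected in one go on each machine. The following is a minimal sketch using only the standard library; `report_versions` is a hypothetical helper name, not part of whatlies or gensim, and it assumes both packages were installed under the distribution names `gensim` and `whatlies`.

```python
import platform
import sys
from importlib import metadata


def report_versions(packages=("gensim", "whatlies")):
    """Return a dict describing the interpreter, platform and package versions."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        # CPU architecture as reported to this interpreter.
        "machine": platform.machine(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in report_versions().items():
        print(f"{key}: {value}")
```

Running it inside the same Jupyter kernel that raises the error matters, since the kernel may use a different environment than the terminal.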
sinievanderben commented 3 years ago

Sorry! Here's the code:

from whatlies.language import GensimLanguage

lang = GensimLanguage('/Volumes/TOSHIBA EXT/PROVEE/stressdatasets/forwhatlies/glove.twitter.27B.25d_word2vec.kv')

# Load in file with meta data
metaFile = '/Volumes/TOSHIBA EXT/tensflow/metadata25d.tsv'

# Read all lines; the file is closed when the block exits
with open(metaFile, encoding="utf8") as metaData:
    metaDataWords = metaData.readlines()

embeddings = lang[metaDataWords[:100]]

Mac: python 3.9.7, gensim 3.8.3
Windows: python 3.9.4, gensim 4.0.1
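A side note on the snippet above, separate from the `AttributeError`: `readlines()` keeps the trailing newline on every line, and the traceback shows that whatlies falls back to a zero vector when a token raises `KeyError`. Tokens such as `"hello\n"` may therefore silently map to zeros. A minimal sketch of stripping them first, assuming the metadata file holds one token per line; `read_tokens` is a hypothetical helper and the sample tokens are made up.

```python
import io


def read_tokens(handle):
    """Return one token per non-empty line, with surrounding whitespace removed."""
    return [line.strip() for line in handle if line.strip()]


# Stand-in for the metadata file; a real file would be opened with open(..., encoding="utf8").
sample = io.StringIO("hello\nworld\n\n")
tokens = read_tokens(sample)
print(tokens)
```

The resulting list could then be passed to `lang[...]` in place of the raw lines.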

sinievanderben commented 3 years ago

Additional remark:

koaning commented 3 years ago

Also just to confirm. Do you have a Mac with an M1 chip?

sinievanderben commented 3 years ago

Yes, that’s correct.

koaning commented 3 years ago

I just tried running the code below on my M1 Mac and could not reproduce the error.

import gensim.downloader as api
from whatlies.language import GensimLanguage

# First download some vectors
wv = api.load('glove-twitter-25')
# [=====================] 100.0% 104.8/104.8MB downloaded
wv.save("glove-twitter-25.kv")

# Next, load in downloaded vectors.
lang = GensimLanguage("glove-twitter-25.kv")
lang[["hello", "world"]]

I don't have access to your embeddings though, so I'm wondering if there's something up with that. I'm using gensim==3.8.3 and whatlies==0.6.4.
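One possible explanation, offered as an assumption rather than a confirmed diagnosis: gensim 4.0 renamed the `vocab` attribute of `KeyedVectors` to `key_to_index`. A `.kv` file saved under gensim 4.x (the Windows environment reports 4.0.1) and then loaded under gensim 3.8.3 could therefore produce an object without `vocab`. The sketch below checks which attribute a loaded object carries. `describe_vocab_layout` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for real `KeyedVectors` instances so the example runs without gensim.

```python
from types import SimpleNamespace


def describe_vocab_layout(kv):
    """Report which vocabulary attribute a KeyedVectors-like object exposes."""
    # gensim 4.x naming: a plain dict from token to row index.
    if getattr(kv, "key_to_index", None) is not None:
        return "key_to_index"
    # gensim 3.x naming: a dict from token to a Vocab entry.
    if getattr(kv, "vocab", None) is not None:
        return "vocab"
    return "unknown"


# Stand-ins for objects loaded from files written by different gensim versions.
saved_with_4x = SimpleNamespace(key_to_index={"hello": 0, "world": 1})
saved_with_3x = SimpleNamespace(vocab={"hello": object(), "world": object()})

print(describe_vocab_layout(saved_with_4x))
print(describe_vocab_layout(saved_with_3x))
```

With a real file, the object to inspect would be whatever `KeyedVectors.load(path)` returns. If it reports `key_to_index` while the installed gensim is 3.8.3, re-saving the vectors with the same gensim version that loads them is worth trying.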

koaning commented 3 years ago

Closing due to radio silence.

sharma18yash commented 2 years ago

I am facing the same error with gensim==3.8.3 on Google Colab.

koaning commented 2 years ago

Can you confirm that this runs?

import gensim.downloader as api
from whatlies.language import GensimLanguage

# First download some vectors
wv = api.load('glove-twitter-25')
# [=====================] 100.0% 104.8/104.8MB downloaded
wv.save("glove-twitter-25.kv")

# Next, load in downloaded vectors.
lang = GensimLanguage("glove-twitter-25.kv")
lang[["hello", "world"]]

user683 commented 2 years ago

Hi, I am facing the same error with gensim==3.8.3. Has this bug been fixed?

koaning commented 2 years ago

Can you confirm if this runs? Could you also share your platform info?

import gensim.downloader as api
from whatlies.language import GensimLanguage

# First download some vectors
wv = api.load('glove-twitter-25')
# [=====================] 100.0% 104.8/104.8MB downloaded
wv.save("glove-twitter-25.kv")

# Next, load in downloaded vectors.
lang = GensimLanguage("glove-twitter-25.kv")
lang[["hello", "world"]]