piskvorky / gensim

Topic Modelling for Humans
https://radimrehurek.com/gensim
GNU Lesser General Public License v2.1

TypeError: __randomstate_ctor() takes from 0 to 1 positional arguments but 2 were given #3492

Open arxyzan opened 1 year ago

arxyzan commented 1 year ago

Problem description

Hello, I had 3 gensim models (1 FastText model and 2 Word2Vec models) saved with Word2Vec.save() and FastText.save(), and I was able to load all 3 back normally. After I recently attempted to upgrade a couple of packages, including numpy, I now get the following error when trying to load the models:

│ /home/aryan/source/hezarai/hezar/hezar/embeddings/word2vec.py:72 in build    │
│                                                                              │
│    69 │   │   │   if not os.path.isfile(vectors_path):                       │
│    70 │   │   │   │   raise ValueError(f"Could not load or find vectors file │
│    71 │   │   │   │   │   │   │   │    f"Please make sure it's been download │
│ ❱  72 │   │   │   embedding_model = word2vec.Word2Vec.load(embedding_path)   │
│    73 │   │   else:                                                          │
│    74 │   │   │   embedding_model = word2vec.Word2Vec(                       │
│    75 │   │   │   │   vector_size=self.config.vector_size,                   │
│                                                                              │
│ /home/aryan/Applications/miniconda3/envs/main/lib/python3.8/site-packages/ge │
│ nsim/models/word2vec.py:1939 in load                                         │
│                                                                              │
│   1936 │   │                                                                 │
│   1937 │   │   """                                                           │
│   1938 │   │   try:                                                          │
│ ❱ 1939 │   │   │   model = super(Word2Vec, cls).load(*args, **kwargs)        │
│   1940 │   │   │   if not isinstance(model, Word2Vec):                       │
│   1941 │   │   │   │   rethrow = True                                        │
│   1942 │   │   │   │   raise AttributeError("Model of type %s can't be loade │
│                                                                              │
│ /home/aryan/Applications/miniconda3/envs/main/lib/python3.8/site-packages/ge │
│ nsim/utils.py:486 in load                                                    │
│                                                                              │
│    483 │   │                                                                 │
│    484 │   │   compress, subname = SaveLoad._adapt_by_suffix(fname)          │
│    485 │   │                                                                 │
│ ❱  486 │   │   obj = unpickle(fname)                                         │
│    487 │   │   obj._load_specials(fname, mmap, compress, subname)            │
│    488 │   │   obj.add_lifecycle_event("loaded", fname=fname)                │
│    489 │   │   return obj                                                    │
│                                                                              │
│ /home/aryan/Applications/miniconda3/envs/main/lib/python3.8/site-packages/ge │
│ nsim/utils.py:1461 in unpickle                                               │
│                                                                              │
│   1458 │                                                                     │
│   1459 │   """                                                               │
│   1460 │   with open(fname, 'rb') as f:                                      │
│ ❱ 1461 │   │   return _pickle.load(f, encoding='latin1')  # needed because l │
│   1462                                                                       │
│   1463                                                                       │
│   1464 def revdict(d):                                                       │
╰──────────────────────────────────────────────────────────────────────────────╯
TypeError: __randomstate_ctor() takes from 0 to 1 positional arguments but 2 were given

Process finished with exit code 1

I'm pretty sure this is a version mismatch problem, but I can't find a way to determine which versions I was using when I saved those models :(
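For future saves, one possible safeguard is to write the package versions to a small file next to the model at save time. The following is only a minimal sketch using the standard library. The helper names `collect_versions` and `write_version_sidecar` and the `.versions.json` suffix are hypothetical and not part of gensim.

```python
import json
import platform
import sys
from importlib import metadata  # available in the standard library from Python 3.8


def collect_versions(packages=("numpy", "scipy", "gensim")):
    """Return a dict with the interpreter, platform, and installed package versions."""
    versions = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return versions


def write_version_sidecar(model_path, packages=("numpy", "scipy", "gensim")):
    """Write '<model_path>.versions.json' (hypothetical naming) and return its path."""
    sidecar_path = f"{model_path}.versions.json"
    with open(sidecar_path, "w", encoding="utf-8") as f:
        json.dump(collect_versions(packages), f, indent=2)
    return sidecar_path
```

Calling `write_version_sidecar(path)` right after `model.save(path)` would leave a readable record of the environment, which can be checked without unpickling the model.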

Reproduce

Reproducing the bug is only possible if you load the models. They are hosted on the Hugging Face Hub, and you can load them using my library:

pip install hezar
from hezar import Embedding

word2vec = Embedding.load("hezarai/word2vec-cbow-fa-wikipedia")

Versions

Linux-5.15.0-79-generic-x86_64-with-glibc2.10
Python 3.8.5 (default, Sep 4 2020, 07:30:14) [GCC 7.3.0]
Bits 64
NumPy 1.23.0
SciPy 1.9.1
gensim 4.2.0
FAST_VERSION 0

sauravm8 commented 11 months ago

Same issue, were you able to resolve it?

arxyzan commented 11 months ago

@sauravm8 The only solution was to roll back numpy and gensim to the versions that were installed when the models were trained. In my case that was numpy==1.24 and gensim==4.3.2.
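A quick way to confirm that the environment matches those pins before calling `Word2Vec.load()` might look like the sketch below. It uses only the standard library. The function names `installed_version` and `find_version_mismatches` are hypothetical, and the prefix matching (so that "1.24" accepts "1.24.3") is an assumption about how loosely the pin should be read.

```python
from importlib import metadata  # available in the standard library from Python 3.8


def installed_version(name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_version_mismatches(expected, get_version=installed_version):
    """Return {package: (expected, installed)} for every package that does not match.

    A pin such as "1.24" matches "1.24" exactly or any "1.24.x" release.
    """
    mismatches = {}
    for name, wanted in expected.items():
        found = get_version(name)
        matches = found is not None and (found == wanted or found.startswith(wanted + "."))
        if not matches:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    # Pins taken from the comment above.
    problems = find_version_mismatches({"numpy": "1.24", "gensim": "4.3.2"})
    for package, (wanted, found) in problems.items():
        print(f"{package}: expected {wanted}, found {found}")
```

An empty result means the installed versions agree with the pins. Otherwise the printed lines show which packages to reinstall before loading the models.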