plasticityai / magnitude

A fast, efficient universal vector embedding utility package.
MIT License

Recursion error #38

Closed by jacobzweig 5 years ago

jacobzweig commented 5 years ago

I ran into an interesting RecursionError with a string in a corpus I was using recently:

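from pymagnitude import Magnitude

# Placeholder paths: the actual .magnitude files were not given in the report
fasttext_embedding_path = "path/to/fasttext.magnitude"
twitter_embedding_path = "path/to/twitter.magnitude"

# A single pathological token: one very long run of the letter "a"
text = [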
 "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
]
fasttext_embedding = Magnitude(
    fasttext_embedding_path, pad_to_length=500, pad_left=True
)
twitter_embedding = Magnitude(
    twitter_embedding_path, pad_to_length=500, pad_left=True
)
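# Concatenate the two embeddings into a single Magnitude object
concatenated_embeddings = Magnitude(fasttext_embedding, twitter_embedding)
# Raises RecursionError on pymagnitude versions before 0.1.119
concatenated_embeddings.query(text)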

This results in the following traceback:

  Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/lib/python3.6/site-packages/pymagnitude/third_party/repoze/lru/__init__.py", line 390, in cached_wrapper
    val = func(*args, **kwargs)
  File "/lib/python3.6/site-packages/pymagnitude/__init__.py", line 2086, in query
    for i, m in enumerate(self.magnitudes)]
  File "/lib/python3.6/site-packages/pymagnitude/__init__.py", line 2086, in <listcomp>
    for i, m in enumerate(self.magnitudes)]
  File "/lib/python3.6/site-packages/pymagnitude/third_party/repoze/lru/__init__.py", line 390, in cached_wrapper
    val = func(*args, **kwargs)
  File "/lib/python3.6/site-packages/pymagnitude/__init__.py", line 1219, in query
    vectors = self._vectors_for_keys_cached(q, normalized)
  File "/lib/python3.6/site-packages/pymagnitude/__init__.py", line 1107, in _vectors_for_keys_cached
    unseen_keys[i], normalized, force=force)
  File "/lib/python3.6/site-packages/pymagnitude/third_party/repoze/lru/__init__.py", line 390, in cached_wrapper
    val = func(*args, **kwargs)
  File "/lib/python3.6/site-packages/pymagnitude/__init__.py", line 482, in _out_of_vocab_vector_cached
    return self._out_of_vocab_vector(*args, **kwargs)
  File "lib/python3.6/site-packages/pymagnitude/__init__.py", line 990, in _out_of_vocab_vector
    normalized=normalized) *
  File "lib/python3.6/site-packages/pymagnitude/__init__.py", line 753, in _db_query_similar_keys_vector
    key_stemmed = self._oov_stem(orig_key)
  File "/lib/python3.6/site-packages/pymagnitude/__init__.py", line 722, in _oov_stem
    return self._oov_english_stem_english_ixes(key)
  File "/lib/python3.6/site-packages/pymagnitude/__init__.py", line 715, in _oov_english_stem_english_ixes
    return self._oov_english_stem_english_ixes(stripped_key)
  File "/lib/python3.6/site-packages/pymagnitude/__init__.py", line 715, in _oov_english_stem_english_ixes
    return self._oov_english_stem_english_ixes(stripped_key)
  File "/lib/python3.6/site-packages/pymagnitude/__init__.py", line 715, in _oov_english_stem_english_ixes
    return self._oov_english_stem_english_ixes(stripped_key)
  [Previous line repeated 979 more times]
  File "/lib/python3.6/site-packages/pymagnitude/__init__.py", line 702, in _oov_english_stem_english_ixes
    if key_lower[:len(p)] == p:
RecursionError: maximum recursion depth exceeded in comparison

Any ideas on what might have caused this? The string should obviously be removed, but I'm curious why I ran into the error.
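Until a fixed version is installed, one client-side workaround, assuming you don't need vectors for such tokens, is to drop overly long tokens before calling query. MAX_TOKEN_LENGTH and drop_long_tokens are hypothetical names, and the cutoff of 100 characters is an arbitrary choice:

```python
from typing import List

# Hypothetical cutoff: real words are far shorter than this, so longer
# tokens are treated as garbage (e.g. long runs of a repeated character).
MAX_TOKEN_LENGTH = 100


def drop_long_tokens(tokens: List[str], max_length: int = MAX_TOKEN_LENGTH) -> List[str]:
    """Return only the tokens whose length is at most max_length."""
    return [t for t in tokens if len(t) <= max_length]
```

For example, concatenated_embeddings.query(drop_long_tokens(text)). Note that for the text above this leaves an empty list; how query handles an empty input is not covered in this thread.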

AjayP13 commented 5 years ago

Hey @jacobzweig, this is a minor bug in the recursive stemming algorithm. Thanks for reporting! I've pushed a fix that basically doesn't stem words beyond a certain length, since words that long are clearly garbage input. I'll report back here once it passes the CI/CD pipeline and is deployed to PyPI.
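The traceback shows _oov_english_stem_english_ixes calling itself on stripped_key hundreds of times, so recursion depth grows with the length of the word. The sketch below reproduces that shape and the length-guard fix. It is not pymagnitude's code: the affix lists, strip_affixes, and MAX_STEM_LENGTH are hypothetical, and treating "a" as a strippable prefix is an assumption that would explain why an all-"a" string keeps recursing.

```python
import sys

# Hypothetical affix lists for illustration only; pymagnitude's real
# English prefix/suffix lists are not shown in this issue.
PREFIXES = ("a", "un", "re")
SUFFIXES = ("ing", "ed", "s")

# Hypothetical cutoff; the exact length limit used by the fix is not stated.
MAX_STEM_LENGTH = 100


def strip_affixes(key, max_length=None):
    """Strip one known affix per call and recurse on what remains.

    If max_length is set, keys longer than it are returned unstemmed,
    which bounds the recursion depth by max_length.
    """
    if max_length is not None and len(key) > max_length:
        return key  # treat overly long keys as garbage input and skip stemming
    key_lower = key.lower()
    for p in PREFIXES:
        # Only strip if something would be left afterwards.
        if len(key) > len(p) and key_lower[:len(p)] == p:
            return strip_affixes(key[len(p):], max_length)
    for s in SUFFIXES:
        if len(key) > len(s) and key_lower.endswith(s):
            return strip_affixes(key[:-len(s)], max_length)
    return key


if __name__ == "__main__":
    garbage = "a" * (sys.getrecursionlimit() + 10)
    try:
        strip_affixes(garbage)  # one recursive call per stripped "a"
    except RecursionError:
        print("unguarded: RecursionError")
    print("guarded:", strip_affixes(garbage, MAX_STEM_LENGTH) == garbage)
    print(strip_affixes("replayed", MAX_STEM_LENGTH))  # "re" and "ed" stripped
```

With the guard, recursion depth is bounded by the cutoff, so as long as the cutoff stays well below Python's default recursion limit of 1000 frames, no single word can exhaust the stack. Rewriting the recursion as a loop would remove the depth limit entirely, at the cost of diverging further from the original structure.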

AjayP13 commented 5 years ago

@jacobzweig this should now be fixed in version 0.1.119. Use pip install pymagnitude -U to upgrade on Python 2.7, or pip3 install pymagnitude -U on Python 3.
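To confirm that an environment has the fix, a small check on Python 3.8+ could compare the installed version against 0.1.119, the release named above. FIXED_VERSION, parse_version, and has_stemming_fix are hypothetical helpers, and the parser assumes a plain dotted numeric version string with no pre-release suffix:

```python
from importlib.metadata import PackageNotFoundError, version

# First release with the stemming fix, per this thread.
FIXED_VERSION = (0, 1, 119)


def parse_version(text):
    """Parse a plain dotted numeric version like '0.1.119' into a tuple of ints.

    Assumes no pre-release or local suffixes; use packaging.version for those.
    """
    return tuple(int(part) for part in text.split("."))


def has_stemming_fix(installed):
    """True if the installed version is at least FIXED_VERSION."""
    return parse_version(installed) >= FIXED_VERSION


if __name__ == "__main__":
    try:
        installed = version("pymagnitude")
    except PackageNotFoundError:
        print("pymagnitude is not installed")
    else:
        print(installed, "has fix:", has_stemming_fix(installed))
```

Comparing integer tuples rather than raw strings matters here: as strings, "0.1.99" would sort after "0.1.119".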

jacobzweig commented 5 years ago

Excellent, thanks @AjayP13!