plasticityai / magnitude

A fast, efficient universal vector embedding utility package.
MIT License

ValueError: kth(=-1) out of bounds (400000) #30

Closed by ParikhKadam 6 years ago

ParikhKadam commented 6 years ago

Steps to reproduce:

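from pymagnitude import Magnitude

# Load the GloVe 6B 300-dimensional vectors from a local .magnitude file
glove = Magnitude("../../../Datasets/Magnitude/glove.6B.300d.magnitude")
# Ask for every key closer to "cat" than "tiger" is; this call raises the ValueError
print(glove.closer_than("cat", "tiger"))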

Output:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-36-15ba9e6c70f4> in <module>()
----> 1 print(glove.closer_than("cat", "tiger")) # ["dog", ...]

C:\Python36\lib\site-packages\pymagnitude\third_party\repoze\lru\__init__.py in cached_wrapper(*args, **kwargs)
    352             else:
    353                 if val is marker:
--> 354                     val = func(*args, **kwargs)
    355                     cache.put(key, val)
    356                 return val

C:\Python36\lib\site-packages\pymagnitude\__init__.py in closer_than(self, key, q, topn)
   1232 
   1233         return self.most_similar(key, topn=topn, min_similarity=min_similarity,
-> 1234                                  return_similarities=False)
   1235 
   1236     def get_vectors_mmap(self):

C:\Python36\lib\site-packages\pymagnitude\third_party\repoze\lru\__init__.py in cached_wrapper(*args, **kwargs)
    352             else:
    353                 if val is marker:
--> 354                     val = func(*args, **kwargs)
    355                     cache.put(key, val)
    356                 return val

C:\Python36\lib\site-packages\pymagnitude\__init__.py in most_similar(self, positive, negative, topn, min_similarity, return_similarities)
   1163                 negative),
   1164             return_similarities=return_similarities,
-> 1165             method='distance')
   1166 
   1167     @lru_cache(DEFAULT_LRU_CACHE_SIZE, ignore_unhashable_args=True)

C:\Python36\lib\site-packages\pymagnitude\__init__.py in _db_query_similarity(self, positive, negative, min_similarity, topn, exclude_keys, return_similarities, method, effort)
   1068 
   1069                 partition_results = np.argpartition(similiarities, -1 * min(
-> 1070                     filter_topn, self.batch_size - 1))[-filter_topn:]
   1071 
   1072                 for index in partition_results:

C:\Python36\lib\site-packages\numpy\core\fromnumeric.py in argpartition(a, kth, axis, kind, order)
    755 
    756     """
--> 757     return _wrapfunc(a, 'argpartition', kth, axis=axis, kind=kind, order=order)
    758 
    759 

C:\Python36\lib\site-packages\numpy\core\fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
     49 def _wrapfunc(obj, method, *args, **kwds):
     50     try:
---> 51         return getattr(obj, method)(*args, **kwds)
     52 
     53     # An AttributeError occurs if the object does not have

ValueError: kth(=-1) out of bounds (400000)

AjayP13 commented 6 years ago

Confirmed, this is a bug. I just committed a fix. It is currently building and deploying on the CI/CD (this takes some time, a few hours). The fix will be out on v0.1.53. I'll comment back here when it is deployed.

Thanks for reporting.

ParikhKadam commented 6 years ago

No need to thank me, brother. Thank you for creating such a good package. It solved many of my problems.

Good job. Keep it up.

AjayP13 commented 6 years ago

The fix is out on 0.1.55. To upgrade, run pip install -U pymagnitude (Python 2.7) or pip3 install -U pymagnitude (Python 3).
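For anyone reading the traceback above and wondering what the message means: np.argpartition only accepts a kth between -n and n - 1 for an array of length n, so asking for the top k entries with -k fails once k is larger than the array. The snippet below is a minimal sketch of that behavior and of one way to guard against it by clamping k. It is not the fix that was committed to pymagnitude, and top_k_indices is a hypothetical helper name used only for illustration.

```python
import numpy as np


def top_k_indices(scores, k):
    """Return the indices of the k largest scores, in no particular order.

    Hypothetical helper: k is clamped to the array length so that -k
    always stays inside the range np.argpartition accepts.
    """
    scores = np.asarray(scores)
    k = max(0, min(k, scores.shape[0]))
    if k == 0:
        return np.array([], dtype=np.intp)
    return np.argpartition(scores, -k)[-k:]


scores = np.array([0.1, 0.9, 0.4, 0.7])

# Requesting one more element than the array holds puts kth out of bounds.
try:
    np.argpartition(scores, -(len(scores) + 1))
    raised = False
except ValueError:
    raised = True

# The clamped helper returns every index instead of raising.
clamped = top_k_indices(scores, len(scores) + 1)

# A normal request: the two largest scores sit at indices 1 and 3.
top_two = sorted(top_k_indices(scores, 2).tolist())
```

If upgrading is not possible right away, a guard of this kind around your own argpartition calls avoids the same class of error. Calls made inside the library itself still need the upgraded package.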