ing-bank / sparse_dot_topn

Python package to accelerate the sparse matrix multiplication and top-n similarity selection
Apache License 2.0
399 stars, 86 forks

Arguments N and return_best_ntop seem to have no effect in my case #77

Closed: AmT42 closed this issue 1 year ago

AmT42 commented 2 years ago

Hello,

First, thank you for this awesome work. There is something I don't quite understand:

    from sparse_dot_topn import awesome_cossim_topn

    t1, _1 = awesome_cossim_topn(X, Y.T, 1, 0.5, return_best_ntop=True)
    t2, _2 = awesome_cossim_topn(X, Y.T, 10, 0.5, return_best_ntop=True)

These two lines return the same results. It looks as if neither the value of N nor the argument return_best_ntop has any effect.

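X and Y are built as follows:

    from sklearn.feature_extraction.text import TfidfVectorizer

    vec1 = TfidfVectorizer(lowercase=False, analyzer="char", ngram_range=(2, 3))
    # Fit on all vendor names plus the query; the query string is wrapped in
    # a list because list + str would raise a TypeError.
    vec1.fit(vendor_names + [spend_vendor_name])

    X = vec1.transform(vendor_names)         # one row per vendor name
    Y = vec1.transform([spend_vendor_name])  # a single row for the query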

where vendor_names is a List[str] of about 2 million names and spend_vendor_name = "goldman sach aim".

Thank you
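One plausible explanation, assuming awesome_cossim_topn keeps the top ntop entries per row of the product A·B (which matches the package's description as top-n similarity selection): X has shape (2 million x vocabulary) and Y.T has shape (vocabulary x 1), so X·Y.T has a single column. Each row then holds at most one score, and ntop = 1 and ntop = 10 give identical output, with a best_ntop of at most 1 in both cases. The following is a minimal dense NumPy sketch of that per-row selection, not the library's implementation; `cossim_topn_sketch` and the toy matrices are hypothetical. It reproduces the effect and shows what changes when the operands are swapped to Y·X.T.

```python
import numpy as np


def cossim_topn_sketch(A, B, ntop, lower_bound=0.0):
    """Hypothetical dense stand-in for per-row top-n selection (not library code).

    For each row of A @ B, keep at most `ntop` entries strictly greater than
    `lower_bound`, largest first. Also return the largest number of entries
    above `lower_bound` seen in any row (our stand-in for best_ntop).
    """
    C = A @ B  # shape: (A.shape[0], B.shape[1])
    out = np.zeros_like(C, dtype=float)
    best_ntop = 0
    for i, row in enumerate(C):
        candidates = np.flatnonzero(row > lower_bound)
        best_ntop = max(best_ntop, candidates.size)
        # Order candidates by descending score, then truncate to ntop.
        keep = candidates[np.argsort(-row[candidates], kind="stable")][:ntop]
        out[i, keep] = row[keep]
    return out, best_ntop


# Toy data: 4 unit-norm "vendor" rows and 1 unit-norm query row.
X = np.array([[1.0, 0.0, 0.0],
              [0.8, 0.6, 0.0],
              [0.0, 1.0, 0.0],
              [0.6, 0.0, 0.8]])
Y = np.array([[1.0, 0.0, 0.0]])

# Same orientation as in the issue: X @ Y.T has ONE column (4 x 1),
# so every row holds at most one value and any ntop >= 1 changes nothing.
t1, b1 = cossim_topn_sketch(X, Y.T, 1, 0.5)
t2, b2 = cossim_topn_sketch(X, Y.T, 10, 0.5)
print(np.array_equal(t1, t2), b1, b2)  # True 1 1

# Swapped orientation: Y @ X.T is one row over all vendors (1 x 4),
# so ntop now decides how many vendors are kept.
s1, c1 = cossim_topn_sketch(Y, X.T, 1, 0.5)
s10, c10 = cossim_topn_sketch(Y, X.T, 10, 0.5)
print(np.count_nonzero(s1), np.count_nonzero(s10), c1)  # 1 3 3
```

If that assumption about the library holds, calling awesome_cossim_topn(Y, X.T, 10, 0.5) should return up to 10 best-matching vendor names for the single query, instead of one score per vendor.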