shibing624 / similarities

Similarities: a toolkit for similarity calculation and semantic search. Supports billion-scale text-to-text, text-to-image, and image-to-image search; written in Python 3 and usable out of the box.
https://pypi.org/project/similarities/
Apache License 2.0

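Today I used Similarity().similarity(sentence1, sentence2) to compute the similarity between two words or short phrases. It prompted me to download pytorch_model.bin and other files, and afterwards the results differed greatly from the ones I got a week ago and were noticeably less accurate (text2vec behaved the same way when I tested it). Your demo, however, still gives normal similarity scores. Can you think of what might cause this difference? #8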

Closed: sswhales closed this issue 1 year ago

sswhales commented 1 year ago

Describe the Question

Please provide a clear and concise description of what the question is.

Describe your attempts

You may also provide a Minimal, Complete, and Verifiable example you tried as a workaround, or StackOverflow solution that you have walked through. (e.g. cosmic radiation).

shibing624 commented 1 year ago

The default model was changed in https://github.com/shibing624/similarities/commit/e666de1e638ff3a11bcd6da4aa5e1cdfd900c60f#diff-c0bb12f632dde502e1847f59fb6b42e7061be3f132210b75bd18fc89ab9acc12R80 . You can pass a parameter to switch back to the previous model:

from similarities import Similarity
# Pin the Chinese model explicitly instead of relying on the new default
m = Similarity(model_name_or_path="shibing624/text2vec-base-chinese")

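The default was changed so that English similarity scores are handled well too. If you only need to process Chinese, shibing624/text2vec-base-chinese is the better choice.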
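Why a model swap alone can shift scores this much: the score is computed from embedding vectors, so a different model produces different vectors for the same text. The sketch below assumes the score is the cosine similarity of two sentence embeddings (the usual choice for text2vec-style models). MODEL_A and MODEL_B are hypothetical toy embedding tables standing in for two models, not real model outputs.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings of the same two sentences from two different models.
# Real models output hundreds of dimensions; three are enough to show the effect.
MODEL_A = {"sentence1": [0.9, 0.1, 0.0], "sentence2": [0.8, 0.2, 0.1]}
MODEL_B = {"sentence1": [0.2, 0.9, 0.1], "sentence2": [0.7, 0.1, 0.6]}


def pair_score(model, s1, s2):
    """Score a sentence pair using the given (toy) embedding table."""
    return cosine_similarity(model[s1], model[s2])


score_a = pair_score(MODEL_A, "sentence1", "sentence2")
score_b = pair_score(MODEL_B, "sentence1", "sentence2")
print(f"model A: {score_a:.4f}, model B: {score_b:.4f}")
```

The input pair is identical in both calls; only the embedding table changes, yet the score drops from about 0.98 to about 0.34. This is why pinning model_name_or_path keeps results reproducible when a library's default model changes.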

sswhales commented 1 year ago

Thanks a lot for the reply. The cause was exactly what you described, and the problem is now solved.