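Today I used Similarity().similarity(sentence1, sentence2) to compute the similarity of two words or short phrases and was prompted to download pytorch_model_bin and other files. Afterwards, the results differed greatly from those I got the last time I used it (a week ago) and were noticeably less accurate (text2vec later showed the same behavior), while the similarity results in your demo are normal. What do you think could cause this difference?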

shibing624 / similarities

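Similarities: a toolkit for similarity calculation and semantic search. It supports text-to-text, text-to-image, and image-to-image search over hundreds of millions of records, is developed in Python 3, and works out of the box.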

https://pypi.org/project/similarities/

Apache License 2.0

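Today I used Similarity().similarity(sentence1, sentence2) to compute the similarity of two words or short phrases and was prompted to download pytorch_model_bin and other files. Afterwards, the results differed greatly from those I got the last time I used it (a week ago) and were noticeably less accurate (text2vec later showed the same behavior), while the similarity results in your demo are normal. What do you think could cause this difference? #8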

Closed: sswhales closed this issue 1 year ago

sswhales commented 1 year ago

shibing624 commented 1 year ago

The default model was changed in https://github.com/shibing624/similarities/commit/e666de1e638ff3a11bcd6da4aa5e1cdfd900c60f#diff-c0bb12f632dde502e1847f59fb6b42e7061be3f132210b75bd18fc89ab9acc12R80 . You can pass a parameter to switch back to the previous model:

from similarities import Similarity
# Name the model explicitly instead of relying on the library default
m = Similarity(model_name_or_path="shibing624/text2vec-base-chinese")

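The default was changed so that English similarity calculation is also supported. If you only need to handle Chinese, shibing624/text2vec-base-chinese is the more suitable choice.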
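The advice in this reply can be wrapped in a small helper so the model choice is always explicit in your own code and a future change of the library default does not silently alter your scores. This is only a sketch: `CHINESE_MODEL`, `similarity_kwargs`, `load_similarity`, and `compare` are hypothetical names introduced here, and it assumes a version of the package in which `Similarity` and its `similarity(sentence1, sentence2)` method are available, as used in this thread.

```python
# Model the maintainer recommends for Chinese-only input (taken from the reply above).
CHINESE_MODEL = "shibing624/text2vec-base-chinese"


def similarity_kwargs(chinese_only: bool) -> dict:
    """Return constructor keyword arguments for Similarity (hypothetical helper)."""
    if chinese_only:
        # Pin the Chinese model explicitly.
        return {"model_name_or_path": CHINESE_MODEL}
    # No override: the library's current default model is used.
    return {}


def load_similarity(chinese_only: bool = True):
    """Build a Similarity instance with an explicit model choice."""
    # Imported lazily so this module can be loaded without the package installed.
    from similarities import Similarity
    return Similarity(**similarity_kwargs(chinese_only))


def compare(sentence1: str, sentence2: str, chinese_only: bool = True):
    """Score one sentence pair with the selected model."""
    model = load_similarity(chinese_only)
    return model.similarity(sentence1, sentence2)
```

Calling `compare(sentence1, sentence2)` downloads the model on first use. Scores produced by different models are not directly comparable, so it helps to record which model name was used alongside any stored results.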

sswhales commented 1 year ago

Thank you very much for the reply. The cause was exactly as you described, and the problem is now solved.