HanLP semantic similarity: please support outputting sentence embeddings so they can be stored, to improve efficiency

yuxulingche commented 1 year ago

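Describe the feature and the current behavior/state. I currently use sts, which takes two sentences as input. For comparing a large number of sentences this is too inefficient; batching helps, but it is still not fast enough.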

Will this change the current api? How? An additional output could be added to sts.

Who will benefit with this feature? sts users

Are you willing to contribute it (Yes/No): No

System information

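Any other info HanLP's semantic similarity works well, and many thanks to the author for the contribution. However, I now have a large number of sentences to compare. I hope HanLP can add the ability to output sentence embeddings, so that they can be stored first and compared by cosine distance at query time, which would make comparison more efficient in practice.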
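
To make the request concrete, here is a minimal sketch of the store-then-compare workflow described above. It does not use any HanLP API: the vectors are made-up placeholders standing in for sentence embeddings, and the helper names (`normalize`, `build_index`, `most_similar`) are hypothetical. The point is only that once embeddings are stored, each comparison is a dot product rather than a model forward pass.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so cosine similarity becomes a dot product."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def build_index(embeddings):
    """Normalize every embedding once, at storage time."""
    return {sid: normalize(vec) for sid, vec in embeddings.items()}

def cosine(u, v):
    """Cosine similarity of two unit-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def most_similar(index, query_vec, top_k=3):
    """Rank stored sentences by cosine similarity to the query embedding."""
    q = normalize(query_vec)
    scored = [(sid, cosine(q, vec)) for sid, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Placeholder vectors (assumption): real ones would come from a sentence encoder.
stored = {
    "s1": [1.0, 0.0, 0.0],
    "s2": [0.6, 0.8, 0.0],
    "s3": [0.0, 0.0, 1.0],
}
index = build_index(stored)
results = most_similar(index, [2.0, 0.0, 0.0], top_k=2)
for sid, score in results:
    print(sid, round(score, 3))
```

With N stored sentences, a query then costs one encoder call plus N dot products, instead of N pairwise model calls.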

hankcs commented 1 year ago

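Hi, the current STS model takes a pair of sentences as joint input to compute their similarity, and does not support outputting embeddings. We are developing sentence embeddings for retrieval; please stay tuned for future updates.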

yfq512 commented 1 year ago

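I am also looking forward to a more efficient approach. For now, simhash or bert can be used, but simhash's accuracy is mediocre and bert is computationally expensive.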
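
For readers unfamiliar with the simhash option, a rough sketch follows, using only the Python standard library. The choices here are assumptions made for illustration (character bigrams as features, MD5 as the per-feature hash, a 64-bit fingerprint, equal weights), not a reference implementation. Fingerprints can be stored and compared by Hamming distance, which is cheap but only captures surface overlap, consistent with the accuracy concern above.

```python
import hashlib

def char_ngrams(text, n=2):
    """Character n-grams; whitespace is dropped so no word segmentation is needed."""
    text = "".join(text.split())
    if len(text) < n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def simhash(text, bits=64, n=2):
    """Compute a simhash fingerprint (bits must be a multiple of 8, at most 128)."""
    weights = [0] * bits
    for gram in char_ngrams(text, n):
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        h = int.from_bytes(digest[:bits // 8], "big")
        # Each feature votes +1 or -1 on every bit position.
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint

def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")

fp_a = simhash("the weather is nice today")
fp_b = simhash("the weather is very nice today")
print(hamming(fp_a, fp_b))
```

A smaller Hamming distance suggests more shared n-grams; any threshold for "similar" would need to be tuned on real data.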

zhangyifei1 commented 2 weeks ago

Is this supported now?

shenwenxin commented 1 week ago

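Hi, the current STS model takes a pair of sentences as joint input to compute their similarity, and does not support outputting embeddings. We are developing sentence embeddings for retrieval; please stay tuned for future updates.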

@hankcs Hello, is this supported now?

hankcs / HanLP