shenweichen / DeepMatch

A deep matching model library for recommendations & advertising. It's easy to train models and to export representation vectors which can be used for ANN search.
https://deepmatch.readthedocs.io/en/latest/
Apache License 2.0
2.19k stars 525 forks

In-batch negative sampling #73

Closed. yexiao1107 closed this issue 2 years ago

yexiao1107 commented 2 years ago


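Describe the question: I'm not sure how in-batch negative sampling is implemented for DSSM. When I train on my own data with random negative sampling, the user embeddings are not discriminative, while the item embeddings are. From what I have read, one explanation is that the model may only be separating popular (hot) items from unpopular (cold) ones, without actually learning user interests. Do you have any tuning experience to share for two-tower models?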


johndkl commented 2 years ago

https://zhuanlan.zhihu.com/p/358544636 I tested this and it works.

guixianjin commented 2 years ago

For in-batch negative sampling, you can also refer to https://zhuanlan.zhihu.com/p/336509201 or https://zhuanlan.zhihu.com/p/518561076

shenweichen commented 2 years ago

DeepMatch has supported in-batch negative sampling since v0.3.0. For usage, see https://github.com/shenweichen/DeepMatch/blob/master/examples/colab_MovieLen1M_DSSM_InBatchSoftmax.ipynb
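
For readers who want the idea independent of the library, here is a minimal NumPy sketch of an in-batch softmax loss for a two-tower model. It is not DeepMatch's implementation; the function name `in_batch_softmax_loss` and the parameters `item_log_q` and `temperature` are hypothetical names chosen for illustration. Each user's positive item is the one in the same row, and every other item in the batch serves as a negative. The optional `item_log_q` term is an assumption on my part: it subtracts the log of each item's estimated sampling probability from the logits, a commonly used correction because popular items show up as in-batch negatives more often, which may be related to the hot/cold effect described in the question.

```python
import numpy as np

def in_batch_softmax_loss(user_emb, item_emb, item_log_q=None, temperature=1.0):
    """Mean in-batch softmax loss (illustrative sketch, not the DeepMatch API).

    user_emb, item_emb: arrays of shape (B, D); row i of item_emb is the
        positive item for user i, all other rows act as negatives.
    item_log_q: optional array of shape (B,), the log of each item's
        estimated probability of appearing in a batch (supplied by the caller).
    """
    # L2-normalize both towers so the logits are cosine similarities.
    user_emb = user_emb / np.maximum(np.linalg.norm(user_emb, axis=1, keepdims=True), 1e-12)
    item_emb = item_emb / np.maximum(np.linalg.norm(item_emb, axis=1, keepdims=True), 1e-12)

    # logits[i, j] = similarity between user i and item j in the batch.
    logits = user_emb @ item_emb.T / temperature

    # Optional popularity correction: penalize items that are sampled often.
    if item_log_q is not None:
        logits = logits - item_log_q[None, :]

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The positive for user i sits on the diagonal.
    return float(-np.mean(np.diag(log_probs)))

# Toy batch: 4 users and 4 items with one-hot embeddings.
users = np.eye(4)
matched_items = np.eye(4)                      # item i matches user i
shifted_items = np.roll(np.eye(4), 1, axis=0)  # item i matches a different user

loss_matched = in_batch_softmax_loss(users, matched_items)
loss_shifted = in_batch_softmax_loss(users, shifted_items)
print(loss_matched < loss_shifted)  # matched pairs give the lower loss
```

In a real training loop the same computation would be written in the framework used by the model (TensorFlow in DeepMatch's case) so that gradients flow into both towers; the notebook linked above shows the library's own usage.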