lpworld / DASL

How do you binarize the rating information as labels of click and non-click for the click-through-rate prediction task. #1

Closed rogeroyer closed 2 years ago

rogeroyer commented 2 years ago

Hello, I can't find the label-switching code in your repository. Could you tell me how you did this on the Amazon dataset? Thank you.

lpworld commented 2 years ago

Hello, thanks for your interest. We binarize the labels using a threshold of 3.5 (in a rating scale of 1-5). We have preprocessed the dataset so we do not include the code of label switching here. Sorry for the confusion.
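The thresholding described above can be sketched as follows. This is a minimal illustration, not the authors' original preprocessing code (which is not included in the repository). The column names `user_id`, `item_id`, `rating`, and `label` and the helper `binarize_ratings` are assumptions made for the example. Because ratings on a 1-5 scale are integers, it makes no difference whether the comparison at 3.5 is strict.

```python
import pandas as pd


def binarize_ratings(df, rating_col="rating", threshold=3.5):
    """Return a copy of df with a 0/1 `label` column.

    A rating above the threshold is treated as a click (1),
    anything else as a non-click (0). Column names are hypothetical.
    """
    out = df.copy()
    out["label"] = (out[rating_col] > threshold).astype(int)
    return out


# Toy data standing in for the Amazon ratings (values are made up).
ratings = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u3"],
    "item_id": ["i1", "i2", "i1", "i3"],
    "rating": [5.0, 3.0, 4.0, 1.0],
})

labeled = binarize_ratings(ratings)
print(labeled["label"].tolist())  # [1, 0, 1, 0]
```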

rogeroyer commented 2 years ago

Thanks for your reply. Would you please share the preprocessing code of the Amazon data set? I have some problems with the reproduced results. Looking forward to your reply.

lpworld commented 2 years ago

Thanks again. I cannot find my preprocessing code at this point, but I have uploaded a filtered version of the Amazon dataset that I used in the experiments. Hopefully that will be useful for you. You are also welcome to try out some of your own datasets.

rogeroyer commented 2 years ago

OK, thanks for your help.

liangliang-max commented 2 years ago

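@lpworld Hello, have you found your preprocessing code? You provided two datasets for the experiments, and I run into some problems when I use them to replace the Amazon dataset in the experiment. Can they be swapped in directly and run as-is? Could you please reply? Thank you, and sorry for the trouble.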

liangliang-max commented 2 years ago

@rogeroyer Hello, have you successfully reproduced the code for this paper?

liangliang-max commented 2 years ago

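@lpworld @rogeroyer While reading the paper, I noticed that it mentions using different ways of encoding for the encoding and embedding representations. Could you explain how the encoding is done? Thank you.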

lpworld commented 2 years ago

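@liangliang-max Hello, you can run our code directly with the two datasets I provided. During preprocessing we mainly removed users and items that appear fewer than 10 times in the dataset. In the offline experiments we uniformly use user and item IDs for embedding lookup; the paper does not mention any other encoding method. The encoding in the online experiments is different because we used Alibaba's embedding model.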
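The frequency filter described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the original preprocessing script: the column names and the helper `filter_rare` are hypothetical, and the filter is applied in a single pass using counts from the unfiltered data (the thread does not say whether the original filtering was repeated until stable).

```python
import pandas as pd


def filter_rare(df, user_col="user_id", item_col="item_id", min_count=10):
    """Keep only rows whose user AND item each occur at least min_count times.

    Counts are taken once, on the unfiltered data (single-pass assumption).
    """
    user_counts = df[user_col].map(df[user_col].value_counts())
    item_counts = df[item_col].map(df[item_col].value_counts())
    keep = (user_counts >= min_count) & (item_counts >= min_count)
    return df[keep].reset_index(drop=True)


# Toy interactions: one frequent user/item pair plus two rare rows.
rows = [("u_big", "i_big")] * 12 + [("u_small", "i_big"), ("u_big", "i_small")]
interactions = pd.DataFrame(rows, columns=["user_id", "item_id"])

filtered = filter_rare(interactions)
print(len(interactions), len(filtered))  # 14 12
```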

liangliang-max commented 2 years ago

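Thank you for your reply. I would also like to ask how the User embedding part is done and how the bidirectional mapping of the shared space is handled. Is Dual Attention two dot-product attentions? I could not work this out from your paper. Thank you for your effort.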

liangliang-max commented 2 years ago

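Hello, after switching to the two datasets you provided, I get the error KeyError: "None of [Index(['utdid'], dtype='object')] are in the [columns]". The code cannot find utdid, and the datasets do not contain that column either. What is the cause, and how should I change the code?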

lpworld commented 2 years ago

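@liangliang-max Hello, I have modified the train.py file, and the code should now run correctly. The User Embedding part is first initialized by an embedding lookup on the user ID and then updated through Dual Embedding, which performs a metric-learning mapping between the different item domains. Dual Attention concatenates the user behaviors from the different domains and applies a single, unified dot-product attention over them.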
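The idea of one dot-product attention over behaviors concatenated across domains can be sketched in NumPy as below. This is only a toy illustration, not the DASL implementation: the function `dot_product_attention`, the use of the user embedding as the query, the scaling by the square root of the dimension, and all shapes are assumptions made for the example.

```python
import numpy as np


def dot_product_attention(query, keys):
    """Scaled dot-product attention of one query over a set of keys.

    query: shape (d,); keys: shape (n, d).
    Returns (weights of shape (n,), context vector of shape (d,)).
    """
    scores = keys @ query / np.sqrt(keys.shape[1])
    scores = scores - scores.max()  # subtract max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights, weights @ keys


rng = np.random.default_rng(0)
d = 8
behaviors_a = rng.normal(size=(5, d))  # embedded behaviors in domain A
behaviors_b = rng.normal(size=(3, d))  # embedded behaviors in domain B
user_emb = rng.normal(size=d)          # query (assumed: the user embedding)

# Concatenate both domains' behaviors, then attend over them jointly.
all_behaviors = np.concatenate([behaviors_a, behaviors_b], axis=0)
weights, context = dot_product_attention(user_emb, all_behaviors)
print(weights.shape, context.shape)  # (8,) (8,)
```

Because the softmax is taken over the concatenated sequence, behaviors from both domains compete for the same attention mass, which is what distinguishes this from running two separate per-domain attentions.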

liangliang-max commented 2 years ago

OK, thank you for your effort. I will try again.

liangliang-max commented 2 years ago

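Thank you for taking time out of your busy schedule to explain things so carefully and to modify the code. I would like to use your code as a baseline and build some new ideas on top of it. Do you have any suggestions? Also, I ran into some further problems when running your modified code. Is this an issue with my server? I hope you can explain once more. Thank you.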

lpworld commented 2 years ago

@liangliang-max Hello, I just tested the code and found no problems. Please check your server configuration (TensorFlow 1.4.0 + CUDA 8.0). If you are interested in related work, you can also refer to our two other papers that likewise use dual learning for cross-domain recommendation; their code is open-sourced as well.

[1] Li, P. and Tuzhilin, A., 2020, January. DDTCDR: Deep dual transfer cross domain recommendation. In Proceedings of the 13th International Conference on Web Search and Data Mining (pp. 331-339).

[2] Li, P. and Tuzhilin, A., 2021. Dual metric learning for effective and efficient cross-domain recommendations. IEEE Transactions on Knowledge and Data Engineering.

liangliang-max commented 2 years ago

OK, thank you.

liangliang-max commented 2 years ago

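Hello, thank you very much for your earlier help. After reproducing the results with your code, I found that the best test-set AUC is around 53%, while the training-set AUC is above 80%. Could you provide the rest of your processed datasets, or the preprocessing files? Thank you very much.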

liangliang-max commented 2 years ago

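Hello, sorry to bother you again. Could you provide the other two datasets, or the code you used to process the data? Thank you. I saw the link to the original dataset on your GitHub page; did you use the ratings-only data? I look forward to your guidance. Thank you again.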

lpworld commented 2 years ago

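@liangliang-max Hello, the Alibaba-Youku dataset was sampled directly from the production side, so it cannot be provided. If you need it, I can give you an anonymized, sampled version of the Imhonet dataset. You can leave an email address or email me directly.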

liangliang-max commented 2 years ago

@.***

lpworld commented 2 years ago

@liangliang-max Hello, the email address did not show up.

liangliang-max commented 2 years ago

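Hello, this is my WeChat QR code. If it is convenient for you, could you add me on WeChat? I have also left my QQ email address below. Thank you, and sorry to bother you so late.

My QQ email address is below: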

lpworld commented 2 years ago

@liangliang-max Hello, neither the image nor the email address showed up. You can email me directly at pl1748@nyu.edu

liangliang-max commented 2 years ago

Hello, I have replied to you. I am not sure whether you received my message.
