Data processing question - Githubissues

weiyinwei / MMGCN

MMGCN: Multi-modal Graph Convolution Network for Personalized Recommendation of Micro-video

283 stars 52 forks

Data processing question #38

Closed Cleanspeech317 closed 3 years ago

Cleanspeech317 commented 3 years ago

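Did you use the 'finish' and 'like' fields when filtering the TikTok training set? Also, in the sample data you provide, user ids and item ids can coincide (both start from 0). Is that acceptable? When extracting the graph node embeddings result_embedding at the end, won't item representations and user representations be indistinguishable? Thank you for your answer!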

weiyinwei commented 3 years ago

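I only considered interactions: whether or not the video was finished or liked, it counts as interacted. You can either create separate embedding matrices or use a unified encoding, for example by adding the number of users to each item id after reading it in.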
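The unified encoding described above can be sketched as follows. This is a minimal illustration, not code from the repository: the helper names `to_unified_ids` and `split_embedding` are hypothetical, and the sketch assumes raw user and item ids both start at 0 and that the rows of result_embedding are ordered by unified node id.

```python
from typing import List, Sequence, Tuple


def to_unified_ids(pairs: Sequence[Tuple[int, int]], num_user: int) -> List[Tuple[int, int]]:
    """Shift every item id by num_user (hypothetical helper).

    Users keep ids in [0, num_user); items move to
    [num_user, num_user + num_item), so the two id ranges never overlap.
    """
    return [(user, item + num_user) for user, item in pairs]


def split_embedding(result_embedding: Sequence, num_user: int):
    """Split a unified node-embedding table into user rows and item rows.

    Assumes rows are ordered by unified node id (an assumption, not
    something confirmed by the thread).
    """
    return result_embedding[:num_user], result_embedding[num_user:]


# Toy example: 2 users and 3 items, both numbered from 0.
raw_pairs = [(0, 0), (0, 2), (1, 1)]
unified_pairs = to_unified_ids(raw_pairs, num_user=2)
```

With this layout, any row index below num_user is a user and any index at or above it is an item, so the two kinds of representation stay distinguishable.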

Cleanspeech317 commented 3 years ago

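Did you filter purely by timestamp? Wouldn't that introduce error, since items a user interacted with outside the timestamp window could be sampled as negatives for that user during negative sampling on the test set?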
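The concern above can be made concrete with a small sketch. One common way to avoid such false negatives is to exclude every item a user has ever interacted with, inside or outside the time window, from that user's negative candidates. Whether the repository does this is not stated in the thread; the function names and the toy data below are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_interacted(pairs: Iterable[Tuple[int, int]]) -> Dict[int, Set[int]]:
    """Map each user to the set of all items they interacted with."""
    interacted = defaultdict(set)
    for user, item in pairs:
        interacted[user].add(item)
    return dict(interacted)


def sample_negatives(user: int, num_item: int, interacted: Dict[int, Set[int]],
                     k: int, rng: random.Random) -> List[int]:
    """Draw up to k items the user has never interacted with."""
    seen = interacted.get(user, set())
    candidates = [item for item in range(num_item) if item not in seen]
    return rng.sample(candidates, min(k, len(candidates)))


# all_pairs should cover interactions from every time period,
# not only those inside the filtered window (toy data, made up).
all_pairs = [(0, 0), (0, 3), (1, 1)]
interacted = build_interacted(all_pairs)
negatives = sample_negatives(user=0, num_item=5, interacted=interacted,
                             k=2, rng=random.Random(0))
```

Here user 0 has interacted with items 0 and 3, so only items 1, 2, and 4 are eligible as negatives.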

Cleanspeech317 commented 3 years ago

Also, the paper mentions video lengths of 3-15 seconds. Did you filter on video length as well?

kbk12 commented 3 years ago

A question: running the model raises this error: RuntimeError: The size of tensor a (38307) must match the size of tensor b (112741) at non-singleton dimension 0. How can I fix it? I have tried many approaches and none of them worked.

weiyinwei commented 3 years ago

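It is probably a num_item and num_user problem. The toy dataset is a different size from the full dataset, so these values need to be set manually to match the data.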

kbk12 commented 3 years ago

Where can I find the user and item counts for the sample provided online? This is my first time running a model, so sorry for the trouble.

weiyinwei commented 3 years ago

? I don't quite follow what you mean. But the item and user counts can be read from the toy dataset with basic operations.
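Reading the counts from the data might look like the sketch below. It assumes the interactions are available as (user, item) pairs with ids numbered contiguously from 0, which the thread does not confirm for the toy dataset files; `infer_sizes` is a hypothetical helper. If the item ids in a file have already been shifted by the number of users, that offset has to be subtracted before counting.

```python
from typing import Iterable, Tuple


def infer_sizes(pairs: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (num_user, num_item), assuming ids run contiguously from 0."""
    max_user = -1
    max_item = -1
    for user, item in pairs:
        max_user = max(max_user, user)
        max_item = max(max_item, item)
    # With 0-based contiguous ids, the count is the largest id plus one.
    return max_user + 1, max_item + 1


# Toy interactions (made up for illustration).
toy_pairs = [(0, 0), (0, 2), (1, 1), (3, 2)]
num_user, num_item = infer_sizes(toy_pairs)
```

The resulting values can then be used in place of the hard-coded num_user and num_item settings.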

kbk12 commented 3 years ago

OK, thanks.
