About the training code for the globo dataset

summmeer / session-based-news-recommendation

source code of paper "Positive, Negative and Neutral: Modeling Implicit Feedback in Session-based News Recommendation", which is accepted at SIGIR 2022.


About the training code for the globo dataset #7

Open pss-bppp opened 1 year ago

pss-bppp commented 1 year ago

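Hello, sorry to take up some of your time. I am trying to reproduce your results with the code you provided. I have finished the data processing for the globo dataset, but the model training stage always fails. The provided training code itself seems to have some problems; could you share the training code for the globo dataset (mainly main, model_combine, and the related functions)? This is the error I get: tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[20,0] = 11 is not in [0, 11) [[Node: embedding_lookup = GatherV2[Taxis=DT_INT32, Tindices=DT_INT32, Tparams=DT_FLOAT, _class=["loc:@Adam/update_duration_embedding/ApplyAdam"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](duration_embedding/read, _arg_active_time_0_0, embedding_lookup/axis)]] It looks like a problem with the input data. Alternatively, could you provide a version of the code that runs on the globo dataset alone? Thank you very much.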

summmeer commented 1 year ago

You may need to comment out this line: gap.append(bucketized(t['active_t'])) # if use Adressa dataset

and uncomment these lines:

https://github.com/summmeer/session-based-news-recommendation/blob/fd795a4a05588f641e5e8ad74ff80b477eead8ab/sampler.py#L89-L94

pss-bppp commented 1 year ago

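Thank you very much for your reply. I had already noticed the issue you mention and made that change, but it still does not work. I get the same error I reported earlier.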

pss-bppp commented 1 year ago

In the source code you provided, line 115 of model_combine.py seems to have a syntax error. I removed interval=None and changed the line to: attout_item_cont, alph = multi_attention_layer(seq_item_cont, seq_content, interval=seq_active_time, click_time=click_t, edim1=self.hidden_size*2, edim2=self.hidden_size, edim3=self.time_hidden_size, scope="multi_attention", hidden_size=self.hidden_size, stddev=self.stddev)

Is this where the problem is? Sorry, I mostly use PyTorch and am not very familiar with TF.

summmeer commented 1 year ago

Yes, interval=None should be removed; it is only there for ablation.

pss-bppp commented 1 year ago

So where do you think the problem lies? Which Python and TF versions are you using?

summmeer commented 1 year ago

The function bucketized maps the input seconds into 12 categories: [0, 11].

https://github.com/summmeer/session-based-news-recommendation/blob/fd795a4a05588f641e5e8ad74ff80b477eead8ab/sampler.py#L18

So the embedding table needs 12 entries, i.e. vocab_size=12

https://github.com/summmeer/session-based-news-recommendation/blob/fd795a4a05588f641e5e8ad74ff80b477eead8ab/model_combine.py#L106

Alternatively, you can change the bucketized function instead; either way works as long as the indices match the embedding size.
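The mismatch described above can be reproduced without TensorFlow. The following is a minimal sketch, not the repository's code: the cut points in BOUNDARIES and the helpers bucketize, make_embedding, and lookup are hypothetical stand-ins (the real boundaries are defined in sampler.py and may differ). It only illustrates that 11 cut points yield 12 bucket indices, so a table with 11 rows cannot serve index 11.

```python
import bisect
import random

# Hypothetical cut points in seconds: 11 cut points give 12 buckets (0..11).
# The real boundaries live in sampler.py and may differ.
BOUNDARIES = [5, 10, 20, 30, 45, 60, 90, 120, 180, 300, 600]


def bucketize(seconds, boundaries=BOUNDARIES):
    """Map a duration in seconds to a bucket index in [0, len(boundaries)]."""
    return bisect.bisect_right(boundaries, seconds)


def make_embedding(vocab_size, dim, seed=0):
    """Build a vocab_size x dim table of small random floats."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(vocab_size)]


def lookup(table, index):
    """Return one row, failing loudly when the index has no row."""
    if not 0 <= index < len(table):
        raise ValueError(f"index {index} is not in [0, {len(table)})")
    return table[index]


num_buckets = len(BOUNDARIES) + 1               # 12 categories
largest = bucketize(10_000)                     # a very long active time lands in bucket 11
too_small = make_embedding(num_buckets - 1, 4)  # 11 rows: index 11 has no row
right_size = make_embedding(num_buckets, 4)     # 12 rows: every bucket has a row

row = lookup(right_size, largest)               # succeeds
try:
    lookup(too_small, largest)
except ValueError as err:
    print(err)                                  # index 11 is not in [0, 11)
```

Deriving the table size from the bucketing function (here len(BOUNDARIES) + 1) rather than hard-coding it keeps the two in step if the boundaries are later changed.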

pss-bppp commented 1 year ago

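Thank you very much! The model has now started training. However, once training is running, the loss is NaN: Epoch 0: NaN error! I changed the batch_size and the learning rate, but the problem persists. I also tried setting neg_feedback = 0 where the loss is computed, and the result is still a NaN error.

I also noticed the following output:
size of seq_content (?, ?, 250)
size of seq_publish_t (?, ?, 320)
size of click_t (?, 128)
size of attout_item_cont (?, 500)
size of attout_publish_t (?, 320)
size of softmax_input (?, 10316)
Is this normal?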

summmeer commented 1 year ago

The log is normal: the question mark stands for the batch size, which is unknown when TensorFlow builds the graph. I have not encountered NaN before; you could print the batch input to debug.
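One way to act on the suggestion to inspect the batch input is a small check run on each batch before it is fed to the model. This is only a sketch under assumptions: check_batch and flatten are hypothetical helpers, the feed names in the usage example are illustrative, and the batch is assumed to be a dict of nested Python lists (a NumPy array can be converted with .tolist()). It reports non-finite floats and, optionally, index features that fall outside an expected range.

```python
import math


def flatten(values):
    """Yield the scalar leaves of an arbitrarily nested list or tuple."""
    if isinstance(values, (list, tuple)):
        for item in values:
            yield from flatten(item)
    else:
        yield values


def check_batch(batch, index_limits=None):
    """Return a list of human-readable problems found in one batch.

    batch:        dict mapping a feed name to nested lists of numbers.
    index_limits: optional dict mapping a feed name to the exclusive
                  upper bound its integer indices must stay below.
    """
    problems = []
    index_limits = index_limits or {}
    for name, values in batch.items():
        leaves = list(flatten(values))
        # Count NaN and +/-inf among float entries.
        n_bad = sum(1 for v in leaves if isinstance(v, float) and not math.isfinite(v))
        if n_bad:
            problems.append(f"{name}: {n_bad} non-finite value(s)")
        # Range-check features that are used as embedding indices.
        if name in index_limits and leaves:
            limit = index_limits[name]
            lo, hi = min(leaves), max(leaves)
            if lo < 0 or hi >= limit:
                problems.append(f"{name}: index range [{lo}, {hi}] is outside [0, {limit})")
    return problems


# Illustrative batch: the names and values are made up for this example.
batch = {
    "active_time": [[0], [11]],
    "content": [[0.1, float("nan")]],
}
for problem in check_batch(batch, {"active_time": 11}):
    print(problem)
```

An empty result does not rule out NaN arising inside the model (for example from a logarithm or division), but it separates bad input from numerical problems in the graph.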

pss-bppp commented 1 year ago

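I will keep trying. While debugging I found that, whatever batch_size is set to, the loss computed for the first batch (the value of crt_loss in the code) is normal. The values computed for the second batch already contain NaN, and the values computed for the third batch are all NaN. In addition, GPU utilization stays at 0 during training; the GPU is not being used.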

pss-bppp commented 1 year ago

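Great, training on the globo dataset now works normally. No need to trouble you further for now. Thanks again! Thank you for your responsible attitude toward research, and kudos for your work.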

pss-bppp commented 1 year ago

Hello, sorry to bother you again. Could you provide the news titles and the news title embeddings for the Adressa data, i.e. adressa/articles_titles.pkl and adressa/articles_embeddings.pkl? Thank you very much.

summmeer commented 1 year ago

articles_embeddings+titles.zip

pss-bppp commented 1 year ago

Thank you very much.