zhanghainan / ReCoSa

ReCoSa: Detecting the Relevant Contexts with Self-Attention for Multi-turn Dialogue Generation

Question about the Transformer Decoder #7

Closed katherinelyx closed 4 years ago

katherinelyx commented 4 years ago

Hi. When I use the Transformer encoder-decoder framework for the dialogue generation task, there are two serious issues with the generated results. First, the generated responses almost always begin with the word "i"; second, heavy repetition badly degrades the output quality.

Target: hmm i guess on the 28th
Predict: i i am i the bus of and and and and and and are and and and and and and and and and and and

The above is a typical bad example. Did you face such issues? Could you provide some suggestions for handling this problem? Thank you.

zhanghainan commented 4 years ago

You could train your model for longer. In the early stages it always outputs high-frequency words. Also, if many targets in your training dataset begin with "I", it is easy for the model to output "I" as the first word.
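To check that concretely, here is a minimal sketch (assuming a hypothetical `responses.txt` file with one tokenized target response per line) that counts the most common first words in the training targets:

```python
from collections import Counter

# Hypothetical file: one tokenized target response per line.
first_words = Counter()
with open("responses.txt", encoding="utf-8") as f:
    for line in f:
        tokens = line.strip().lower().split()
        if tokens:
            first_words[tokens[0]] += 1

# Report the five most frequent first words and their share of all targets.
total = sum(first_words.values())
for word, count in first_words.most_common(5):
    print("%s: %d (%.1f%%)" % (word, count, 100.0 * count / total))
```

If "i" dominates this list, a model that has not converged yet will naturally default to it as the first token.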

katherinelyx commented 4 years ago

I tried running the model on the DailyDialog dataset for 100 epochs, but the results still always begin with "I".

zhanghainan commented 4 years ago

You can train on a very small dataset and look at the output on that training data. If the model cannot produce normal sentences even there, you may not be running the model correctly. You can also print the input word ids just before the TensorFlow `session.run()` call.
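For reference, here is a minimal TensorFlow 1.x sketch of that sanity check. It is not the ReCoSa model itself; the encoder and decoder are toy placeholders. The point is to overfit one tiny batch, print the input word ids before `session.run()`, and confirm the loss goes to (near) zero, which verifies the data feeding and training loop are wired correctly:

```python
import numpy as np
import tensorflow as tf  # assumes TF 1.x, matching the ReCoSa codebase

# Hypothetical tiny "dataset": a handful of (context, response) id sequences.
vocab_size, emb_dim, seq_len, batch = 50, 32, 8, 4
np.random.seed(0)
src_ids = np.random.randint(1, vocab_size, size=(batch, seq_len))
tgt_ids = np.random.randint(1, vocab_size, size=(batch, seq_len))

src_ph = tf.placeholder(tf.int32, [None, seq_len], name="src")
tgt_ph = tf.placeholder(tf.int32, [None, seq_len], name="tgt")

emb = tf.get_variable("emb", [vocab_size, emb_dim])
enc = tf.reduce_mean(tf.nn.embedding_lookup(emb, src_ph), axis=1)  # toy "encoder"
dec_in = tf.tile(tf.expand_dims(enc, 1), [1, seq_len, 1])          # toy "decoder" input
logits = tf.layers.dense(dec_in, vocab_size)
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=tgt_ph, logits=logits))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(500):
        if step == 0:
            # Sanity check: print the word ids actually fed in before session.run().
            print("src ids:", src_ids[0], "tgt ids:", tgt_ids[0])
        _, l = sess.run([train_op, loss], {src_ph: src_ids, tgt_ph: tgt_ids})
    print("final loss on the tiny set:", l)  # should approach 0 if wiring is right
```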

katherinelyx commented 4 years ago

OK. Thank you. I will try it.

zhanghainan commented 4 years ago

100 epochs are not enough for ReCoSa. For my code, you should run it much longer, e.g. 20,000 epochs.

katherinelyx commented 4 years ago

20,000 epochs? What about overfitting? With early stopping, how many epochs did ReCoSa need to obtain the Ubuntu dialogue results reported in the paper? And for the DailyDialog dataset, which contains only 13,118 dialogue sessions, how many epochs would you recommend? Thank you.

zhanghainan commented 4 years ago

On the Ubuntu dataset, ReCoSa needs 20,000 epochs. For DailyDialog, it needs at least 10,000 epochs before you can judge the generation quality. It is hard for a dialogue generation model to really "overfit", because the dev metrics are not sufficient to evaluate the quality of dialogue generation. Therefore, we should look at the generated responses rather than the dev metrics.

katherinelyx commented 4 years ago

Thank you for your suggestions. In my experiments there is a lot of repetition in the generated sentences, such as "and and and and". It is similar to what happens with the seq2seq framework. Have you seen this issue? Or is it just an intermediate stage during training?

zhanghainan commented 4 years ago

An intermediate stage during training. You could train the model for longer.
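Aside from training longer, a common decode-time mitigation for this kind of repetition (a general technique, not something from the ReCoSa code) is to block any token that would repeat an n-gram already present in the output. A minimal greedy-decoding sketch, where `step_logits(prefix)` is a hypothetical function returning next-token scores for the current prefix:

```python
import numpy as np

def banned_tokens(prefix, n=3):
    """Tokens that would complete an n-gram already present in the prefix."""
    banned = set()
    if len(prefix) < n - 1:
        return banned
    tail = tuple(prefix[-(n - 1):])
    for i in range(len(prefix) - n + 1):
        if tuple(prefix[i:i + n - 1]) == tail:
            banned.add(prefix[i + n - 1])
    return banned

def greedy_decode(step_logits, bos_id, eos_id, max_len=30, no_repeat_ngram=3):
    prefix = [bos_id]
    for _ in range(max_len):
        logits = np.asarray(step_logits(prefix), dtype=float)
        for tok in banned_tokens(prefix, no_repeat_ngram):
            logits[tok] = -np.inf  # forbid repeating an already-seen n-gram
        nxt = int(np.argmax(logits))
        prefix.append(nxt)
        if nxt == eos_id:
            break
    return prefix[1:]
```

This only masks the symptom at generation time; it does not fix the underlying under-training issue discussed above.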

katherinelyx commented 4 years ago

I have thought over your suggestions but still have some concerns. 20,000 epochs is a really large number for training. How much time does it take to get an acceptable result? Take DailyDialog as an example, which is not a big dataset.

zhanghainan commented 4 years ago

For the Ubuntu dialogue dataset, it takes about 5 days to see the results.