williamSYSU closed this issue 5 years ago
Main issues:
Right after pre-training, the nll_gen is also very high (~0.65 as you showed). So I think it’s because by setting gpre_lr=0.005 and npre_epochs=150, the pre-training may not be sufficient. A quick suggestion is to increase the npre_epochs (and/or increase the inverse temperature) to see if you can get good results.
This is expected, since there is a tradeoff between sample quality and diversity, tuned by the maximum inverse temperature. In the extreme case of `temperature=1`, which means no temperature control at all, the model will suffer from severe mode collapse.
For the gumbel-softmax trick, the temperature control plays a crucial role in the overall performance. So yes, it is mainly because “very sensitive to temperature”.
The gumbel-softmax trick.
I would recommend to improve the gumbel-softmax trick. In this work, we just use the vanilla version of the gumbel-softmax with some temperature control. I believe there is still large room for improving this module. For example, REBAR would be the first thing to try.
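To make this concrete, here is a minimal NumPy sketch of Gumbel-softmax sampling with an inverse-temperature knob. The function name and exact parameterization are mine for illustration, not taken from the RelGAN code:

```python
import numpy as np

def gumbel_softmax_sample(logits, beta=1.0, rng=None):
    """Relaxed one-hot sample via the Gumbel-softmax trick.

    `beta` is an inverse temperature applied as softmax(beta * (logits + g)),
    where g is Gumbel(0, 1) noise. beta = 1 means no extra sharpening; a
    larger beta pushes the sample toward one-hot (higher quality, less
    diversity). Illustrative sketch only, not the released implementation.
    """
    rng = rng if rng is not None else np.random.default_rng()
    u = rng.uniform(1e-12, 1.0, size=np.shape(logits))
    g = -np.log(-np.log(u))          # Gumbel(0, 1) noise
    z = beta * (np.asarray(logits, dtype=float) + g)
    z = z - z.max()                  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()
```

With `beta=1` the relaxed samples stay soft and the discriminator signal degrades, which matches the mode-collapse behavior reported under `temperature=1` above.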
Yes, temperature>1 is essential for RelGAN from the temperature-control perspective. SeqGAN and LeakGAN do not rely on temperature control since they apply REINFORCE, so they are less sensitive to the temperature.
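One common way to ramp the inverse temperature from 1 up to its maximum over training is an exponential schedule. This helper is hypothetical; the exact schedule and variable names in the released code may differ:

```python
def inverse_temperature(step, total_steps, beta_max=100.0):
    """Exponential ramp of the inverse temperature: 1.0 at step 0,
    beta_max at the final step. Hypothetical helper for illustration;
    not taken from the RelGAN repository."""
    assert 0 <= step <= total_steps
    return beta_max ** (step / total_steps)
```

Under such a schedule, early adversarial epochs keep samples diverse (beta near 1) and later epochs sharpen them toward one-hot (beta near `beta_max`).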
I am very grateful that you took the time to answer my questions in detail and so patiently. Your answers have helped me better understand RelGAN, but I am still confused about one more thing.
According to my understanding of your code, the calculation process of `g_pretrain_loss` and `nll_gen` is exactly the same, except for the parameters of the generator. In fact, `nll_gen` is the `g_pretrain_loss` calculated by the generator whose parameters have been updated after pre-training. Therefore, the values of `g_pretrain_loss` and `nll_gen` should be close, and from the training perspective `g_pretrain_loss` should generally be larger than `nll_gen`. However, in the log file with `gpre_lr=0.005` on Synthetic Data, `g_pretrain_loss` is already small (~1.7) while `nll_gen` is still large (~6.3). According to the above analysis, this should not happen.
Is there a mistake in my understanding or my analysis? Or do you calculate `g_pretrain_loss` and `nll_gen` differently?
I think the difference between `g_pretrain_loss` and `nll_gen` mainly lies in how each of them is calculated over mini-batches. For `nll_gen`, we fix the generator parameters and take the average of `g_loss` over all mini-batches (please refer to `nll_loss()` in `Nll.py`). For `g_pretrain_loss`, however, the generator parameters are updated on each mini-batch, so each recorded `g_loss` is measured with parameters adapted to that mini-batch, and these adapted `g_loss` values are then averaged over all mini-batches (please refer to `pre_train_epoch()` in `utils.py`). That explains why `g_pretrain_loss` is lower than `nll_gen`.
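This gap can be reproduced with a toy scalar model. Everything below is illustrative: `g_loss` is stood in by a squared error on a single parameter `theta`, not the actual generator NLL, and the drifting batch means exaggerate the effect:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy mini-batches whose means drift, so no single fixed theta fits all of
# them. (Purely illustrative; not RelGAN data or code.)
batches = [rng.normal(loc=3.0 * i, scale=1.0, size=8) for i in range(5)]

def g_loss(theta, batch):
    """Stand-in for the per-batch generator loss: mean squared error."""
    return float(np.mean((batch - theta) ** 2))

def sgd_step(theta, batch, lr=0.45):
    """One gradient step of g_loss with respect to theta on this batch."""
    grad = -2.0 * float(np.mean(batch - theta))
    return theta - lr * grad

# g_pretrain_loss-style: adapt theta to each mini-batch, record the loss of
# the adapted theta on that same batch, then average at the end.
theta, adapted_losses = 0.0, []
for b in batches:
    theta = sgd_step(theta, b)
    adapted_losses.append(g_loss(theta, b))
g_pretrain_loss = float(np.mean(adapted_losses))

# nll_gen-style: freeze theta after the pass and average the loss over all
# mini-batches with the SAME fixed parameters.
nll_gen = float(np.mean([g_loss(theta, b) for b in batches]))

print(g_pretrain_loss, nll_gen)  # the fixed-parameter average is larger
```

The per-batch-adapted average benefits from parameters tailored to each mini-batch, while the fixed-parameter average pays for every batch the frozen parameters fit poorly, so `g_pretrain_loss` can sit well below `nll_gen` even though both are built from the same `g_loss`.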
Thank you again for your answers and code :)
Hi,
First of all, thanks for sharing your code! I'm impressed by your solid work. However, I found some issues when running your code under different hyper-parameters.
Main issues:

- For the Synthetic data experiment, the model only generates one repeated sentence under `gpre_lr=0.005`, while it behaves normally under `gpre_lr=0.01`.
- For the Image COCO caption data, the same problem arises under `temperature=1`.

Here's my system environment.

Here are the problems I encountered when running your code.

1. For the Synthetic data experiment, I simply changed `gpre_lr` from `0.01` to `0.005`. After 1620 epochs of adversarial training, the model only generates one repeated sentence, while it behaves normally under `gpre_lr=0.01`.
   - `experiment-log-relgan.csv`
   - Samples from the 1620th adversarial epoch in `adv_samples_01620.txt` (only repeated sentences were generated).
2. For the Image COCO caption data, I simply changed `temperature` from `100` to `1`. The problem of generating repeated sentences arises again, while the model generates diverse sentences under `temperature=100`.
   - `experiment-log-relgan.csv` (to save time, I didn't calculate the BLEU-3 score)
   - Samples from the 1000th adversarial epoch in `adv_samples_01000.txt` (only repeated sentences were generated).