Closed: cindy21td closed this issue 6 years ago
Hi,
In `python\RL\train.py`, after the ease-of-answering and semantic-coherence rewards are added, the sigmoid of the total is scaled by 1.1 (line 261):

```python
total_loss = sigmoid(total_loss) * 1.1
```

What was the purpose of the sigmoid and the 1.1 scaling on this line?
Also, I noticed you didn't weight each reward by lambda like in the "Deep Reinforcement Learning for Dialogue Generation" paper. Was this on purpose?
Thanks!
The sigmoid and the scaling are empirical settings.
The paper doesn't explain how to obtain the lambda values. They are probably found through experiments and depend heavily on the training data, so I ignored the lambdas.
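For contrast, here is a minimal sketch of the two combination schemes being discussed: the repo's empirical sigmoid-plus-scaling and the paper's fixed weighted sum. The function and argument names are hypothetical; the 0.25/0.25/0.5 weights are the lambda values Li et al. (2016) report for their three rewards (ease of answering, information flow, semantic coherence).

```python
import math

def sigmoid(x):
    # Logistic sigmoid: squashes any real-valued reward sum into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def combine_rewards_repo(ease_of_answering, semantic_coherence):
    # The repo's empirical combination quoted above: sum the two rewards,
    # squash with a sigmoid, then stretch the output range to (0, 1.1).
    total = ease_of_answering + semantic_coherence
    return sigmoid(total) * 1.1

def combine_rewards_paper(r1, r2, r3, lambdas=(0.25, 0.25, 0.5)):
    # Paper-style combination: a fixed lambda-weighted sum of the three
    # rewards (r1 = ease of answering, r2 = information flow,
    # r3 = semantic coherence), as in Li et al. (2016).
    return lambdas[0] * r1 + lambdas[1] * r2 + lambdas[2] * r3
```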