pochih / RL-Chatbot

🤖 Deep Reinforcement Learning Chatbot
MIT License

Why the sigmoid in count_rewards() #9

Closed: cindy21td closed this issue 6 years ago

cindy21td commented 6 years ago

Hi,

In python/RL/train.py, after the ease-of-answering and semantic-coherence rewards are added, the sigmoid of the combined reward is scaled by 1.1:

total_loss = sigmoid(total_loss) * 1.1

What is the purpose of the sigmoid and the 1.1 scaling on this line (line 261)?
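For reference, here is a quick numeric check of what that line computes, assuming `sigmoid` is the standard logistic function (the function name below is my own):

```python
import math

def squash_reward(total_loss):
    # Logistic sigmoid maps any real-valued reward into (0, 1);
    # the 1.1 factor then stretches that range to (0, 1.1).
    return 1.1 / (1.0 + math.exp(-total_loss))

# The squashing bounds the reward, so extreme values cannot
# dominate the policy-gradient update:
print(squash_reward(-10.0))  # close to 0
print(squash_reward(0.0))    # 0.55 (the midpoint)
print(squash_reward(10.0))   # close to 1.1
```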

Also, I noticed you didn't weight each reward by a lambda as in the "Deep Reinforcement Learning for Dialogue Generation" paper. Was this intentional?

Thanks!

pochih commented 6 years ago

The sigmoid and the 1.1 scaling are empirical settings.

The paper didn't explain how to obtain the lambda values. They were probably found through experiments and depend heavily on the training data, so I left the lambdas out.
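For comparison, a minimal sketch of the paper's lambda-weighted combination. The function and argument names here are my own labels for the three rewards, and the weights are the ones the paper reports without further justification (lambda1 = lambda2 = 0.25, lambda3 = 0.5):

```python
def combined_reward(r_ease, r_flow, r_coherence,
                    lam1=0.25, lam2=0.25, lam3=0.5):
    # Weighted sum of the three dialogue rewards from the paper:
    # ease of answering, information flow, and semantic coherence.
    # The weights are hyperparameters; the paper states these values
    # but not how they were chosen.
    return lam1 * r_ease + lam2 * r_flow + lam3 * r_coherence

# Example: equal unit rewards combine back to 1.0.
print(combined_reward(1.0, 1.0, 1.0))  # 1.0
```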