Open luomuqinghan opened 6 years ago
Thanks for sharing this.
For the ease-of-answering reward in the RL setup, is the reward computed by the RL model itself rather than by a separate model?
Why not feed the action into a separate pretrained model to obtain the response, and then measure that model's likelihood of producing a dull response?
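For concreteness, here is a toy sketch of the kind of reward I mean: score an action by the (negative, length-normalized) log-likelihood of a fixed set of dull responses, in the style of the ease-of-answering reward r1 = -(1/|S|) Σ_{s∈S} (1/N_s) log p(s | a). The `token_log_prob` function below is a placeholder returning a constant probability; a real version would query a frozen pretrained seq2seq model instead.

```python
import math

# A small set S of dull responses (examples in the spirit of Li et al. 2016).
DULL_RESPONSES = [
    ["i", "don't", "know"],
    ["i", "have", "no", "idea"],
]

def token_log_prob(token, action, prefix):
    """Toy stand-in for log p(token | action, prefix).

    In a real setup this would be the per-token log-probability from a
    separate, frozen pretrained model conditioned on the action.
    Here every token is simply assigned probability 0.1."""
    return math.log(0.1)

def ease_of_answering_reward(action):
    """r1 = -(1/|S|) * sum over s in S of (1/N_s) * log p(s | action).

    Higher reward means the action makes dull responses less likely."""
    total = 0.0
    for s in DULL_RESPONSES:
        # Length-normalized log-likelihood of the dull response s.
        log_p = sum(token_log_prob(tok, action, s[:i])
                    for i, tok in enumerate(s))
        total += log_p / len(s)
    return -total / len(DULL_RESPONSES)

reward = ease_of_answering_reward(["how", "are", "you", "?"])
```

With the constant toy probability, every dull response scores log(0.1) per token, so the reward collapses to -log(0.1); the interesting behavior only appears once a real conditional model replaces the placeholder.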