Implementation of ChatGPT RLHF (Reinforcement Learning from Human Feedback) on any generation model in Hugging Face's transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL)
MIT License
539 stars, 60 forks
Fix Reward Calculation in `example/2022-12-10-textrl-elon-musk.ipynb` #27
In the notebook `example/2022-12-10-textrl-elon-musk.ipynb`, the reward calculation in the `MyRLEnv` class needs to be updated so that outputs are scored correctly. Specifically, the `get_reward` method needs to be modified.
The current code concatenates `input_item[0]` with the predicted text before computing the sentiment score. However, `input_item` should be referenced differently for the reward to be calculated correctly.
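A minimal sketch of a more defensive reward function, not necessarily the change made in the repository. It assumes (this is not stated in the issue) that each observation is a dict such as `{"input": "..."}`, in which case `input_item[0]` raises `KeyError`, and that the classifier returns one list of `{"label", "score"}` dicts per text, as a Hugging Face text-classification pipeline does when all label scores are requested. `extract_prompt`, `compute_reward`, and `fake_sentiment` are hypothetical names; the stub stands in for the real pipeline so the snippet runs on its own, and the predicted text is passed as a string instead of a token list to avoid needing a tokenizer.

```python
from collections.abc import Mapping
from typing import Any, Callable, Dict, List

# Assumed classifier shape: one list of {"label", "score"} dicts per input text.
SentimentFn = Callable[[str], List[List[Dict[str, Any]]]]


def extract_prompt(input_item: Any) -> str:
    """Return the prompt text from an observation (hypothetical helper).

    Assumption: dict observations carry the prompt under "input";
    plain strings and list-style observations are also accepted.
    """
    if isinstance(input_item, Mapping):
        return input_item["input"]
    if isinstance(input_item, str):
        return input_item
    return input_item[0]  # older list-style observation, e.g. ["prompt"]


def compute_reward(input_item: Any, predicted_text: str, finish: bool,
                   sentiment: SentimentFn, scale: float = 10.0) -> float:
    """Score prompt + continuation once generation has finished."""
    if not finish:
        return 0.0
    text = extract_prompt(input_item) + predicted_text
    # Same indexing as the notebook: first text -> first label -> its score.
    return sentiment(text)[0][0]["score"] * scale


def fake_sentiment(text: str) -> List[List[Dict[str, Any]]]:
    # Stand-in for a real classifier: fixed scores, for illustration only.
    return [[{"label": "LABEL_0", "score": 0.25},
             {"label": "LABEL_2", "score": 0.75}]]


obs = {"input": "example prompt"}
try:
    obs[0]  # what the current notebook code effectively does
except KeyError:
    print("input_item[0] fails on a dict observation")
print(compute_reward(obs, " continuation", True, fake_sentiment))  # 2.5
```

Accepting several observation shapes keeps the notebook working whichever format the observation list uses; if the format is fixed, indexing `input_item["input"]` directly is simpler.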