voidful / TextRL

Implementation of ChatGPT RLHF (Reinforcement Learning from Human Feedback) on any generation model in Hugging Face's transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL)
MIT License
545 stars 60 forks

Fix Reward Calculation in example/2022-12-10-textrl-elon-musk.ipynb #28

Closed. Alanhsiu closed this 6 months ago

Alanhsiu commented 6 months ago

Fixes issue #27.