lucidrains / PaLM-rlhf-pytorch

Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM
MIT License
7.67k stars 668 forks

KL_div/ratio on policy #26

Closed: kkissmart closed 1 year ago

kkissmart commented 1 year ago

nvm