KL_div/ratio on policy

lucidrains / PaLM-rlhf-pytorch

Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM

MIT License

7.71k stars, 669 forks

Closed kkissmart closed 1 year ago

kkissmart commented 1 year ago

nvm