Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM.
7.7k stars, 668 forks
Easier (and faster) chunk and inplace under nograd #1
Closed by hypnopump 1 year ago
Minor tricks. Proof