lucidrains / PaLM-rlhf-pytorch

Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM.
MIT License
7.67k stars 668 forks

Model Name #49

Closed: conceptofmind closed this issue 1 year ago

conceptofmind commented 1 year ago

@lucidrains What would you like the first model to be named?

conceptofmind commented 1 year ago

[Screenshot from 2023-05-02 15-46-13] One restart, 160B tokens.

conceptofmind commented 1 year ago

[Screenshot from 2023-05-03 00-04-41] I also ran some tests with and without qk_norm. With AdamW, I decided to go with qk_norm=False. I will explore this with Lion afterward.
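For readers unfamiliar with the flag: QK normalization, as a general technique, L2-normalizes queries and keys before their dot product, so the attention logit becomes a scaled cosine similarity instead of a raw dot product. The pure-Python sketch below illustrates that idea under this assumption only. It is not this repo's implementation, and `attention_weights`, `l2_normalize`, and the fixed `qk_scale=10.0` are hypothetical names and values chosen for illustration.

```python
import math


def l2_normalize(vec, eps=1e-8):
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of logits."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys, qk_norm=False, qk_scale=10.0):
    """Attention weights of one query over a list of keys (hypothetical helper).

    qk_norm=False: standard scaled dot product, q.k / sqrt(d).
    qk_norm=True:  q and k are L2-normalized first, so q.k is a cosine
                   similarity in [-1, 1]; a fixed scale (qk_scale, an
                   assumed value) sharpens the softmax.
    """
    d = len(query)
    if qk_norm:
        query = l2_normalize(query)
        keys = [l2_normalize(k) for k in keys]
        scale = qk_scale
    else:
        scale = 1.0 / math.sqrt(d)
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    return softmax(scores)


# Two keys pointing in the same direction, one 10x larger in magnitude.
q = [3.0, 4.0]
keys = [[3.0, 4.0], [30.0, 40.0]]
print(attention_weights(q, keys, qk_norm=False))  # the large-magnitude key takes almost all the weight
print(attention_weights(q, keys, qk_norm=True))   # both keys weighted equally
```

The example shows the main effect of the normalization: with plain dot products, the key with the larger norm dominates the softmax; with QK normalization, only direction matters, which bounds the logits and is often used to stabilize training.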

conceptofmind commented 1 year ago

[Screenshot from 2023-05-05 21-09-30] PaLM 1B