conceptofmind closed 1 year ago
@lucidrains What would you like the first model to be named?
One restart. 160B tokens.
I also ran some tests with qk_norm vs. no qk_norm. With AdamW, I decided to go with qk_norm=False. I will explore this with Lion afterwards.
qk_norm=False
PaLM 1B