jzhang38 / TinyLlama

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
Apache License 2.0
7.31k stars · 426 forks

No dropout? #119

Closed · sacharbit closed this 6 months ago

sacharbit commented 6 months ago

I don't see dropout implemented in your training script. If it is there, where is it?

jzhang38 commented 6 months ago

Training without dropout is the standard setting for modern LMs (such as T5, the GPT series, and Llama), and we adopt the same setting.
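
For readers looking for this in code: "no dropout" simply means the transformer blocks contain no `nn.Dropout` layers at all (equivalently, any dropout probability is pinned to 0.0, which makes dropout the identity). Below is a minimal sketch of a Llama-style MLP block illustrating this; it is not the actual TinyLlama source, and the class and parameter names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LlamaStyleMLP(nn.Module):
    """Llama-style gated MLP: note there is no nn.Dropout anywhere."""

    def __init__(self, dim: int, hidden_dim: int) -> None:
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.up_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.down_proj = nn.Linear(hidden_dim, dim, bias=False)
        # No dropout layer here. Attention would likewise be computed
        # with zero dropout, e.g.
        #   F.scaled_dot_product_attention(q, k, v, dropout_p=0.0)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))


x = torch.randn(2, 16, 64)
y = LlamaStyleMLP(64, 172)(x)
print(y.shape)  # torch.Size([2, 16, 64])
```

Since dropout with p=0.0 passes inputs through unchanged, omitting the layers entirely and setting the probability to zero are equivalent; large-scale pretraining runs typically rely on the sheer volume of data rather than dropout for regularization.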