sacharbit closed 6 months ago
I don't see the dropout implemented in your script. If it's there, where is it? Especially during training.
We do not use dropout. Training without dropout is common for modern LMs (such as T5, the GPT series, and Llama), and we adopt this setting.
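To make the "no dropout" setting concrete, here is a minimal pure-Python sketch of inverted dropout. It is not taken from the training script: the `dropout` helper, its parameters, and the sample values are hypothetical and for illustration only. It shows that a dropout rate of 0 leaves activations unchanged, which is equivalent to having no dropout layer at all.

```python
import random


def dropout(values, p, training=True, rng=None):
    """Inverted dropout over a list of floats (hypothetical helper, not from the repo)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        # Nothing is dropped, so the layer acts as the identity.
        return list(values)
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - p)
    # Zero each value with probability p and rescale the survivors
    # so the expected activation stays the same.
    return [0.0 if rng.random() < p else v * scale for v in values]


activations = [0.5, -1.0, 2.0]

# p = 0.0 corresponds to the "no dropout" setting described above.
no_dropout = dropout(activations, p=0.0)

# For contrast: with p = 0.5 each value is either zeroed or doubled.
with_dropout = dropout(activations, p=0.5, rng=random.Random(0))
```

In a framework-based script, the same effect would typically come from setting the dropout probability to 0 or omitting the dropout layers entirely, which is why no dropout call shows up in the training code.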