joytianya closed this issue 1 year ago
May I ask about the configs of pre-training? For example, did you use dropout?
We use the exact same hyperparameters as described in the original llama paper.
I couldn't find any mention of dropout in the paper. Did LLaMA use dropout?