sillsdev / silnlp

A set of pipelines for performing experiments on various NLP tasks with a focus on resource-poor/minority languages.

Investigate effects of learning rate, learning rate schedules #426

Open · isaac091 opened this issue 3 months ago

isaac091 commented 3 months ago

I've been observing that for models that take a large number of steps to reach the early stopping criterion (~20k+ steps), increasing the learning rate significantly (from 5e-5 to 2e-4) often cuts the number of steps needed in half, which in turn cuts the training time in half. For models that take fewer steps to begin with, an increased learning rate can also reduce the number of steps needed, but that is less often the case. The score metrics do not seem to be significantly affected by the learning rate.

To do:

ddaspit commented 3 months ago

Is this true of fully fine-tuned models or just LoRA models?

isaac091 commented 3 months ago

I've noticed it for both, but I've run a lot more experiments with LoRA/other model reduction methods than without, so I will need to get some more data points before I'm more confident about the types of scenarios that benefit from a higher learning rate. This issue is meant to focus on fully fine-tuned models, since the default learning rate for LoRA models has already been updated to be higher.

ddaspit commented 3 months ago

Sounds good. This could be an easy way to speed up training.