sillsdev / silnlp

A set of pipelines for performing experiments on various NLP tasks with a focus on resource-poor/minority languages.

Investigate effects of learning rate, learning rate schedules #426

Open · isaac091 opened this issue 3 months ago

isaac091 commented 3 months ago

I've been observing that for models that take a large number of steps to reach the early stopping criterion (~20k+ steps), increasing the learning rate significantly (from 5e-5 to 2e-4) often cuts the number of steps needed in half, which in turn cuts the training time in half. For models that take fewer steps to begin with, an increased learning rate can also reduce the number of steps needed, but that is less often the case. The score metrics do not seem to be significantly affected by the learning rate.

To do:

ddaspit commented 3 months ago

Is this true of fully fine-tuned models or just LoRA models?

isaac091 commented 3 months ago

I've noticed it for both, but I've run a lot more experiments with LoRA/other model reduction methods than without, so I will need to get some more data points before I'm more confident about the types of scenarios that benefit from a higher learning rate. This issue is meant to focus on fully fine-tuned models, since the default learning rate for LoRA models has already been updated to be higher.

ddaspit commented 3 months ago

Sounds good. This could be an easy way to speed up training.