Closed · sunprinceS closed this issue 4 years ago
As some papers mention, Transformers need so-called `warmup_steps`. During the inner loop, should we keep the same lr throughout, or change it along with the outer-loop lr? In addition, which optimizer should we choose?
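For context, the warmup schedule usually meant here is the one from "Attention Is All You Need" (the so-called Noam schedule); a minimal sketch, assuming `d_model` and `warmup_steps` as in that paper:

```python
def noam_lr(step: int, d_model: int, warmup_steps: int) -> float:
    """Transformer ("Noam") schedule: linear warmup, then inverse-sqrt decay."""
    step = max(step, 1)  # avoid step ** -0.5 blowing up at step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

# Peaks at step == warmup_steps with lr = d_model**-0.5 * warmup_steps**-0.5,
# e.g. ~7e-4 for d_model=512, warmup_steps=4000.
print(noam_lr(4000, d_model=512, warmup_steps=4000))
```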
Current implementation: **SGD with lr = d_model^-0.5 · warmup_steps^-0.5** (the max lr of the outer-loop learning-rate schedule).
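For concreteness, here is a minimal sketch of how that setup could look in PyTorch. Everything here is an assumption for illustration (the dummy `Linear` model, Adam as the meta-optimizer, the hyper-parameter values, and the Reptile-style first-order meta-update); the only part taken from the comment above is that the inner loop runs plain SGD at a fixed lr equal to the peak of the outer-loop warmup schedule, while the outer loop follows the schedule itself.

```python
import copy
import torch

d_model, warmup_steps = 512, 4000                        # assumed hyper-parameters
peak_lr = d_model ** -0.5 * warmup_steps ** -0.5         # max lr of the outer-loop schedule

model = torch.nn.Linear(d_model, d_model)                # stand-in for the Transformer
meta_opt = torch.optim.Adam(model.parameters(), lr=peak_lr)
meta_sched = torch.optim.lr_scheduler.LambdaLR(          # outer loop: Noam-style warmup
    meta_opt,
    lambda s: warmup_steps ** 0.5 * min((s + 1) ** -0.5, (s + 1) * warmup_steps ** -1.5),
)

for outer_step in range(3):                              # outer (meta) loop
    learner = copy.deepcopy(model)                       # task-specific copy (first-order style)
    inner_opt = torch.optim.SGD(learner.parameters(), lr=peak_lr)  # fixed lr, no schedule

    for _ in range(5):                                   # inner (adaptation) loop
        x = torch.randn(8, d_model)                      # dummy support batch
        loss = learner(x).pow(2).mean()
        inner_opt.zero_grad()
        loss.backward()
        inner_opt.step()

    # Reptile-style first-order meta-update: move the meta-parameters toward the
    # adapted ones (a stand-in for whatever meta-gradient the repo actually uses).
    meta_opt.zero_grad()
    for p, q in zip(model.parameters(), learner.parameters()):
        p.grad = p.data - q.data
    meta_opt.step()
    meta_sched.step()                                    # advance the outer-loop warmup schedule
```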
Reasons: