sunprinceS / MetaASR-CrossAccent

Meta-Learning for End-to-End ASR

Inner-loop Optimizer #2

Closed sunprinceS closed 4 years ago

sunprinceS commented 4 years ago

As some papers mention, the Transformer needs so-called warmup_steps. During the inner loop, should we keep the learning rate the same forever, or change it along with the outer-loop lr? In addition, which optimizer should we choose?
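For reference, here is a minimal sketch of the warmup schedule in question (the "Noam" schedule from the original Transformer paper); `d_model = 512` and `warmup_steps = 4000` are that paper's defaults, not values taken from this repo:

```python
def noam_lr(step: int, d_model: int = 512, warmup_steps: int = 4000) -> float:
    """lr = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)."""
    step = max(step, 1)  # guard against step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

# The schedule rises linearly for warmup_steps steps, then decays;
# its maximum is reached exactly at step == warmup_steps:
peak_lr = noam_lr(4000)  # == (512 * 4000) ** -0.5 ≈ 7.0e-4
```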

sunprinceS commented 4 years ago

Current implementation: **SGD with lr = 1 / (sqrt(d_model) * sqrt(warmup_steps in outer-loop))** (the maximum lr of the outer-loop learning-rate schedule)

Reasons:

  1. Since we generally take only one gradient-descent step in the inner loop, I think SGD is enough.
  2. We assume that warmup is not needed again during fine-tuning, so we just use the maximum value (see the sketch below).
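A hedged sketch of this setup in PyTorch: plain SGD whose learning rate is pinned at the peak of the outer-loop schedule, applied for a single gradient step. The tiny linear model and random batch are placeholders for illustration, not code from this repo.

```python
import torch
import torch.nn as nn

d_model, warmup_steps = 512, 4000
inner_lr = (d_model * warmup_steps) ** -0.5  # max lr of the outer-loop schedule

model = nn.Linear(d_model, d_model)          # stand-in for the actual ASR model
inner_optimizer = torch.optim.SGD(model.parameters(), lr=inner_lr)

# One gradient-descent step in the inner loop, matching reason 1 above:
x = torch.randn(8, d_model)
loss = model(x).pow(2).mean()                # dummy loss for illustration
inner_optimizer.zero_grad()
loss.backward()
inner_optimizer.step()
```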