During fine-tuning we generally don't need warmup at the start, so which optimizer (Adam? AdamW?) and which learning rate (the same as the inner-loop learning rate?) should we use?
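A minimal PyTorch sketch of the two setups being asked about, with no warmup scheduler attached. The model, data, and the choice of reusing the inner-loop learning rate are all placeholder assumptions, not the repo's actual code:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the meta-trained model being fine-tuned.
model = nn.Linear(80, 40)

inner_lr = 1e-3  # assumed inner-loop LR; whether to reuse it is the open question

# Option A: AdamW, a common fine-tuning default (decoupled weight decay).
optimizer = torch.optim.AdamW(model.parameters(), lr=inner_lr, weight_decay=0.01)
# Option B: plain Adam with the same LR.
# optimizer = torch.optim.Adam(model.parameters(), lr=inner_lr)

# One fine-tuning step on dummy data, starting at the full LR (no warmup).
x, y = torch.randn(8, 80), torch.randn(8, 40)
optimizer.zero_grad()
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
optimizer.step()
```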
Some facts on the cross-region setting: [table of `eval_wer` / `best_step` results; data not recovered]
BTW, SGD with a 1e-3 learning rate performed poorly; I've already deleted those experiments.