utterworks / fast-bert

Super easy library for BERT based NLP models

Warmup not working #230

Open lingdoc opened 4 years ago

lingdoc commented 4 years ago

When I use the schedule_type flag in the fit() function along with warmup_steps, there is never a warmup period - see below.

warmup_steps=2
lr=6e-05
schedule_type="warmup_cosine"
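For context, this is roughly how those values are being passed, following the pattern in the README (warmup_steps when the learner is created, lr and schedule_type in fit()). This is only a sketch of my setup; databunch, metrics, device, logger, and OUTPUT_DIR are placeholders for my actual objects:

from fast_bert.learner_cls import BertLearner

learner = BertLearner.from_pretrained_model(
    databunch,                      # BertDataBunch built beforehand (placeholder)
    pretrained_path="bert-base-uncased",
    metrics=metrics,
    device=device,
    logger=logger,
    output_dir=OUTPUT_DIR,
    warmup_steps=2,                 # warmup setting from above
    multi_gpu=False,
    is_fp16=False,
    multi_label=False,
)

learner.fit(
    epochs=3,
    lr=6e-05,                       # lr from above
    schedule_type="warmup_cosine",  # schedule from above
    optimizer_type="lamb",
    validate=True,
)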

lr after epoch 1: 5.963082662464444e-05
lr after epoch 2: 5.8532025639284296e-05
lr after epoch 3: 5.673065382761236e-05

The learning rate is decreasing, but it starts decreasing immediately; there is none of the ramp-up you would expect during a warmup period. This happens no matter how many warmup_steps I use: at the first epoch the learning rate starts at the value I set in fit(), and then it only decreases.
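For comparison, this is what I would expect the schedule to do, using transformers' get_cosine_schedule_with_warmup directly rather than going through fast-bert (a minimal standalone sketch with a dummy model; the step count is made up):

import torch
from transformers import get_cosine_schedule_with_warmup

lr = 6e-05
warmup_steps = 2
total_steps = 30  # made-up number of optimizer steps

model = torch.nn.Linear(4, 2)  # dummy model, just to have parameters for the optimizer
optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=warmup_steps, num_training_steps=total_steps
)

for step in range(total_steps):
    optimizer.step()
    scheduler.step()
    print(step, scheduler.get_last_lr()[0])

# Expected: the lr ramps up toward 6e-05 over the first warmup_steps steps,
# and only then follows the cosine decay. In fast-bert I never see the ramp-up.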

I'm not sure exactly how to fix this, or how to figure out whether it is an issue with the fast-bert implementation (does it have something to do with how the Learner is initialized?) or with Transformers itself (I don't see any issues describing this problem on their GitHub page). Any insights would be appreciated. @kaushaltrivedi