When I use the schedule_type flag in the fit() function along with warmup_steps, there is never a warmup period - see below.
warmup_steps=2
lr=6e-05
schedule_type="warmup_cosine"
lr after epoch 1: 5.963082662464444e-05
lr after epoch 2: 5.8532025639284296e-05
lr after epoch 3: 5.673065382761236e-05
The learning rate is decreasing, but it starts decreasing immediately; there is no ramp-up as you would expect during a warmup period. This happens no matter how many warmup_steps I pass: the learning rate starts at the value I set in the fit() function at the first epoch and decreases from there.
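One thing worth checking: if warmup_steps is counted in optimizer steps (batches) rather than epochs, then with warmup_steps=2 the warmup finishes within the first couple of batches of epoch 1, so per-epoch logging would only ever show the decay phase. A minimal sketch of a warmup-cosine schedule (same shape as transformers' get_cosine_schedule_with_warmup; the step counts here are hypothetical, not taken from my run) illustrates this:

```python
import math

def warmup_cosine_lr(step, base_lr, warmup_steps, total_steps):
    """Linear warmup for warmup_steps optimizer steps, then cosine decay."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

base_lr = 6e-05
warmup_steps = 2
steps_per_epoch = 100          # hypothetical; depends on dataset/batch size
total_steps = steps_per_epoch * 3

# The ramp-up is visible only at step granularity...
for s in range(3):
    print(s, warmup_cosine_lr(s, base_lr, warmup_steps, total_steps))

# ...but by the end of epoch 1 (step 100) warmup is long over, so a
# per-epoch log shows nothing but decay.
print(warmup_cosine_lr(steps_per_epoch, base_lr, warmup_steps, total_steps))
```

If fast-bert interprets warmup_steps this way, logging the lr every batch (or passing a warmup_steps value larger than one epoch's worth of batches) should make the ramp-up visible.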
I'm not sure how to fix this, or how to determine whether it's an issue with the fast-bert implementation (does it have something to do with how the Learner is initialized?) or with Transformers itself (I don't see any open issues about this on their GitHub page). Any insights would be appreciated.
@kaushaltrivedi