Previously, the weight on the confidence loss jumped discontinuously: it equaled step_frac until step_frac exceeded warmup_frac, then went straight to 1.0.
For example, with warmup_frac = 0.1, at step_frac = 0.1 we get coef = step_frac = 0.1, and at the next step step_frac > 0.1 = warmup_frac, so coef = 1.0.
The paper describes this weighting as increasing smoothly. With this change, it increases linearly from 0 to 1 over the first warmup_frac fraction of training steps.
We also change > to >= so that warmup_frac = 0.0 is allowed: on the first step step_frac = 0.0, so the comparison is true and coef = 1.0. Otherwise we'd get a divide-by-zero error on the first step.
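
To make the difference concrete, here is a minimal sketch of the two schedules as described above. The function names old_coef and new_coef are hypothetical and used only for illustration; the division by warmup_frac is inferred from the "linearly from 0 to 1" description and the divide-by-zero note, not copied from the diff.

```python
def old_coef(step_frac: float, warmup_frac: float) -> float:
    """Previous behavior (as described): coef tracks step_frac, then jumps to 1.0."""
    return 1.0 if step_frac > warmup_frac else step_frac


def new_coef(step_frac: float, warmup_frac: float) -> float:
    """Assumed new behavior: linear ramp from 0 to 1 over the warmup fraction."""
    # >= (rather than >) short-circuits when warmup_frac == 0.0 and
    # step_frac == 0.0, so the division below is never reached in that case.
    if step_frac >= warmup_frac:
        return 1.0
    return step_frac / warmup_frac


if __name__ == "__main__":
    warmup_frac = 0.1
    for step_frac in (0.0, 0.05, 0.1, 0.11):
        print(step_frac, old_coef(step_frac, warmup_frac), new_coef(step_frac, warmup_frac))
```

With warmup_frac = 0.1, the old schedule reaches only 0.1 at the end of warmup before jumping to 1.0, while the sketched new schedule reaches 1.0 exactly at step_frac = warmup_frac.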