Fix confidence loss to scale up correctly

Previously, the weight on the confidence loss was coef = step_frac during warmup. It therefore reached only warmup_frac just before warmup ended, and then jumped straight to 1.0 once step_frac > warmup_frac.

For example, with warmup_frac = 0.1: at step_frac = 0.1, coef = step_frac = 0.1; at the next step, step_frac > 0.1 = warmup_frac, so coef jumps to 1.0.
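A minimal sketch of the previous schedule as described above reproduces the jump. The function name `old_coef` is hypothetical; it mirrors the described comparison, not necessarily the repository's exact code, and omits any further scaling the actual loss applies.

```python
def old_coef(step_frac: float, warmup_frac: float) -> float:
    """Previous schedule (hypothetical helper): coef = step_frac during warmup,
    then 1.0 once step_frac exceeds warmup_frac."""
    return 1.0 if step_frac > warmup_frac else step_frac


# Trace with warmup_frac = 0.1: the weight climbs only to 0.1, then jumps to 1.0.
for s in (0.05, 0.1, 0.11):
    print(s, old_coef(s, 0.1))
```

This prints `0.05 0.05`, `0.1 0.1`, `0.11 1.0`, showing the discontinuity between the last warmup step and the first step after it.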

The paper describes this weighting as increasing smoothly. This change makes it increase linearly from 0 to 1 over the first warmup_frac fraction of training (coef = step_frac / warmup_frac), after which it stays at 1.0.

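We also change > to >= so that warmup_frac = 0.0 is allowed. On the first step, step_frac = 0.0, so step_frac >= warmup_frac holds and coef = 1.0. With >, the comparison would fail and we would evaluate step_frac / warmup_frac, hitting a divide-by-zero error on the first step.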
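A minimal sketch of the fixed schedule, assuming the coefficient depends only on step_frac and warmup_frac. The name `new_coef` is hypothetical, and any further scaling applied in the actual loss is omitted.

```python
def new_coef(step_frac: float, warmup_frac: float) -> float:
    """Fixed schedule (hypothetical helper): linear ramp from 0 to 1
    over [0, warmup_frac], then constant 1.0."""
    # Using >= means that when warmup_frac == 0.0, step_frac == 0.0 already
    # satisfies the condition, so the division below is never reached.
    if step_frac >= warmup_frac:
        return 1.0
    return step_frac / warmup_frac


# With warmup_frac = 0.1 the weight now reaches 1.0 exactly at the end of warmup.
for s in (0.0, 0.05, 0.1, 0.11):
    print(s, new_coef(s, 0.1))

# warmup_frac = 0.0 disables warmup without a ZeroDivisionError.
print(new_coef(0.0, 0.0))
```

Putting the comparison first is what makes warmup_frac = 0.0 safe: the ramp branch only runs when step_frac < warmup_frac, which implies warmup_frac > 0.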

openai / weak-to-strong

Fix confidence loss to scale up correctly #33