Closed (wetdog closed 1 month ago)
Add a linear scheduler, as described in the model configuration section of the paper: "We utilized a linear decay learning rate schedule with a peak learning rate of 7.5 × 10⁻⁵ and incorporated a warm-up phase for the initial 20,000 updates."
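For reference, the schedule quoted from the paper can be sketched as a plain Python function. This is a minimal illustration and not necessarily the code in this PR: the function name `linear_warmup_decay` and the `total_steps` value are hypothetical (the quoted passage does not give a total number of updates), and both the linear shape of the warm-up and the decay all the way to zero are assumptions.

```python
def linear_warmup_decay(step, peak_lr=7.5e-5, warmup_steps=20_000,
                        total_steps=1_000_000):
    """Return the learning rate for a given update step.

    peak_lr and warmup_steps come from the quoted paper text.
    total_steps is a hypothetical placeholder; set it to the real
    training length. Decaying to exactly zero is also an assumption.
    """
    if total_steps <= warmup_steps:
        raise ValueError("total_steps must be greater than warmup_steps")
    if step < warmup_steps:
        # Warm-up phase: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay phase: fall linearly from peak_lr to 0 at total_steps,
    # and stay at 0 for any step past total_steps.
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / (total_steps - warmup_steps)
```

If the training loop uses PyTorch, a version of this function that returns the multiplier (the value divided by `peak_lr`) could be passed as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR`, which scales the optimizer's initial learning rate by the returned factor at each step.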