microsoft / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Fix mismatch between ConstantGradScaler and the loss-scale argument #376

Open BeingGod opened 7 months ago

BeingGod commented 7 months ago

The usage and the description of `--loss-scale` are inconsistent. The argument description says `--loss-scale` expects a positive power of 2, but `ConstantGradScaler` uses the value as the real scale directly rather than computing `2**loss_scale`.

Argument Description:

[screenshot: help text for the `--loss-scale` argument]

Argument Usage:

[screenshot: `ConstantGradScaler` initialized with the `--loss-scale` value]
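To make the reported inconsistency concrete, here is a minimal sketch (not the actual Megatron-DeepSpeed source; class and function names are simplified stand-ins) of how a help text promising a "power of 2" can diverge from a grad scaler that consumes the raw command-line value:

```python
# Minimal sketch of the mismatch described above (hypothetical names,
# not the real Megatron-DeepSpeed code): the parser documents --loss-scale
# as a power of 2, while the scaler receives the value unchanged.

import argparse


class ConstantGradScaler:
    """Keeps a fixed loss scale for fp16 training."""

    def __init__(self, scale: float):
        # The scale is used as-is; it is never interpreted as an exponent.
        self.scale = scale


def build_grad_scaler(args):
    # Current behavior: the raw command-line value becomes the scale.
    return ConstantGradScaler(args.loss_scale)
    # Behavior implied by the help text would instead be:
    # return ConstantGradScaler(2 ** args.loss_scale)


parser = argparse.ArgumentParser()
parser.add_argument(
    "--loss-scale", type=float, default=None,
    help="Static loss scaling, a positive power of 2.",
)
args = parser.parse_args(["--loss-scale", "12"])

scaler = build_grad_scaler(args)
print(scaler.scale)  # 12.0, not 2**12 == 4096
```

Either the help text or the scaler construction has to change so the two agree; this PR picks one of those fixes.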
BeingGod commented 7 months ago

Could you help me review this PR? @tjruwase