Adding two losses from the actor leads to the error "gradient computed twice for this partition"

piekey1994 commented 1 year ago

When training the PPO model, I enabled gradient checkpointing (gradient_checkpointing_enable). To compute the ptx loss, the actor has to run forward twice. In your code, the two losses are backpropagated separately, with one backward call each, and that works without any problem. However, if I add the two losses and call the engine's backward once on the sum, the error "gradient computed twice for this partition" appears. If I don't use gradient_checkpointing_enable, the error does not occur. It seems to appear only in DeepSpeed's ZeRO mode, and I don't know why.
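One possible explanation, not confirmed anywhere in this thread, is that reentrant activation checkpointing runs a nested backward pass for every checkpointed forward, so a parameter's gradient hook fires once per forward instead of once per backward call. The sketch below uses plain PyTorch only (no DeepSpeed); the `Linear` layer standing in for the actor and the names `count_weight_hook_calls`, `x_ppo`, and `x_ptx` are made up for illustration. It counts how often a gradient hook on the weight fires when two losses from the same module are summed and backpropagated once.

```python
import torch
from torch.utils.checkpoint import checkpoint


def count_weight_hook_calls(use_checkpointing: bool) -> int:
    """Sum two losses from one module, backpropagate once, and return
    how many times the gradient hook on the weight fired."""
    torch.manual_seed(0)
    layer = torch.nn.Linear(4, 4)  # hypothetical stand-in for the actor
    calls = 0

    def on_grad(grad):
        # Runs every time a gradient for layer.weight is produced.
        nonlocal calls
        calls += 1

    layer.weight.register_hook(on_grad)

    def forward(x):
        if use_checkpointing:
            # Reentrant checkpointing recomputes the forward during backward
            # and runs a nested autograd pass for this segment.
            return checkpoint(layer, x, use_reentrant=True)
        return layer(x)

    # Two forward passes through the same parameters (PPO batch and ptx batch).
    # Reentrant checkpointing needs at least one input that requires grad.
    x_ppo = torch.randn(2, 4, requires_grad=True)
    x_ptx = torch.randn(2, 4, requires_grad=True)
    actor_loss = forward(x_ppo).sum()
    ptx_loss = forward(x_ptx).sum()

    # A single backward over the summed loss, as described above.
    (actor_loss + ptx_loss).backward()
    return calls


if __name__ == "__main__":
    print("without checkpointing:", count_weight_hook_calls(False))
    print("with reentrant checkpointing:", count_weight_hook_calls(True))
```

Without checkpointing, autograd sums both contributions before they reach the parameter, so the hook fires once. With reentrant checkpointing it fires once per checkpointed forward. If ZeRO's gradient reduction is driven by a per-parameter hook and expects each parameter to be reduced once per backward call, this would be consistent with the error appearing only when the summed loss, checkpointing, and ZeRO are combined, and with separate backward calls working.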

hijkzzz commented 1 year ago

Same issue when training the reward model.

hijkzzz commented 1 year ago

Has this issue been fixed? It significantly affects the training efficiency of RLHF.

hyj1991 commented 11 months ago

Same issue, is there any progress?

iFe1er commented 6 months ago

Same issue, any progress here?

microsoft / DeepSpeedExamples

Adding two losses from the actor leads to the error "gradient computed twice for this partition" #458