microsoft / DeepSpeedExamples

Example models using DeepSpeed
Apache License 2.0

An error with gradient checkpointing in DeepSpeed-Chat step 3 #908

Status: Open. wangyuwen1999 opened this issue 5 months ago

wangyuwen1999 commented 5 months ago

(screenshot) Why do we need to disable gradient checkpointing at every step? Does this mean that step 3 doesn't support actor gradient checkpointing yet?
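The screenshot is not available, so the following is only a minimal sketch of the toggle pattern the question appears to describe. It assumes the step-3 loop turns the actor's gradient checkpointing off for the generation (rollout) phase and back on for the training update. One plausible motivation: generation is forward-only, so recomputing activations saves nothing there, and in Hugging Face `transformers` gradient checkpointing does not combine with the KV cache (`use_cache=True`) during training. `ToyActor`, `checkpointing_off`, and `rlhf_loop` are hypothetical stand-ins, not DeepSpeed-Chat's actual code. Only the method names `gradient_checkpointing_enable` / `gradient_checkpointing_disable` mirror the Hugging Face model API.

```python
from contextlib import contextmanager


class ToyActor:
    """Hypothetical stand-in for the actor model; it only tracks the checkpointing flag."""

    def __init__(self) -> None:
        self.gradient_checkpointing = False
        self.log: list[str] = []

    def gradient_checkpointing_enable(self) -> None:
        self.gradient_checkpointing = True

    def gradient_checkpointing_disable(self) -> None:
        self.gradient_checkpointing = False

    def generate(self) -> str:
        # Forward-only rollout: record which checkpointing state generation saw.
        self.log.append(f"generate(ckpt={self.gradient_checkpointing})")
        return "rollout"

    def train_step(self, batch: str) -> None:
        # Training update: record which checkpointing state the backward pass would see.
        self.log.append(f"train({batch}, ckpt={self.gradient_checkpointing})")


@contextmanager
def checkpointing_off(model: ToyActor, enabled: bool):
    """Disable checkpointing inside the block, then re-enable it if it was requested."""
    if enabled:
        model.gradient_checkpointing_disable()
    try:
        yield model
    finally:
        if enabled:
            model.gradient_checkpointing_enable()


def rlhf_loop(model: ToyActor, steps: int, actor_gradient_checkpointing: bool = True) -> list[str]:
    """Hypothetical step-3-style loop: generate with checkpointing off, train with it on."""
    if actor_gradient_checkpointing:
        model.gradient_checkpointing_enable()
    for _ in range(steps):
        with checkpointing_off(model, actor_gradient_checkpointing):
            batch = model.generate()
        model.train_step(batch)
    return model.log
```

Calling `rlhf_loop(ToyActor(), 2)` records `generate(ckpt=False)` followed by `train(rollout, ckpt=True)` for each step. Under this reading, checkpointing is still used for the actor's training pass and is only suspended during generation. Whether DeepSpeed-Chat's step 3 does exactly this would need to be confirmed against its source.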