Costwen closed this issue 1 year ago.
I tried to train sentence-xl on 8×A100 GPUs with DeepSpeed, but I got an out-of-memory (OOM) error. Have you ever used DeepSpeed to train the model?
Here is my DeepSpeed config:

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "zero_allow_untested_optimizer": true,
  "bf16": {
    "enabled": "auto"
  },
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 5e8,
    "overlap_comm": false,
    "reduce_scatter": true,
    "reduce_bucket_size": 5e8,
    "contiguous_gradients": true
  }
}
```
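The config uses ZeRO `"stage": 2`. This stage partitions gradients and optimizer states across GPUs, but every GPU still holds a full copy of the 16-bit parameters. The sketch below estimates per-GPU memory for model states using the accounting from the ZeRO paper for mixed-precision Adam: 2 bytes/param for the parameters, 2 bytes/param for gradients, and 12 bytes/param for optimizer states. It is an approximation under those assumptions. It ignores activations, temporary buffers, and the all-gather/reduce buckets set in the config. The function name and the 1-billion-parameter figure are hypothetical placeholders, not the actual size of sentence-xl.

```python
def zero2_model_state_bytes(num_params: int, num_gpus: int,
                            optimizer_bytes_per_param: int = 12) -> float:
    """Hypothetical helper: approximate per-GPU bytes of model states under ZeRO stage 2.

    Assumes mixed-precision Adam as described in the ZeRO paper:
    - 16-bit parameters (2 bytes/param) are replicated on every GPU,
    - 16-bit gradients (2 bytes/param) are partitioned across GPUs,
    - fp32 master weights + momentum + variance (12 bytes/param) are partitioned.
    Activations, temporary buffers and communication buckets are NOT included.
    """
    params = 2 * num_params                                      # replicated
    grads = 2 * num_params / num_gpus                            # partitioned
    optim = optimizer_bytes_per_param * num_params / num_gpus    # partitioned
    return params + grads + optim


if __name__ == "__main__":
    # Hypothetical model size, for illustration only.
    n_params = 1_000_000_000
    per_gpu = zero2_model_state_bytes(n_params, num_gpus=8)
    print(f"{per_gpu / 2**30:.2f} GiB per GPU for model states")
```

Because the replicated parameter term does not shrink as more GPUs are added, activation memory (driven by the micro-batch size) can still cause OOM under stage 2 even when the partitioned terms are small.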