windspirit95 closed this 1 year ago
It's hard to see exactly what changed in this format, but it looks OK. You don't need to supply the training arguments twice (with and without DeepSpeed), and you need to adjust the command line to launch with DeepSpeed (see docs).
You can try it and report any issues here or in transformers. You can also find an example of how to use DeepSpeed with the Trainer in this thread.
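For reference, here is a minimal sketch of a ZeRO stage 2 config with CPU offload of the optimizer state, written out as a JSON file. This is not the config from the linked jaxformer script. The file name `ds_config.json`, the helper names, and the choice of stage 2 are assumptions for illustration, and the `"auto"` values assume you are using the transformers Trainer integration, which fills them in from the training arguments.

```python
import json
from pathlib import Path


def build_ds_config() -> dict:
    """Return a ZeRO stage 2 config that offloads optimizer state to CPU.

    Illustrative values only; tune them for your own model and hardware.
    """
    return {
        "zero_optimization": {
            "stage": 2,
            # Keep optimizer state in CPU RAM to reduce GPU memory use.
            "offload_optimizer": {"device": "cpu", "pin_memory": True},
        },
        # "auto" asks the transformers Trainer to copy these values from
        # TrainingArguments, so they are only specified in one place.
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
        "fp16": {"enabled": "auto"},
    }


def write_ds_config(path: str = "ds_config.json") -> Path:
    """Write the config to disk (hypothetical file name) and return the path."""
    out = Path(path)
    out.write_text(json.dumps(build_ds_config(), indent=2))
    return out


if __name__ == "__main__":
    print(write_ds_config())

# In the training script, build a single TrainingArguments and point it at
# the file, for example:
#   TrainingArguments(output_dir="out", deepspeed="ds_config.json")
# then launch with the DeepSpeed launcher instead of plain python:
#   deepspeed --num_gpus=1 train.py
# ("out" and "train.py" are placeholders for your own paths.)
```

With this layout the batch size, accumulation steps, and precision live only in the training arguments, which avoids keeping two copies in sync.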
Ah I see your point, thank you ^^
Hi, since my GPU has little memory (12 GB), I am trying to use DeepSpeed in the training code with the CPU offload setting. Here is my modification so far:
Could you help me check whether I am doing this the right way? Thanks ^^ The DeepSpeed config is inherited from https://github.com/salesforce/jaxformer/blob/main/jaxformer/hf/train.py