You can try to use zero_init, but I believe for HF the correct method is to instantiate HfDeepSpeedConfig before calling from_pretrained. See this comment and issue thread: https://github.com/microsoft/DeepSpeed/issues/3168#issuecomment-1546151533
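For reference, a minimal sketch of that ordering (the model name and config path are placeholders, not taken from this issue):

```python
from transformers import AutoModelForCausalLM
from transformers.integrations import HfDeepSpeedConfig

# Must be created, and the object kept alive, BEFORE from_pretrained so
# that the HF loading code sees ZeRO-3 as enabled and uses zero.Init.
dschf = HfDeepSpeedConfig("ds_config.json")  # placeholder ZeRO stage-3 config path

# With the config object alive, from_pretrained partitions parameters
# across ranks instead of materializing the full model on each GPU.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
```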
@jomayeri Thanks for your reply! I use DeepSpeed with Accelerate, and zero_init is set in my Accelerate config (`zero3_init_flag: true`). I also construct the TrainingArguments before from_pretrained to ensure I'm using zero_init. But I didn't use HfDeepSpeedConfig; should I use it together with the HF Trainer?
I also found that in deepspeed/runtime/engine.py, when _configure_distributed_model runs, is_zero_init_model evaluates to False even though I set `zero3_init_flag: true`, and then all parameters are loaded onto every GPU.
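A rough way to check whether zero.Init actually wrapped model construction (a sketch; the ds_* attributes are DeepSpeed ZeRO-3 internals and may change between versions):

```python
# Run right after from_pretrained; assumes `model` is the loaded model.
p = next(model.parameters())
print(hasattr(p, "ds_id"))           # True only if zero.Init was active
print(p.shape)                       # torch.Size([0]) placeholder when partitioned
print(getattr(p, "ds_shape", None))  # the full, unpartitioned shape
```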
Hmm, the config might not be passed properly from the Trainer. I'll investigate that. Can you check if adding HfDeepSpeedConfig before from_pretrained works for you?
@jomayeri According to "However, if you want to use DeepSpeed without the [Trainer](https://huggingface.co/docs/transformers/v4.41.3/en/main_classes/trainer#transformers.Trainer), Transformers provides a HfDeepSpeedConfig class." in https://huggingface.co/docs/transformers/main_classes/deepspeed, I don't think I need to add HfDeepSpeedConfig: as long as ZeRO-3 is configured before from_pretrained(), the Trainer (or trl) should do the right thing automatically. And I found that it may be bnb that prevents zero_init, because everything runs successfully after I remove bnb_config.
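That matches the documented Trainer flow: with the Trainer, it should be enough to construct TrainingArguments (which carries the DeepSpeed config) before from_pretrained. A minimal sketch of that ordering, without bnb_config (paths and model name are placeholders):

```python
from transformers import AutoModelForCausalLM, TrainingArguments

# Constructing TrainingArguments first lets the HF integration detect the
# ZeRO-3 config and enable deepspeed.zero.Init for the from_pretrained call.
training_args = TrainingArguments(
    output_dir="out",            # placeholder
    deepspeed="ds_config.json",  # placeholder path to a ZeRO stage-3 config
)

# No quantization_config here: as noted above, passing a bnb_config appears
# to disable zero_init, so the two should not be combined.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
```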
@jomayeri Please see https://github.com/microsoft/DeepSpeed/issues/5660 for more details. Thanks!
Closing in favor of https://github.com/microsoft/DeepSpeed/issues/5660
Describe the bug I'm fine-tuning Llama 2 using DeepSpeed ZeRO-3. I found that parameters are loaded into CPU memory during from_pretrained, and at the beginning of trainer.train() the parameters are fully loaded onto each GPU WITHOUT ANY PARTITIONING; only then are they partitioned across the GPUs.
To Reproduce Here is my code:
Here is my accelerate config:
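(The original file isn't reproduced here; a minimal illustrative excerpt, assuming the `zero3_init_flag: true` setting mentioned above; the config-file path is a placeholder.)

```yaml
# Illustrative excerpt only, not the original file.
distributed_type: DEEPSPEED
deepspeed_config:
  deepspeed_config_file: ds_config.json  # placeholder path
  zero3_init_flag: true                  # the flag discussed above
```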
Here is my deepspeed config:
Expected behavior Parameters are partitioned first and then loaded to the GPUs.
System info (please complete the following information):
Launcher context
I would truly appreciate it if anyone could help me solve this! @loadams @tjruwase @deepcharm