unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

Deepspeed Zero3 support #225

songkq commented 9 months ago

@danielhanchen Hi, could you please give some advice on this issue? DPO training fails with DeepSpeed ZeRO-3 offload.

pip install "unsloth[cu121-ampere-torch211] @ git+https://github.com/unslothai/unsloth.git"
torch                         2.1.1+cu121
unsloth                       2024.1
Driver Version: 535.129.03   CUDA Version: 12.2

Traceback (most recent call last):
  File "/workspace/llm_tuning/DPO/LLaMA-Factory/src/train_bash.py", line 14, in <module>
    main()
  File "/workspace/llm_tuning/DPO/LLaMA-Factory/src/train_bash.py", line 5, in main
    run_exp()
  File "/workspace/llm_tuning/DPO/LLaMA-Factory/src/llmtuner/train/tuner.py", line 38, in run_exp
    run_dpo(model_args, data_args, training_args, finetuning_args, callbacks)
  File "/workspace/llm_tuning/DPO/LLaMA-Factory/src/llmtuner/train/dpo/workflow.py", line 30, in run_dpo
    model = load_model(tokenizer, model_args, finetuning_args, training_args.do_train)
  File "/workspace/llm_tuning/DPO/LLaMA-Factory/src/llmtuner/model/loader.py", line 83, in load_model
    model, _ = FastLanguageModel.from_pretrained(**unsloth_kwargs)
  File "/miniconda3/envs/llm_factory_unsloth_tf437/lib/python3.10/site-packages/unsloth/models/loader.py", line 79, in from_pretrained
    return dispatch_model.from_pretrained(
  File "/miniconda3/envs/llm_factory_unsloth_tf437/lib/python3.10/site-packages/unsloth/models/llama.py", line 689, in from_pretrained
    model = FastLlamaModel.post_patch(model)
  File "/miniconda3/envs/llm_factory_unsloth_tf437/lib/python3.10/site-packages/unsloth/models/llama.py", line 738, in post_patch
    model.model.embed_tokens = torch.nn.Embedding.from_pretrained(model.model.embed_tokens.weight)
  File "/miniconda3/envs/llm_factory_unsloth_tf437/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 210, in from_pretrained
    assert embeddings.dim() == 2, \
AssertionError: Embeddings parameter is expected to be 2-dimensional
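
For context on the assertion: under ZeRO stage 3 every parameter is partitioned across ranks, so by the time Unsloth's post_patch runs, model.model.embed_tokens.weight is a flattened placeholder shard rather than the 2-D (vocab_size, hidden_size) matrix that torch.nn.Embedding.from_pretrained expects. The sketch below is only an illustration of the mechanism, not a supported Unsloth fix; the helper name is hypothetical, though deepspeed.zero.GatheredParameters is the standard DeepSpeed API for temporarily reassembling a partitioned parameter:

import deepspeed
import torch

def rebuild_embedding_zero3_safe(embed_tokens):
    # Under ZeRO-3 the weight is partitioned, so on any single rank
    # embed_tokens.weight.dim() != 2 and the from_pretrained assert fires.
    # GatheredParameters temporarily reassembles the full tensor in-context.
    with deepspeed.zero.GatheredParameters(embed_tokens.weight, modifier_rank=None):
        assert embed_tokens.weight.dim() == 2  # (vocab_size, hidden_size) here
        # Clone so the new module keeps the data after re-partitioning.
        new_embed = torch.nn.Embedding.from_pretrained(embed_tokens.weight.clone())
    return new_embed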

deepspeed_z3_offload_config.json

{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "zero_allow_untested_optimizer": true,
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "initial_scale_power": 16,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "bf16": {
    "enabled": "auto"
  },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    },
    "offload_param": {
      "device": "cpu",
      "pin_memory": true
    },
    "overlap_comm": true,
    "contiguous_gradients": true,
    "sub_group_size": 1e9,
    "reduce_bucket_size": "auto",
    "stage3_prefetch_bucket_size": "auto",
    "stage3_param_persistence_threshold": "auto",
    "stage3_max_live_parameters": 1e9,
    "stage3_max_reuse_distance": 1e9,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
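
For reference, the "auto" values in this config are resolved by the HF Trainer's DeepSpeed integration from the matching training arguments. A minimal sketch of how such a config is usually wired in (the output path and values are placeholders, not taken from the original report):

from transformers import TrainingArguments

# Sketch only: the Trainer fills the "auto" entries in the JSON from these
# arguments (micro batch size, gradient accumulation, bf16/fp16, etc.).
training_args = TrainingArguments(
    output_dir="outputs",                          # placeholder
    per_device_train_batch_size=1,                 # -> train_micro_batch_size_per_gpu
    gradient_accumulation_steps=4,                 # -> gradient_accumulation_steps
    bf16=True,                                     # -> bf16.enabled
    deepspeed="deepspeed_z3_offload_config.json",  # the ZeRO-3 config above
)
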
danielhanchen commented 9 months ago

Oh, looks like Hiyouga answered it! :) Sadly I haven't gotten DeepSpeed to work with Unsloth yet!

thedarkzeno commented 8 months ago

What do we need to make it work with DeepSpeed?

danielhanchen commented 8 months ago

@thedarkzeno Oh, it's much more complicated, sadly.