modelscope / ms-swift

Use PEFT or Full-parameter to finetune 350+ LLMs or 90+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
https://swift.readthedocs.io/zh-cn/latest/Instruction/index.html
Apache License 2.0

[HELP] Roughly how much hardware does SFT of DeepSeek-V2-Chat need? #1048

Closed · robinsonmd closed this issue 3 months ago

robinsonmd commented 4 months ago
  1. Is 8*80G enough? The fine-tuning best-practice script says 8*80GB, but I'm not sure whether that means a single node or multiple nodes.
  2. Also, how much CPU RAM does fine-tuning take in the best practice? 1TB of RAM?
hjh0119 commented 3 months ago

It's noted in examples/pytorch/llm/scripts/deepseek-v2-chat/lora_mp/sft.sh: with device_map on 8 A100s, 8*80G is needed.
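
For reference, loading with a device map spread over all visible GPUs looks roughly like the sketch below. This uses the standard transformers API and an assumed checkpoint id, not the exact contents of sft.sh:

    # Minimal sketch of device-map model loading (standard transformers API;
    # not the actual sft.sh internals, just an illustration):
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = 'deepseek-ai/DeepSeek-V2-Chat'  # assumed checkpoint id
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map='auto',   # shard layers across all visible GPUs (8 x A100-80G here)
        torch_dtype='auto',
        trust_remote_code=True,
    )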

robinsonmd commented 3 months ago

> It's noted in examples/pytorch/llm/scripts/deepseek-v2-chat/lora_mp/sft.sh: with device_map on 8 A100s, 8*80G is needed.

I tried examples/pytorch/llm/scripts/deepseek-v2-chat/lora_mp/sft.sh and ran into the following problems:

  1. The script's line endings are broken; after pulling from git they appear to be \r\n (CRLF).
  2. Using device_map_config_path raises UnboundLocalError: local variable 'model_kwargs' referenced before assignment. model_kwargs should be pre-initialized above line 48 of swift/llm/sft.py, e.g. (this and the next error share the same Python pitfall; a distilled sketch follows the list):
    # Loading Model and Tokenizer
    model_kwargs = {}
    if is_deepspeed_zero3_enabled():
        model_kwargs = {'device_map': None}
  3. Using the quantization_bit parameter raises UnboundLocalError: local variable 'load_in_4bit' referenced before assignment. select_bnb in swift/llm/utils/argument.py could be changed to something like this:

    def select_bnb(self: Union['SftArguments', 'InferArguments']) -> Tuple[Optional[Dtype], bool, bool]:
        if self.bnb_4bit_comp_dtype == 'AUTO':
            self.bnb_4bit_comp_dtype = self.dtype
    
        if self.bnb_4bit_comp_dtype != 'AUTO':
            bnb_4bit_compute_dtype = dtype_mapping_reversed[self.bnb_4bit_comp_dtype]
            assert bnb_4bit_compute_dtype in {torch.float16, torch.bfloat16, torch.float32}
        else:
            bnb_4bit_compute_dtype = None
        quantization_bit = self.quantization_bit
        if self.quant_method == 'bnb':
            if quantization_bit == 4:
                require_version('bitsandbytes')
                load_in_4bit, load_in_8bit = True, False
            elif quantization_bit == 8:
                require_version('bitsandbytes')
                load_in_4bit, load_in_8bit = False, True
            else:
                load_in_4bit, load_in_8bit = False, False  # defaults so both are always defined
        else:
            load_in_4bit, load_in_8bit = False, False
    
        return bnb_4bit_compute_dtype, load_in_4bit, load_in_8bit
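
    Both tracebacks come from the same Python pitfall, distilled into the runnable sketch below (illustrative only, not the actual swift code): a local variable is assigned only on some branches and then read unconditionally.

        # The untaken branch leaves the locals unassigned, so the read raises
        # UnboundLocalError; pre-initializing safe defaults fixes it.
        def buggy(quant_method: str, quantization_bit: int):
            if quant_method == 'bnb':
                if quantization_bit == 4:
                    load_in_4bit, load_in_8bit = True, False
                elif quantization_bit == 8:
                    load_in_4bit, load_in_8bit = False, True
                # quantization_bit == 0 assigns nothing
            return load_in_4bit, load_in_8bit  # raises UnboundLocalError

        def fixed(quant_method: str, quantization_bit: int):
            load_in_4bit, load_in_8bit = False, False  # safe defaults first
            if quant_method == 'bnb':
                if quantization_bit == 4:
                    load_in_4bit, load_in_8bit = True, False
                elif quantization_bit == 8:
                    load_in_4bit, load_in_8bit = False, True
            return load_in_4bit, load_in_8bit

        try:
            buggy('bnb', 0)
        except UnboundLocalError as e:
            print('buggy:', e)            # ... referenced before assignment
        print('fixed:', fixed('bnb', 0))  # (False, False)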

    One more small suggestion: could data loading happen before model loading? Otherwise you wait through the whole expensive model load, and only then does a data-format mistake surface, which wastes a lot of time. A fail-fast sketch follows below.
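
    For illustration, the fail-fast ordering could look like this. validate_dataset, the assumed {'query': ..., 'response': ...} JSONL schema, and the commented-out calls are hypothetical placeholders, not swift's actual API:

        # Fail-fast sketch: check the dataset before the expensive model load.
        import json

        REQUIRED_KEYS = {'query', 'response'}  # assumed per-sample schema

        def validate_dataset(path: str) -> list:
            samples = []
            with open(path, encoding='utf-8') as f:
                for lineno, line in enumerate(f, 1):
                    sample = json.loads(line)  # fails immediately on malformed JSON
                    missing = REQUIRED_KEYS - sample.keys()
                    if missing:
                        raise ValueError(f'line {lineno}: missing keys {missing}')
                    samples.append(sample)
            return samples

        def main(dataset_path: str, model_id: str):
            data = validate_dataset(dataset_path)  # seconds, fails fast
            # only now pay for the multi-minute model load (placeholders):
            # model, tokenizer = get_model_tokenizer(model_id)
            # train(model, tokenizer, data)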

hjh0119 commented 3 months ago

Yes, these issues are real. Thanks for the feedback.

robinsonmd commented 3 months ago

If gradient_checkpointing is set to true, the following problem occurs: [image: error traceback]. Training only works after gradient checkpointing is turned off.


Reference: https://github.com/huggingface/transformers/issues/28499
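
For reference, the two workarounds usually discussed in that issue are non-reentrant checkpointing or making the embedding outputs require grad. A sketch, assuming a standard transformers Trainer setup rather than ms-swift's own flags:

    # Hedged sketch of the common fixes from the linked transformers issue;
    # whether ms-swift exposes these directly is an assumption.
    from transformers import TrainingArguments

    # Option 1: non-reentrant checkpointing (transformers >= 4.35):
    args = TrainingArguments(
        output_dir='out',
        gradient_checkpointing=True,
        gradient_checkpointing_kwargs={'use_reentrant': False},
    )

    # Option 2: give checkpointed blocks a grad-requiring input, which
    # LoRA training otherwise lacks because the base weights are frozen:
    # model.enable_input_require_grads()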

hjh0119 commented 3 months ago

Got it; this has been fixed in the training script.