modelscope / ms-swift

Use PEFT or Full-parameter to finetune 350+ LLMs or 90+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
https://swift.readthedocs.io/zh-cn/latest/Instruction/index.html
Apache License 2.0

[HELP] Roughly how much hardware does SFT of DeepSeek-V2-Chat need? #1048

Closed · robinsonmd closed this issue 3 months ago

robinsonmd commented 4 months ago
  1. Is 8*80G enough? The fine-tuning best-practice script says 8*80GB, but I'm not sure whether that means a single node or multiple nodes.
  2. Also, how much CPU RAM does fine-tuning take in the best practice? 1TB of RAM?
hjh0119 commented 3 months ago

It's noted in examples/pytorch/llm/scripts/deepseek-v2-chat/lora_mp/sft.sh: with device_map on 8 A100s, 8*80G is needed.
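
For reference, loading with a device map spread over all visible GPUs looks roughly like the sketch below. This uses the standard transformers API and an assumed checkpoint id, not the exact contents of sft.sh:

    # Minimal sketch of device-map model loading (standard transformers API;
    # not the actual sft.sh internals, just an illustration):
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = 'deepseek-ai/DeepSeek-V2-Chat'  # assumed checkpoint id
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map='auto',   # shard layers across all visible GPUs (8 x A100-80G here)
        torch_dtype='auto',
        trust_remote_code=True,
    )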

robinsonmd commented 3 months ago

> It's noted in examples/pytorch/llm/scripts/deepseek-v2-chat/lora_mp/sft.sh: with device_map on 8 A100s, 8*80G is needed.

I tried examples/pytorch/llm/scripts/deepseek-v2-chat/lora_mp/sft.sh and ran into the following problems:

  1. The script's line endings are broken; after pulling from git they appear to be \r\n (CRLF).
  2. Using device_map_config_path raises UnboundLocalError: local variable 'model_kwargs' referenced before assignment. model_kwargs should be pre-initialized above line 48 of swift/llm/sft.py, e.g. (this and the next error share the same Python pitfall; a distilled sketch follows the list):
    # Loading Model and Tokenizer
    model_kwargs = {}
    if is_deepspeed_zero3_enabled():
        model_kwargs = {'device_map': None}
  3. Using the quantization_bit parameter raises UnboundLocalError: local variable 'load_in_4bit' referenced before assignment. select_bnb in swift/llm/utils/argument.py could be changed to something like this:

    def select_bnb(self: Union['SftArguments', 'InferArguments']) -> Tuple[Optional[Dtype], bool, bool]:
        if self.bnb_4bit_comp_dtype == 'AUTO':
            self.bnb_4bit_comp_dtype = self.dtype
    
        if self.bnb_4bit_comp_dtype != 'AUTO':
            bnb_4bit_compute_dtype = dtype_mapping_reversed[self.bnb_4bit_comp_dtype]
            assert bnb_4bit_compute_dtype in {torch.float16, torch.bfloat16, torch.float32}
        else:
            bnb_4bit_compute_dtype = None
        quantization_bit = self.quantization_bit
        if self.quant_method == 'bnb':
            if quantization_bit == 4:
                require_version('bitsandbytes')
                load_in_4bit, load_in_8bit = True, False
            elif quantization_bit == 8:
                require_version('bitsandbytes')
                load_in_4bit, load_in_8bit = False, True
            else:
                load_in_4bit, load_in_8bit = False, False  # defaults so both are always defined
        else:
            load_in_4bit, load_in_8bit = False, False
    
        return bnb_4bit_compute_dtype, load_in_4bit, load_in_8bit
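
    Both tracebacks come from the same Python pitfall, distilled into the runnable sketch below (illustrative only, not the actual swift code): a local variable is assigned only on some branches and then read unconditionally.

        # The untaken branch leaves the locals unassigned, so the read raises
        # UnboundLocalError; pre-initializing safe defaults fixes it.
        def buggy(quant_method: str, quantization_bit: int):
            if quant_method == 'bnb':
                if quantization_bit == 4:
                    load_in_4bit, load_in_8bit = True, False
                elif quantization_bit == 8:
                    load_in_4bit, load_in_8bit = False, True
                # quantization_bit == 0 assigns nothing
            return load_in_4bit, load_in_8bit  # raises UnboundLocalError

        def fixed(quant_method: str, quantization_bit: int):
            load_in_4bit, load_in_8bit = False, False  # safe defaults first
            if quant_method == 'bnb':
                if quantization_bit == 4:
                    load_in_4bit, load_in_8bit = True, False
                elif quantization_bit == 8:
                    load_in_4bit, load_in_8bit = False, True
            return load_in_4bit, load_in_8bit

        try:
            buggy('bnb', 0)
        except UnboundLocalError as e:
            print('buggy:', e)            # ... referenced before assignment
        print('fixed:', fixed('bnb', 0))  # (False, False)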

    One more small suggestion: could data loading happen before model loading? Otherwise you wait through the whole expensive model load, and only then does a data-format mistake surface, which wastes a lot of time. A fail-fast sketch follows below.
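
    For illustration, the fail-fast ordering could look like this. validate_dataset, the assumed {'query': ..., 'response': ...} JSONL schema, and the commented-out calls are hypothetical placeholders, not swift's actual API:

        # Fail-fast sketch: check the dataset before the expensive model load.
        import json

        REQUIRED_KEYS = {'query', 'response'}  # assumed per-sample schema

        def validate_dataset(path: str) -> list:
            samples = []
            with open(path, encoding='utf-8') as f:
                for lineno, line in enumerate(f, 1):
                    sample = json.loads(line)  # fails immediately on malformed JSON
                    missing = REQUIRED_KEYS - sample.keys()
                    if missing:
                        raise ValueError(f'line {lineno}: missing keys {missing}')
                    samples.append(sample)
            return samples

        def main(dataset_path: str, model_id: str):
            data = validate_dataset(dataset_path)  # seconds, fails fast
            # only now pay for the multi-minute model load (placeholders):
            # model, tokenizer = get_model_tokenizer(model_id)
            # train(model, tokenizer, data)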

hjh0119 commented 3 months ago

Yes, these issues are real. Thanks for the feedback.

robinsonmd commented 3 months ago

If gradient_checkpointing is set to true, the following problem occurs: [image: error traceback]. Training only works after gradient checkpointing is turned off.


Reference: https://github.com/huggingface/transformers/issues/28499
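
For reference, the two workarounds usually discussed in that issue are non-reentrant checkpointing or making the embedding outputs require grad. A sketch, assuming a standard transformers Trainer setup rather than ms-swift's own flags:

    # Hedged sketch of the common fixes from the linked transformers issue;
    # whether ms-swift exposes these directly is an assumption.
    from transformers import TrainingArguments

    # Option 1: non-reentrant checkpointing (transformers >= 4.35):
    args = TrainingArguments(
        output_dir='out',
        gradient_checkpointing=True,
        gradient_checkpointing_kwargs={'use_reentrant': False},
    )

    # Option 2: give checkpointed blocks a grad-requiring input, which
    # LoRA training otherwise lacks because the base weights are frozen:
    # model.enable_input_require_grads()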

hjh0119 commented 3 months ago

Got it; this has been fixed in the training script.