Closed: robinsonmd closed this issue 3 months ago
It is noted in examples/pytorch/llm/scripts/deepseek-v2-chat/lora_mp/sft.sh that the 8x A100 device_map setup needs 8 * 80 GB.
I used examples/pytorch/llm/scripts/deepseek-v2-chat/lora_mp/sft.sh and ran into the following problems:
The run fails with `UnboundLocalError: local variable 'model_kwargs' referenced before assignment`. `model_kwargs` should be pre-defined above line 48 of swift/llm/sft.py:
```python
# Loading Model and Tokenizer
model_kwargs = {}
if is_deepspeed_zero3_enabled():
    model_kwargs = {'device_map': None}
```
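For context, a minimal sketch (function and variable names are hypothetical) of why the pre-definition matters: any code path that skips the `if` branch otherwise leaves the name unbound.

```python
def load_model(use_zero3: bool):
    # Buggy shape: model_kwargs is only bound inside the branch.
    if use_zero3:
        model_kwargs = {'device_map': None}
    return model_kwargs  # UnboundLocalError when use_zero3 is False

# Pre-defining model_kwargs = {} before the branch binds it on every path.
```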
After setting the `quantization_bit` parameter, it errors with `UnboundLocalError: local variable 'load_in_4bit' referenced before assignment`. `select_bnb` in swift/llm/utils/argument.py could be changed to something like this:
```python
def select_bnb(
        self: Union['SftArguments', 'InferArguments']
) -> Tuple[Optional[Dtype], bool, bool]:
    if self.bnb_4bit_comp_dtype == 'AUTO':
        self.bnb_4bit_comp_dtype = self.dtype
    if self.bnb_4bit_comp_dtype != 'AUTO':
        bnb_4bit_compute_dtype = dtype_mapping_reversed[self.bnb_4bit_comp_dtype]
        assert bnb_4bit_compute_dtype in {torch.float16, torch.bfloat16, torch.float32}
    else:
        bnb_4bit_compute_dtype = None
    quantization_bit = self.quantization_bit
    if self.quant_method == 'bnb':
        if quantization_bit == 4:
            require_version('bitsandbytes')
            load_in_4bit, load_in_8bit = True, False
        elif quantization_bit == 8:
            require_version('bitsandbytes')
            load_in_4bit, load_in_8bit = False, True
        else:
            # Keep load_in_4bit / load_in_8bit bound even when
            # quantization_bit is neither 4 nor 8.
            load_in_4bit, load_in_8bit = False, False
    else:
        load_in_4bit, load_in_8bit = False, False
    return bnb_4bit_compute_dtype, load_in_4bit, load_in_8bit
```
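For reference, a hedged sketch of how the returned triple would typically be consumed. The wiring below (including `args` and `model_id_or_path`) is an assumption for illustration, not swift's actual call site, though `BitsAndBytesConfig` and its parameters are the real transformers API.

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# args stands in for a populated SftArguments instance (hypothetical).
compute_dtype, load_in_4bit, load_in_8bit = select_bnb(args)

quant_config = None
if load_in_4bit or load_in_8bit:
    quant_config = BitsAndBytesConfig(
        load_in_4bit=load_in_4bit,
        load_in_8bit=load_in_8bit,
        bnb_4bit_compute_dtype=compute_dtype,
    )

model = AutoModelForCausalLM.from_pretrained(
    args.model_id_or_path, quantization_config=quant_config)
```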
One more small suggestion: could the data loading be moved before the model loading? Otherwise you spend a long time loading the model, and then training errors out anyway because the dataset format is wrong, which wastes a lot of time.
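To illustrate the suggested ordering, a minimal fail-fast sketch; the file name, field names, and `load_model_and_tokenizer` helper are assumptions, not swift's actual dataset schema or API.

```python
import json

def validate_jsonl(path: str) -> None:
    # Cheap pass over the dataset: parse every line and check the
    # required keys before any expensive model loading starts.
    with open(path, encoding='utf-8') as f:
        for lineno, line in enumerate(f, 1):
            sample = json.loads(line)  # fails fast on malformed JSON
            for key in ('query', 'response'):
                if key not in sample:
                    raise ValueError(f'{path}:{lineno}: missing "{key}"')

validate_jsonl('train.jsonl')        # seconds
model = load_model_and_tokenizer()   # minutes; only runs on valid data
```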
Yeah, that is indeed a problem. Thanks for the feedback.
If gradient_checkpointing is set to true, training fails; gradient checkpointing has to be disabled for training to run normally. See https://github.com/huggingface/transformers/issues/28499.
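For reference, a sketch of the workaround commonly cited around the linked transformers issue: switch to non-reentrant checkpointing, or make the model inputs require grad. Both calls are real transformers APIs (`gradient_checkpointing_kwargs` needs transformers >= 4.35); the surrounding setup is assumed.

```python
from transformers import TrainingArguments

# Option 1: non-reentrant gradient checkpointing.
training_args = TrainingArguments(
    output_dir='output',
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={'use_reentrant': False},
)

# Option 2: keep the default checkpointing, but force the embedding
# output to require grad so the LoRA backward graph is not empty:
# model.enable_input_require_grads()
```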
Got it; this has been fixed in the training script.