Merging the ChatGLM2-6b model fails with ValueError: We need an `offload_dir` to dispatch this model according to this `device_map`, the following submodules need to be offloaded #50
(venv) PS C:\MyFiles\AI\model\chatglm2> python merge_lora_and_quantize.py --lora_path saved_files/chatGLM_6B_QLoRA_t32 --output_path /tmp/merged_qlora_model_4bit --remote_scripts_dir remote_scripts/chatglm2-6b --qbits 4
Loading checkpoint shards: 100%|██████████| 7/7 [00:07<00:00, 1.10s/it]
WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.
Traceback (most recent call last):
  File "C:\MyFiles\AI\model\chatglm2\merge_lora_and_quantize.py", line 80, in <module>
    main(lora_path=args.lora_path,
  File "C:\MyFiles\AI\model\chatglm2\merge_lora_and_quantize.py", line 54, in main
    merged_model, lora_config = merge_lora(lora_path, device_map)
  File "C:\MyFiles\AI\model\chatglm2\merge_lora_and_quantize.py", line 28, in merge_lora
    model = PeftModel.from_pretrained(base_model, lora_path, device_map=device_map)
  File "C:\MyFiles\AI\model\chatglm2\venv\lib\site-packages\peft\peft_model.py", line 181, in from_pretrained
    model.load_adapter(model_id, adapter_name, **kwargs)
  File "C:\MyFiles\AI\model\chatglm2\venv\lib\site-packages\peft\peft_model.py", line 406, in load_adapter
    dispatch_model(
  File "C:\MyFiles\AI\model\chatglm2\venv\lib\site-packages\accelerate\big_modeling.py", line 374, in dispatch_model
    raise ValueError(
ValueError: We need an `offload_dir` to dispatch this model according to this `device_map`, the following submodules need to be offloaded: base_model.model.transformer.encoder.layers.12, base_model.model.transformer.encoder.layers.13, base_model.model.transformer.encoder.layers.14, base_model.model.transformer.encoder.layers.15, base_model.model.transformer.encoder.layers.16, base_model.model.transformer.encoder.layers.17, base_model.model.transformer.encoder.layers.18, base_model.model.transformer.encoder.layers.19, base_model.model.transformer.encoder.layers.20, base_model.model.transformer.encoder.layers.21, base_model.model.transformer.encoder.layers.22, base_model.model.transformer.encoder.layers.23, base_model.model.transformer.encoder.layers.24, base_model.model.transformer.encoder.layers.25, base_model.model.transformer.encoder.layers.26, base_model.model.transformer.encoder.layers.27, base_model.model.transformer.encoder.final_layernorm, base_model.model.transformer.output_layer.
(venv) PS C:\MyFiles\AI\model\chatglm2>
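The first run fails at dispatch time: accelerate mapped part of the model to CPU and the listed submodules (encoder layers 12-27, the final layernorm and the output layer) to disk, and `dispatch_model` refuses disk offload without a directory to write to. A sketch of the change that supplies one (the kwarg name is an assumption -- recent PEFT versions read an `offload_folder` kwarg and forward it to `accelerate.dispatch_model` as `offload_dir`; check your installed peft version):

```python
from peft import PeftModel

# Sketch of one workaround: give accelerate somewhere to write the
# offloaded submodules. `offload_folder` being forwarded as `offload_dir`
# is an assumption about the installed PEFT version.
model = PeftModel.from_pretrained(
    base_model,                # the ChatGLM2-6B base model already loaded
    lora_path,                 # e.g. saved_files/chatGLM_6B_QLoRA_t32
    device_map=device_map,
    offload_folder="offload",  # any writable directory
)
```

As the second run below shows, this gets past the dispatch error (note the additional "offloaded to the disk and cpu" warning), but it only moves the failure to the quantization step.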
(venv) PS C:\MyFiles\AI\model\chatglm2> python merge_lora_and_quantize.py --lora_path saved_files/chatGLM_6B_QLoRA_t32 --output_path /tmp/merged_qlora_model_4bit --remote_scripts_dir remote_scripts/chatglm2-6b --qbits 4
Loading checkpoint shards: 100%|██████████| 7/7 [00:07<00:00, 1.07s/it]
WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.
WARNING:root:Some parameters are on the meta device device because they were offloaded to the disk and cpu.
Traceback (most recent call last):
  File "C:\MyFiles\AI\model\chatglm2\merge_lora_and_quantize.py", line 80, in <module>
    main(lora_path=args.lora_path,
  File "C:\MyFiles\AI\model\chatglm2\merge_lora_and_quantize.py", line 56, in main
    quantized_model = quantize(merged_model, qbits)
  File "C:\MyFiles\AI\model\chatglm2\merge_lora_and_quantize.py", line 35, in quantize
    qmodel = model.quantize(qbits).half().cuda()
  File "C:\Users\71977\.cache\huggingface\modules\transformers_modules\chatglm2-6b\modeling_chatglm.py", line 1197, in quantize
    self.transformer.encoder = quantize(self.transformer.encoder, bits, empty_init=empty_init, device=device,
  File "C:\Users\71977\.cache\huggingface\modules\transformers_modules\chatglm2-6b\quantization.py", line 157, in quantize
    weight=layer.self_attention.query_key_value.weight.to(torch.cuda.current_device()),
NotImplementedError: Cannot copy out of meta tensor; no data!
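This second failure is a direct consequence of the offloading: `quantization.py` copies each layer's weight to the GPU (`weight.to(torch.cuda.current_device())`), but the layers accelerate offloaded to CPU/disk exist only as meta tensors, which carry no data to copy. One way around it, sketched below under the assumption that there is enough system RAM for the fp16 weights (roughly 13 GB for ChatGLM2-6B), is to load and merge everything on the CPU so no parameter is left on the meta device, and let `quantize()` move layers to the GPU itself:

```python
import torch
from transformers import AutoModel
from peft import PeftModel

# Sketch, not the repo's exact code; paths are illustrative.
# device_map={"": "cpu"} pins every submodule to CPU RAM, so accelerate
# never creates meta tensors and never needs an offload directory.
base_model = AutoModel.from_pretrained(
    "THUDM/chatglm2-6b",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map={"": "cpu"},
)
model = PeftModel.from_pretrained(
    base_model,
    "saved_files/chatGLM_6B_QLoRA_t32",
    device_map={"": "cpu"},
)
merged = model.merge_and_unload()  # fold the LoRA deltas into the base weights

# quantize() copies each layer's weight to the GPU as it packs it, so it is
# roughly the quantized model (about 4 GB at 4 bits) that has to fit in VRAM,
# not the fp16 one.
qmodel = merged.quantize(4).half().cuda()
```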
Initial model
GPU memory
Versions of some dependencies:
What I tried: I added an offload_dir parameter in merge_lora_and_quantize.py, which produced the second error shown above (NotImplementedError: Cannot copy out of meta tensor; no data!).
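For anyone debugging the same thing, a quick diagnostic (a sketch; `merged_model` is the variable from merge_lora_and_quantize.py) to confirm whether any parameters are still on the meta device before calling quantize():

```python
# List parameters still on the meta device; quantize() will fail on any of these.
meta_params = [name for name, p in merged_model.named_parameters()
               if p.device.type == "meta"]
print(f"{len(meta_params)} parameters still on the meta device")
print(meta_params[:10])
```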