hiyouga / LLaMA-Factory

Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0
30.28k stars 3.73k forks

Merging LoRA weights into CodeLlama-13b-Instruct-hf fails: NotImplementedError: Cannot copy out of meta tensor; no data! #2725

Closed YYLCyylc closed 6 months ago

YYLCyylc commented 6 months ago

Reminder

Reproduction

Command executed:

```shell
python src/export_model.py \
    --model_name_or_path /home/jingtianran/.cache/modelscope/hub/AI-ModelScope/CodeLlama-13b-Instruct-hf/ \
    --adapter_name_or_path /home/jingtianran/nlp/DB-GPT-Hub/dbgpt_hub/output/adapter/codellama-13b-sql-sft-lora/ \
    --template default \
    --finetuning_type lora \
    --export_dir /home/jingtianran/nlp/LLaMA-Factory-main/output/ \
    --export_legacy_format False \
    --export_size 2
```

Error message:

```
Traceback (most recent call last):
  File "/home/jingtianran/nlp/LLaMA-Factory-main/src/export_model.py", line 9, in <module>
    main()
  File "/home/jingtianran/nlp/LLaMA-Factory-main/src/export_model.py", line 5, in main
    export_model()
  File "/home/jingtianran/nlp/LLaMA-Factory-main/src/llmtuner/train/tuner.py", line 64, in export_model
    model = model.to(getattr(model.config, "torch_dtype")).to("cpu")
  File "/home/jingtianran/anaconda3/envs/llama_factory/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2556, in to
    return super().to(*args, **kwargs)
  File "/home/jingtianran/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1160, in to
    return self._apply(convert)
  File "/home/jingtianran/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/home/jingtianran/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/home/jingtianran/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  [Previous line repeated 2 more times]
  File "/home/jingtianran/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 833, in _apply
    param_applied = fn(param)
  File "/home/jingtianran/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1158, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
NotImplementedError: Cannot copy out of meta tensor; no data!
```
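The final error comes from PyTorch itself, not LLaMA-Factory: a tensor on the `meta` device carries shape and dtype metadata but no actual storage, so any attempt to copy it to a real device fails with exactly this message. A minimal reproduction of the underlying behavior, assuming only that PyTorch is installed:

```python
import torch

# A meta tensor has a shape and dtype but no backing data. Weights end up
# on "meta" when a model is loaded with low_cpu_mem_usage / a device_map
# and some layers could not be materialized on a real device.
t = torch.empty(3, device="meta")

try:
    t.to("cpu")  # materializing a meta tensor is not possible
except NotImplementedError as e:
    print(e)  # Cannot copy out of meta tensor; no data!
```

In the export above, this suggests some model layers were never actually loaded onto a physical device, so `model.to("cpu")` in `tuner.py` has nothing to copy.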

Expected behavior

I tried to merge downloaded fine-tuned LoRA weights into the original CodeLlama-13b-Instruct-hf model.
LoRA weights: https://huggingface.co/Wangzaistone123/CodeLlama-13b-sql-lora
Base model: https://huggingface.co/codellama/CodeLlama-13b-Instruct-hf

System Info

No response

Others

No response

hiyouga commented 6 months ago

Have you reserved enough GPU memory?

YYLCyylc commented 6 months ago

I'm using a 40GB A100, and I observed GPU memory usage peaking at only about 24GB.

hiyouga commented 6 months ago

Can you run inference normally with the CLI demo?

YYLCyylc commented 6 months ago

It was indeed insufficient GPU memory. Although one card was idle, I had not set CUDA_VISIBLE_DEVICES=0, so once the first card reached about 24GB the process automatically spilled over to the second card's memory, which caused the error.
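The fix described above can be sketched as follows. This is an illustration, not part of LLaMA-Factory; the device index "0" is an assumption standing in for whichever physical GPU is idle, and the variable must be set before PyTorch initializes CUDA:

```python
import os

# Assumption: physical GPU 0 is the idle card. Setting CUDA_VISIBLE_DEVICES
# before torch initializes CUDA hides every other device from this process,
# so no layer can be placed on (or spill onto) a busy second card.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch  # imported after setting the variable on purpose

# Visible devices are renumbered from 0, so "cuda:0" now refers to the
# single pinned physical card.
if torch.cuda.is_available():
    print(torch.cuda.device_count())  # only the pinned card is visible
```

Equivalently on the command line, prefix the export command shown in the reproduction with `CUDA_VISIBLE_DEVICES=0`.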