hiyouga / LLaMA-Factory

Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0
30.28k stars 3.73k forks

Merging LoRA weights into CodeLlama-13b-Instruct-hf fails: NotImplementedError: Cannot copy out of meta tensor; no data! #2725

Closed YYLCyylc closed 6 months ago

YYLCyylc commented 6 months ago

Reminder

Reproduction

Command executed:

```shell
python src/export_model.py \
    --model_name_or_path /home/jingtianran/.cache/modelscope/hub/AI-ModelScope/CodeLlama-13b-Instruct-hf/ \
    --adapter_name_or_path /home/jingtianran/nlp/DB-GPT-Hub/dbgpt_hub/output/adapter/codellama-13b-sql-sft-lora/ \
    --template default \
    --finetuning_type lora \
    --export_dir /home/jingtianran/nlp/LLaMA-Factory-main/output/ \
    --export_legacy_format False \
    --export_size 2
```

Error message:

```
Traceback (most recent call last):
  File "/home/jingtianran/nlp/LLaMA-Factory-main/src/export_model.py", line 9, in <module>
    main()
  File "/home/jingtianran/nlp/LLaMA-Factory-main/src/export_model.py", line 5, in main
    export_model()
  File "/home/jingtianran/nlp/LLaMA-Factory-main/src/llmtuner/train/tuner.py", line 64, in export_model
    model = model.to(getattr(model.config, "torch_dtype")).to("cpu")
  File "/home/jingtianran/anaconda3/envs/llama_factory/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2556, in to
    return super().to(*args, **kwargs)
  File "/home/jingtianran/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1160, in to
    return self._apply(convert)
  File "/home/jingtianran/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/home/jingtianran/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/home/jingtianran/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  [Previous line repeated 2 more times]
  File "/home/jingtianran/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 833, in _apply
    param_applied = fn(param)
  File "/home/jingtianran/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1158, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
NotImplementedError: Cannot copy out of meta tensor; no data!
```
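The final error comes from PyTorch itself, not LLaMA-Factory: a tensor on the `meta` device carries shape and dtype metadata but no actual storage, so any attempt to copy it to a real device fails with exactly this message. A minimal reproduction of the underlying behavior, assuming only that PyTorch is installed:

```python
import torch

# A meta tensor has a shape and dtype but no backing data. Weights end up
# on "meta" when a model is loaded with low_cpu_mem_usage / a device_map
# and some layers could not be materialized on a real device.
t = torch.empty(3, device="meta")

try:
    t.to("cpu")  # materializing a meta tensor is not possible
except NotImplementedError as e:
    print(e)  # Cannot copy out of meta tensor; no data!
```

In the export above, this suggests some model layers were never actually loaded onto a physical device, so `model.to("cpu")` in `tuner.py` has nothing to copy.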

Expected behavior

I tried to merge downloaded fine-tuned LoRA weights into the original CodeLlama-13b-Instruct-hf model.
LoRA weights: https://huggingface.co/Wangzaistone123/CodeLlama-13b-sql-lora
Base model: https://huggingface.co/codellama/CodeLlama-13b-Instruct-hf

System Info

No response

Others

No response

hiyouga commented 6 months ago

Have you reserved enough GPU memory?

YYLCyylc commented 6 months ago

I'm using a 40GB A100, and I observed GPU memory usage peaking at only about 24GB.

hiyouga commented 6 months ago

Can you run inference normally with the CLI demo?

YYLCyylc commented 6 months ago

It was indeed insufficient GPU memory. Although one card was idle, I had not set CUDA_VISIBLE_DEVICES=0, so once the first card reached about 24GB the process automatically spilled over to the second card's memory, which caused the error.
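The fix described above can be sketched as follows. This is an illustration, not part of LLaMA-Factory; the device index "0" is an assumption standing in for whichever physical GPU is idle, and the variable must be set before PyTorch initializes CUDA:

```python
import os

# Assumption: physical GPU 0 is the idle card. Setting CUDA_VISIBLE_DEVICES
# before torch initializes CUDA hides every other device from this process,
# so no layer can be placed on (or spill onto) a busy second card.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch  # imported after setting the variable on purpose

# Visible devices are renumbered from 0, so "cuda:0" now refers to the
# single pinned physical card.
if torch.cuda.is_available():
    print(torch.cuda.device_count())  # only the pinned card is visible
```

Equivalently on the command line, prefix the export command shown in the reproduction with `CUDA_VISIBLE_DEVICES=0`.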