Hello, when deploying the qwen2-72b-instruct model on 4 NPU cards, I run into an out-of-memory error. The same model on the same 4 cards runs inference fine with MindIE. The command I used is as follows:

The error message is as follows:

INFO: 2024-07-16 11:13:58,028 model.py:945] model_kwargs: {'device_map': 'npu:0'}
Loading checkpoint shards: 22%|██████████████████▍ | 8/37 [00:20<01:12, 2.51s/it]
Traceback (most recent call last):
  File "/model/swift/swift/cli/deploy.py", line 5, in <module>
    deploy_main()
  File "/model/swift/swift/utils/run_utils.py", line 27, in x_main
    result = llm_x(args, **kwargs)
  File "/model/swift/swift/llm/deploy.py", line 557, in llm_deploy
    model, template = prepare_model_template(args)
  File "/model/swift/swift/llm/infer.py", line 179, in prepare_model_template
    model, tokenizer = get_model_tokenizer(
  File "/model/swift/swift/llm/utils/model.py", line 5572, in get_model_tokenizer
    model, tokenizer = get_function(model_dir, torch_dtype, model_kwargs, load_model, **kwargs)
  File "/model/swift/swift/llm/utils/model.py", line 3149, in get_model_tokenizer_qwen2_chat
    return get_model_tokenizer_with_flash_attn(model_dir, torch_dtype, model_kwargs, load_model, **kwargs)
  File "/model/swift/swift/llm/utils/model.py", line 2529, in get_model_tokenizer_with_flash_attn
    return get_model_tokenizer_from_repo(
  File "/model/swift/swift/llm/utils/model.py", line 947, in get_model_tokenizer_from_repo
    model = automodel_class.from_pretrained(
  File "/root/.local/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 113, in from_pretrained
    module_obj = module_class.from_pretrained(model_dir, *model_args,
  File "/root/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
  File "/root/.local/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 76, in from_pretrained
    return ori_from_pretrained(cls, model_dir, *model_args, **kwargs)
  File "/root/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3838, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/root/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4298, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/root/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 895, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/root/.local/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 404, in set_module_tensor_to_device
    new_value = value.to(device)
RuntimeError: NPU out of memory. Tried to allocate 464.00 MiB (NPU 0; 60.94 GiB total capacity; 30.56 GiB already allocated; 30.56 GiB current active; 228.99 MiB free; 30.57 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.
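Note the first log line: model_kwargs is {'device_map': 'npu:0'}, so all 37 checkpoint shards are being placed on a single card. In fp16, 72B parameters take roughly 72e9 × 2 bytes ≈ 144 GiB of weights alone, which can never fit in one card's 60.94 GiB; the load indeed dies around shard 8/37 with ~30 GiB already allocated. What I would expect on 4 cards is a sharded load. Below is a minimal sketch of that with plain transformers, assuming accelerate's NPU support is available and using a placeholder checkpoint path:

import torch
import torch_npu  # registers the 'npu' device type with PyTorch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path -- substitute the real local checkpoint directory.
model_dir = '/model/qwen2-72b-instruct'

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    torch_dtype=torch.float16,  # ~2 bytes/param, so ~144 GiB for 72B weights
    device_map='auto',          # let accelerate split layers across npu:0..npu:3
                                # instead of pinning everything to npu:0
)

With device_map='auto', accelerate builds a per-layer device map over the visible NPUs, so each card holds roughly a quarter of the weights.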
The 4-card check is as follows:
(swift-npu) root@ai-c34f1d01ae0140b69ca1c0217594daa6-744d9b857-zkdv5:/model/swift# python
Python 3.10.14 (main, May 6 2024, 19:36:58) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from transformers.utils import is_torch_npu_available
>>> import torch
>>> print(is_torch_npu_available())  # True
True
>>> print(torch.npu.device_count())  # 4
4
>>> print(torch.randn(10, device='npu:0'))
Warning: Device do not support double dtype now, dtype cast repalce with float.
tensor([-1.1842,  0.1592,  1.1293,  1.2888, -0.2644, -1.8703, -0.6397, -0.9527,
         0.0838, -1.1983], device='npu:0')
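To confirm the per-card capacity and that all four cards are idle before loading, they can be surveyed like this; a short sketch assuming torch_npu mirrors the torch.cuda API for get_device_properties and memory_allocated, which recent releases do:

import torch
import torch_npu  # exposes the torch.npu namespace

# Print total and currently allocated memory for each visible NPU.
for i in range(torch.npu.device_count()):
    props = torch.npu.get_device_properties(i)
    total = props.total_memory / 1024**3
    used = torch.npu.memory_allocated(i) / 1024**3
    print(f'npu:{i} ({props.name}): total={total:.2f} GiB, allocated={used:.2f} GiB')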
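For reference, ms-swift's NPU examples launch multi-card jobs by exporting ASCEND_RT_VISIBLE_DEVICES before the CLI call, roughly like the line below; the exact model_type value here is my assumption, not the exact command I ran:

ASCEND_RT_VISIBLE_DEVICES=0,1,2,3 swift deploy --model_type qwen2-72b-instruct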