intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc
Apache License 2.0
6.75k stars 1.27k forks

Failure to load Qwen2-72B-Instruct with FP6 on 4 Arc GPUs #11786

Open oldmikeyang opened 3 months ago

oldmikeyang commented 3 months ago

(ipex-llm-0812) llm@GPU-Xeon4410Y-ARC770:~/ipex-llm-0812/python/llm/dev/benchmark/all-in-one$ bash run-deepspeed-arc.sh

:: initializing oneAPI environment ...
   run-deepspeed-arc.sh: BASH_VERSION = 5.1.16(1)-release
   args: Using "$@" for oneapi-vars.sh arguments: --force
:: advisor -- processing etc/advisor/vars.sh
:: ccl -- processing etc/ccl/vars.sh
:: compiler -- processing etc/compiler/vars.sh
:: dal -- processing etc/dal/vars.sh
:: debugger -- processing etc/debugger/vars.sh
:: dpct -- processing etc/dpct/vars.sh
:: dpl -- processing etc/dpl/vars.sh
:: ipp -- processing etc/ipp/vars.sh
:: ippcp -- processing etc/ippcp/vars.sh
:: mkl -- processing etc/mkl/vars.sh
:: mpi -- processing etc/mpi/vars.sh
:: tbb -- processing etc/tbb/vars.sh
:: vtune -- processing etc/vtune/vars.sh
:: oneAPI environment initialized ::

[0] /home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
[0]   warnings.warn(
[2] /home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
[2]   warnings.warn(
[1] /home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
[1]   warnings.warn(
[3] /home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
[3]   warnings.warn(
[0] /home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
[0]   warn(
[2] /home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
[2]   warn(
[1] /home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
[1]   warn(
[3] /home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
[3]   warn(
[0] [2024-08-14 08:46:09,295] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2] [2024-08-14 08:46:09,296] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[1] [2024-08-14 08:46:09,346] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[0] model_path: /home/llm/local_models/Qwen/Qwen2-72B-Instruct
[0] [2024-08-14 08:46:09,451] [INFO] [real_accelerator.py:211:set_accelerator] Setting ds_accelerator to cpu (model specified)
[2] model_path: /home/llm/local_models/Qwen/Qwen2-72B-Instruct
[2] [2024-08-14 08:46:09,452] [INFO] [real_accelerator.py:211:set_accelerator] Setting ds_accelerator to cpu (model specified)
[1] model_path: /home/llm/local_models/Qwen/Qwen2-72B-Instruct
[1] [2024-08-14 08:46:09,509] [INFO] [real_accelerator.py:211:set_accelerator] Setting ds_accelerator to cpu (model specified)
[3] [2024-08-14 08:46:09,575] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[3] model_path: /home/llm/local_models/Qwen/Qwen2-72B-Instruct
[3] [2024-08-14 08:46:09,735] [INFO] [real_accelerator.py:211:set_accelerator] Setting ds_accelerator to cpu (model specified)
Loading checkpoint shards: 100%|██████████| 37/37 [05:21<00:00, 8.69s/it]
[0] Loading checkpoint shards: 100%|██████████| 37/37 [05:21<00:00, 8.69s/it]
Loading checkpoint shards: 100%|██████████| 37/37 [05:21<00:00, 8.69s/it]
[3] Loading checkpoint shards: 100%|██████████| 37/37 [05:21<00:00, 8.70s/it]
[3] Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[3] [2024-08-14 08:51:33,893] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.14.1+ed8aed57, git-hash=ed8aed57, git-branch=HEAD
[3] [2024-08-14 08:51:33,894] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter replace_method is deprecated. This parameter is no longer needed, please remove from your call to DeepSpeed-inference
[3] [2024-08-14 08:51:33,894] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[3] [2024-08-14 08:51:33,894] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[0] [2024-08-14 08:51:33,899] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.14.1+ed8aed57, git-hash=ed8aed57, git-branch=HEAD
[0] Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[0] [2024-08-14 08:51:33,900] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter replace_method is deprecated. This parameter is no longer needed, please remove from your call to DeepSpeed-inference
[0] [2024-08-14 08:51:33,900] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[0] [2024-08-14 08:51:33,900] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[1] Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[1] [2024-08-14 08:51:33,901] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.14.1+ed8aed57, git-hash=ed8aed57, git-branch=HEAD
[1] [2024-08-14 08:51:33,901] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter replace_method is deprecated. This parameter is no longer needed, please remove from your call to DeepSpeed-inference
[1] [2024-08-14 08:51:33,902] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[1] [2024-08-14 08:51:33,902] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[2] Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[2] [2024-08-14 08:51:33,916] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.14.1+ed8aed57, git-hash=ed8aed57, git-branch=HEAD
[2] [2024-08-14 08:51:33,917] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter replace_method is deprecated. This parameter is no longer needed, please remove from your call to DeepSpeed-inference
[2] [2024-08-14 08:51:33,917] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2] [2024-08-14 08:51:33,917] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[3] Using /home/llm/.cache/torch_extensions/py311_cpu as PyTorch extensions root...
[3] Emitting ninja build file /home/llm/.cache/torch_extensions/py311_cpu/deepspeed_ccl_comm/build.ninja...
[3] Building extension module deepspeed_ccl_comm...
[3] Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[3] ninja: no work to do.
[3] Loading extension module deepspeed_ccl_comm...
[2] Using /home/llm/.cache/torch_extensions/py311_cpu as PyTorch extensions root...
[2] Emitting ninja build file /home/llm/.cache/torch_extensions/py311_cpu/deepspeed_ccl_comm/build.ninja...
[2] Building extension module deepspeed_ccl_comm...
[2] Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[0] Using /home/llm/.cache/torch_extensions/py311_cpu as PyTorch extensions root...
[2] ninja: no work to do.
[2] Loading extension module deepspeed_ccl_comm...
[1] Using /home/llm/.cache/torch_extensions/py311_cpu as PyTorch extensions root...
[1] Emitting ninja build file /home/llm/.cache/torch_extensions/py311_cpu/deepspeed_ccl_comm/build.ninja...
[1] Building extension module deepspeed_ccl_comm...
[1] Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1] ninja: no work to do.
[1] Loading extension module deepspeed_ccl_comm...
[0] Loading extension module deepspeed_ccl_comm...
[0] Time to load deepspeed_ccl_comm op: 0.20324254035949707 seconds
[0] DeepSpeed deepspeed.ops.comm.deepspeed_ccl_comm_op built successfully
[0] [2024-08-14 08:53:14,600] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[1] Time to load deepspeed_ccl_comm op: 0.08019113540649414 seconds
[1] DeepSpeed deepspeed.ops.comm.deepspeed_ccl_comm_op built successfully
[1] [2024-08-14 08:53:14,600] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2] Time to load deepspeed_ccl_comm op: 0.07995343208312988 seconds
[2] DeepSpeed deepspeed.ops.comm.deepspeed_ccl_comm_op built successfully
[2] [2024-08-14 08:53:14,600] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[3] Time to load deepspeed_ccl_comm op: 0.0811920166015625 seconds
[3] DeepSpeed deepspeed.ops.comm.deepspeed_ccl_comm_op built successfully
[3] [2024-08-14 08:53:14,600] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[0] [2024-08-14 08:53:14,600] [INFO] [comm.py:637:init_distributed] cdb=<deepspeed.comm.ccl.CCLBackend object at 0x7691cbeba210>
[1] [2024-08-14 08:53:14,600] [INFO] [comm.py:637:init_distributed] cdb=<deepspeed.comm.ccl.CCLBackend object at 0x768e6fb41ed0>
[0] [2024-08-14 08:53:14,601] [INFO] [comm.py:652:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[1] [2024-08-14 08:53:14,601] [INFO] [comm.py:652:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2] [2024-08-14 08:53:14,600] [INFO] [comm.py:637:init_distributed] cdb=<deepspeed.comm.ccl.CCLBackend object at 0x79d917879690>
[2] [2024-08-14 08:53:14,601] [INFO] [comm.py:652:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[3] [2024-08-14 08:53:14,600] [INFO] [comm.py:637:init_distributed] cdb=<deepspeed.comm.ccl.CCLBackend object at 0x76966c33c790>
[3] [2024-08-14 08:53:14,601] [INFO] [comm.py:652:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[0] My guessed rank = 0
[1] My guessed rank = 1
[3] My guessed rank = 3
[2] My guessed rank = 2
[0] [2024-08-14 08:53:14,995] [INFO] [comm.py:702:mpi_discovery] Discovered MPI settings of world_rank=0, local_rank=0, world_size=4, master_addr=10.240.108.91, master_port=29500
[0] [2024-08-14 08:53:14,995] [INFO] [comm.py:662:init_distributed] Distributed backend already initialized
[2] [2024-08-14 08:53:14,995] [INFO] [comm.py:702:mpi_discovery] Discovered MPI settings of world_rank=2, local_rank=2, world_size=4, master_addr=10.240.108.91, master_port=29500
[3] [2024-08-14 08:53:14,995] [INFO] [comm.py:702:mpi_discovery] Discovered MPI settings of world_rank=3, local_rank=3, world_size=4, master_addr=10.240.108.91, master_port=29500
[1] [2024-08-14 08:53:14,996] [INFO] [comm.py:702:mpi_discovery] Discovered MPI settings of world_rank=1, local_rank=1, world_size=4, master_addr=10.240.108.91, master_port=29500
[0] 2024-08-14 08:53:37,101 - INFO - Converting the current model to fp6 format......
[3] 2024-08-14 08:53:37,101 - INFO - Converting the current model to fp6 format......
[1] 2024-08-14 08:53:37,102 - INFO - Converting the current model to fp6 format......
[3] /home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/init.py:412: UserWarning: Initializing zero-element tensors is a no-op
[3]   warnings.warn("Initializing zero-element tensors is a no-op")
[0] /home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/init.py:412: UserWarning: Initializing zero-element tensors is a no-op
[0]   warnings.warn("Initializing zero-element tensors is a no-op")
[1] /home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/init.py:412: UserWarning: Initializing zero-element tensors is a no-op
[1]   warnings.warn("Initializing zero-element tensors is a no-op")
[2] 2024-08-14 08:53:37,139 - INFO - Converting the current model to fp6 format......
[2] /home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/init.py:412: UserWarning: Initializing zero-element tensors is a no-op
[2]   warnings.warn("Initializing zero-element tensors is a no-op")
[3] AutoTP: [(<class 'transformers.models.qwen2.modeling_qwen2.Qwen2DecoderLayer'>, ['mlp.down_proj', 'self_attn.o_proj'])]
[3] >> loading of model costs 447.36217987899727s
[3] [2024-08-14 08:55:08,948] [INFO] [real_accelerator.py:211:set_accelerator] Setting ds_accelerator to xpu (model specified)
[1] AutoTP: [(<class 'transformers.models.qwen2.modeling_qwen2.Qwen2DecoderLayer'>, ['self_attn.o_proj', 'mlp.down_proj'])]
[1] >> loading of model costs 447.5882513519973s
[1] [2024-08-14 08:55:08,980] [INFO] [real_accelerator.py:211:set_accelerator] Setting ds_accelerator to xpu (model specified)
[2] AutoTP: [(<class 'transformers.models.qwen2.modeling_qwen2.Qwen2DecoderLayer'>, ['mlp.down_proj', 'self_attn.o_proj'])]
[2] >> loading of model costs 447.6830987180001s
[2] [2024-08-14 08:55:09,438] [INFO] [real_accelerator.py:211:set_accelerator] Setting ds_accelerator to xpu (model specified)
[0] AutoTP: [(<class 'transformers.models.qwen2.modeling_qwen2.Qwen2DecoderLayer'>, ['mlp.down_proj', 'self_attn.o_proj'])]
[0] >> loading of model costs 447.6456385010024s
[0] [2024-08-14 08:55:10,077] [INFO] [real_accelerator.py:211:set_accelerator] Setting ds_accelerator to xpu (model specified)
[3] Traceback (most recent call last):
[3]   File "/home/llm/ipex-llm-0812/python/llm/dev/benchmark/all-in-one/run.py", line 2002, in <module>
[3]     run_model(model, api, in_out_pairs, conf['local_model_hub'], conf['warm_up'], conf['num_trials'], conf['num_beams'],
[3]   File "/home/llm/ipex-llm-0812/python/llm/dev/benchmark/all-in-one/run.py", line 170, in run_model
[3]     result = run_deepspeed_optimize_model_gpu(repo_id, local_model_hub, in_out_pairs, warm_up, num_trials, num_beams, low_bit, batch_size, cpu_embedding)
[3]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[3]   File "/home/llm/ipex-llm-0812/python/llm/dev/benchmark/all-in-one/run.py", line 1689, in run_deepspeed_optimize_model_gpu
[3]     model = model.to(f'xpu:{local_rank}')
[3]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[3]   File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2597, in to
[3]     return super().to(*args, **kwargs)
[3]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[3]   File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1160, in to
[3]     return self._apply(convert)
[3]            ^^^^^^^^^^^^^^^^^^^^
[3]   File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/modules/module.py", line 810, in _apply
[3]     module._apply(fn)
[3]   File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/modules/module.py", line 833, in _apply
[3]     param_applied = fn(param)
[3]                     ^^^^^^^^^
[3]   File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1158, in convert
[3]     return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
[3]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[3]   File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/ipex_llm/transformers/low_bit_linear.py", line 492, in to
[3]     new_param = FP4Params(super().to(device=device,
[3]                           ^^^^^^^^^^^^^^^^^^^^^^^^^
[3] RuntimeError: Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES) -5 (PI_ERROR_OUT_OF_RESOURCES)
[2] Traceback (most recent call last):
[2]   File "/home/llm/ipex-llm-0812/python/llm/dev/benchmark/all-in-one/run.py", line 2002, in <module>
[2]     run_model(model, api, in_out_pairs, conf['local_model_hub'], conf['warm_up'], conf['num_trials'], conf['num_beams'],
[2]   File "/home/llm/ipex-llm-0812/python/llm/dev/benchmark/all-in-one/run.py", line 170, in run_model
[2]     result = run_deepspeed_optimize_model_gpu(repo_id, local_model_hub, in_out_pairs, warm_up, num_trials, num_beams, low_bit, batch_size, cpu_embedding)
[2]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2]   File "/home/llm/ipex-llm-0812/python/llm/dev/benchmark/all-in-one/run.py", line 1689, in run_deepspeed_optimize_model_gpu
[2]     model = model.to(f'xpu:{local_rank}')
[2]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2]   File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2597, in to
[2]     return super().to(*args, **kwargs)
[2]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2]   File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1160, in to
[2]     return self._apply(convert)
[2]            ^^^^^^^^^^^^^^^^^^^^
[2]   File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/modules/module.py", line 810, in _apply
[2]     module._apply(fn)
[2]   File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/modules/module.py", line 833, in _apply
[2]     param_applied = fn(param)
[2]                     ^^^^^^^^^
[2]   File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1158, in convert
[2]     return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
[2]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2]   File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/ipex_llm/transformers/low_bit_linear.py", line 492, in to
[2]     new_param = FP4Params(super().to(device=device,
[2]                           ^^^^^^^^^^^^^^^^^^^^^^^^^
[2] RuntimeError: Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES) -5 (PI_ERROR_OUT_OF_RESOURCES)
[1] Traceback (most recent call last):
[1]   File "/home/llm/ipex-llm-0812/python/llm/dev/benchmark/all-in-one/run.py", line 2002, in <module>
[1]     run_model(model, api, in_out_pairs, conf['local_model_hub'], conf['warm_up'], conf['num_trials'], conf['num_beams'],
[1]   File "/home/llm/ipex-llm-0812/python/llm/dev/benchmark/all-in-one/run.py", line 170, in run_model
[1]     result = run_deepspeed_optimize_model_gpu(repo_id, local_model_hub, in_out_pairs, warm_up, num_trials, num_beams, low_bit, batch_size, cpu_embedding)
[1]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1]   File "/home/llm/ipex-llm-0812/python/llm/dev/benchmark/all-in-one/run.py", line 1689, in run_deepspeed_optimize_model_gpu
[1]     model = model.to(f'xpu:{local_rank}')
[1]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1]   File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2597, in to
[1]     return super().to(*args, **kwargs)
[1]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1]   File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1160, in to
[1]     return self._apply(convert)
[1]            ^^^^^^^^^^^^^^^^^^^^
[1]   File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/modules/module.py", line 810, in _apply
[1]     module._apply(fn)
[1]   File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/modules/module.py", line 833, in _apply
[1]     param_applied = fn(param)
[1]                     ^^^^^^^^^
[1]   File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1158, in convert
[1]     return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
[1]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1]   File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/ipex_llm/transformers/low_bit_linear.py", line 492, in to
[1]     new_param = FP4Params(super().to(device=device,
[1]                           ^^^^^^^^^^^^^^^^^^^^^^^^^
[1] RuntimeError: Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES) -5 (PI_ERROR_OUT_OF_RESOURCES)
[0] Traceback (most recent call last):
[0]   File "/home/llm/ipex-llm-0812/python/llm/dev/benchmark/all-in-one/run.py", line 2002, in <module>
[0]     run_model(model, api, in_out_pairs, conf['local_model_hub'], conf['warm_up'], conf['num_trials'], conf['num_beams'],
[0]   File "/home/llm/ipex-llm-0812/python/llm/dev/benchmark/all-in-one/run.py", line 170, in run_model
[0]     result = run_deepspeed_optimize_model_gpu(repo_id, local_model_hub, in_out_pairs, warm_up, num_trials, num_beams, low_bit, batch_size, cpu_embedding)
[0]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0]   File "/home/llm/ipex-llm-0812/python/llm/dev/benchmark/all-in-one/run.py", line 1689, in run_deepspeed_optimize_model_gpu
[0]     model = model.to(f'xpu:{local_rank}')
[0]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0]   File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2597, in to
[0]     return super().to(*args, **kwargs)
[0]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0]   File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1160, in to
[0]     return self._apply(convert)
[0]            ^^^^^^^^^^^^^^^^^^^^
[0]   File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/modules/module.py", line 810, in _apply
[0]     module._apply(fn)
[0]   File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/modules/module.py", line 833, in _apply
[0]     param_applied = fn(param)
[0]                     ^^^^^^^^^
[0]   File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1158, in convert
[0]     return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
[0]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0]   File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/ipex_llm/transformers/low_bit_linear.py", line 492, in to
[0]     new_param = FP4Params(super().to(device=device,
[0]                           ^^^^^^^^^^^^^^^^^^^^^^^^^
[0] RuntimeError: Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES) -5 (PI_ERROR_OUT_OF_RESOURCES)

(ipex-llm-0812) llm@GPU-Xeon4410Y-ARC770:~/ipex-llm-0812/python/llm/scripts$ bash env-check.sh

PYTHON_VERSION=3.11.9

Transformers is not installed.

PyTorch is not installed.

ipex-llm Version: 2.1.0b20240811

IPEX is not installed.

CPU Information:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 52 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Silver 4410Y
CPU family: 6
Model: 143
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
Stepping: 8
CPU max MHz: 3900.0000
CPU min MHz: 800.0000
BogoMIPS: 4000.00

Total CPU Memory: 755.547 GB

Operating System: Ubuntu 22.04.4 LTS \n \l


Linux GPU-Xeon4410Y-ARC770 6.8.0-39-generic #39~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Jul 10 15:35:09 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

CLI:
    Version: 1.2.27.20240626
    Build ID: 7f002d24

Service:
    Version: 1.2.27.20240626
    Build ID: 7f002d24
    Level Zero Version: 1.16.0

Driver UUID 32342e31-332e-3239-3133-382e37000000
Driver Version 24.13.29138.7
Driver UUID 32342e31-332e-3239-3133-382e37000000
Driver Version 24.13.29138.7
Driver UUID 32342e31-332e-3239-3133-382e37000000
Driver Version 24.13.29138.7
Driver UUID 32342e31-332e-3239-3133-382e37000000
Driver Version 24.13.29138.7
Driver UUID 32342e31-332e-3239-3133-382e37000000
Driver Version 24.13.29138.7
Driver UUID 32342e31-332e-3239-3133-382e37000000
Driver Version 24.13.29138.7
Driver UUID 32342e31-332e-3239-3133-382e37000000
Driver Version 24.13.29138.7
Driver UUID 32342e31-332e-3239-3133-382e37000000
Driver Version 24.13.29138.7

Driver related package version:
ii intel-fw-gpu 2024.17.5-329~22.04 all Firmware package for Intel integrated and discrete GPUs
ii intel-i915-dkms 1.24.3.23.240419.26+i30-1 all Out of tree i915 driver.
ii intel-level-zero-gpu 1.3.29138.7 amd64 Intel(R) Graphics Compute Runtime for oneAPI Level Zero.
ii level-zero-dev 1.16.15-881~22.04 amd64 Intel(R) Graphics Compute Runtime for oneAPI Level Zero.

env-check.sh: line 167: sycl-ls: command not found
igpu not detected

xpu-smi is properly installed.

+-----------+--------------------------------------------------------------------------------------+
| Device ID | Device Information                                                                   |
+-----------+--------------------------------------------------------------------------------------+
| 0         | Device Name: Intel(R) Arc(TM) A770 Graphics                                          |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | SOC UUID: 00000000-0000-0019-0000-000856a08086                                       |
|           | PCI BDF Address: 0000:19:00.0                                                        |
|           | DRM Device: /dev/dri/card1                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+
| 1         | Device Name: Intel(R) Arc(TM) A770 Graphics                                          |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | SOC UUID: 00000000-0000-002c-0000-000856a08086                                       |
|           | PCI BDF Address: 0000:2c:00.0                                                        |
|           | DRM Device: /dev/dri/card2                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+
| 2         | Device Name: Intel(R) Arc(TM) A770 Graphics                                          |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | SOC UUID: 00000000-0000-0052-0000-000856a08086                                       |
|           | PCI BDF Address: 0000:52:00.0                                                        |
|           | DRM Device: /dev/dri/card3                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+
| 3         | Device Name: Intel(R) Arc(TM) A770 Graphics                                          |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | SOC UUID: 00000000-0000-0065-0000-000856a08086                                       |
|           | PCI BDF Address: 0000:65:00.0                                                        |
|           | DRM Device: /dev/dri/card4                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+
| 4         | Device Name: Intel(R) Arc(TM) A770 Graphics                                          |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | SOC UUID: 00000000-0000-009b-0000-000856a08086                                       |
|           | PCI BDF Address: 0000:9b:00.0                                                        |
|           | DRM Device: /dev/dri/card5                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+
| 5         | Device Name: Intel(R) Arc(TM) A770 Graphics                                          |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | SOC UUID: 00000000-0000-00ad-0000-000856a08086                                       |
|           | PCI BDF Address: 0000:ad:00.0                                                        |
|           | DRM Device: /dev/dri/card6                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+
| 6         | Device Name: Intel(R) Arc(TM) A770 Graphics                                          |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | SOC UUID: 00000000-0000-00d1-0000-000856a08086                                       |
|           | PCI BDF Address: 0000:d1:00.0                                                        |
|           | DRM Device: /dev/dri/card7                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+
| 7         | Device Name: Intel(R) Arc(TM) A770 Graphics                                          |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | SOC UUID: 00000000-0000-00e3-0000-000856a08086                                       |
|           | PCI BDF Address: 0000:e3:00.0                                                        |
|           | DRM Device: /dev/dri/card8                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+

GPU0 Memory size=16M
GPU1 Memory size=16G
GPU2 Memory size=16G
GPU3 Memory size=16G
GPU4 Memory size=16G
GPU5 Memory size=16G
GPU6 Memory size=16G
GPU7 Memory size=16G
GPU8 Memory size=16G

03:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 52) (prog-if 00 [VGA controller])
	DeviceName: Onboard VGA
	Subsystem: ASPEED Technology, Inc. ASPEED Graphics Family
	Flags: medium devsel, IRQ 16, NUMA node 0
	Memory at 94000000 (32-bit, non-prefetchable) [size=16M]
	Memory at 95000000 (32-bit, non-prefetchable) [size=256K]
	I/O ports at 2000 [size=128]
	Capabilities:
	Kernel driver in use: ast
	Kernel modules: ast

19:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller])
	Subsystem: Shenzhen Gunnir Technology Development Co., Ltd Device 1334
	Flags: bus master, fast devsel, latency 0, IRQ 130, NUMA node 0
	Memory at 9e000000 (64-bit, non-prefetchable) [size=16M]
	Memory at 5f800000000 (64-bit, prefetchable) [size=16G]
	Expansion ROM at 9f000000 [disabled] [size=2M]
	Capabilities:
	Kernel driver in use: i915
	Kernel modules: xe, i915

2c:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller])
	Subsystem: Shenzhen Gunnir Technology Development Co., Ltd Device 1334
	Flags: bus master, fast devsel, latency 0, IRQ 133, NUMA node 0
	Memory at a8000000 (64-bit, non-prefetchable) [size=16M]
	Memory at 6f800000000 (64-bit, prefetchable) [size=16G]
	Expansion ROM at a9000000 [disabled] [size=2M]
	Capabilities:
	Kernel driver in use: i915
	Kernel modules: xe, i915

52:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller])
	Subsystem: Shenzhen Gunnir Technology Development Co., Ltd Device 1334
	Flags: bus master, fast devsel, latency 0, IRQ 136, NUMA node 0
	Memory at bc000000 (64-bit, non-prefetchable) [size=16M]
	Memory at 8f800000000 (64-bit, prefetchable) [size=16G]
	Expansion ROM at bd000000 [disabled] [size=2M]
	Capabilities:
	Kernel driver in use: i915
	Kernel modules: xe, i915

65:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller])
	Subsystem: Shenzhen Gunnir Technology Development Co., Ltd Device 1334
	Flags: bus master, fast devsel, latency 0, IRQ 139, NUMA node 0
	Memory at c6000000 (64-bit, non-prefetchable) [size=16M]
	Memory at 9f800000000 (64-bit, prefetchable) [size=16G]
	Expansion ROM at c7000000 [disabled] [size=2M]
	Capabilities:
	Kernel driver in use: i915
	Kernel modules: xe, i915

9b:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller])
	Subsystem: Shenzhen Gunnir Technology Development Co., Ltd Device 1334
	Flags: bus master, fast devsel, latency 0, IRQ 142, NUMA node 1
	Memory at d8000000 (64-bit, non-prefetchable) [size=16M]
	Memory at cf800000000 (64-bit, prefetchable) [size=16G]
	Expansion ROM at d9000000 [disabled] [size=2M]
	Capabilities:
	Kernel driver in use: i915
	Kernel modules: xe, i915

ad:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller])
	Subsystem: Shenzhen Gunnir Technology Development Co., Ltd Device 1334
	Flags: bus master, fast devsel, latency 0, IRQ 145, NUMA node 1
	Memory at e0000000 (64-bit, non-prefetchable) [size=16M]
	Memory at df800000000 (64-bit, prefetchable) [size=16G]
	Expansion ROM at e1000000 [disabled] [size=2M]
	Capabilities:
	Kernel driver in use: i915
	Kernel modules: xe, i915

d1:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller])
	Subsystem: Shenzhen Gunnir Technology Development Co., Ltd Device 1334
	Flags: bus master, fast devsel, latency 0, IRQ 148, NUMA node 1
	Memory at f1000000 (64-bit, non-prefetchable) [size=16M]
	Memory at ff800000000 (64-bit, prefetchable) [size=16G]
	Expansion ROM at f2000000 [disabled] [size=2M]
	Capabilities:
	Kernel driver in use: i915
	Kernel modules: xe, i915

e3:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller])
	Subsystem: Intel Corporation Device 1020
	Flags: bus master, fast devsel, latency 0, IRQ 151, NUMA node 1
	Memory at f9000000 (64-bit, non-prefetchable) [size=16M]
	Memory at 10f800000000 (64-bit, prefetchable) [size=16G]
	Expansion ROM at fa000000 [disabled] [size=2M]
	Capabilities:
	Kernel driver in use: i915
	Kernel modules: xe, i915

plusbang commented 3 months ago

Hi, @oldmikeyang, the error is caused by running out of GPU memory. We haven't tested 72B with FP6 through DeepSpeed AutoTP on 4 Arc GPUs; please try vLLM tensor parallel (TP) or pipeline parallel instead.
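As a rough cross-check of the out-of-memory diagnosis, the Python sketch below estimates how much of each 16 GB Arc A770 the FP6 weights alone would occupy when split over 4 GPUs. It is a back-of-envelope approximation, not how ipex-llm or DeepSpeed actually allocate memory: it assumes exactly 72e9 parameters, 6 bits per weight with quantization scales ignored, and a perfectly even split. Activations, the KV cache, any replicated tensors and allocator overhead all add to the real figure, so the computed headroom is only an upper bound.

```python
# Back-of-envelope estimate of quantized weight memory per GPU.
# Assumptions (illustrative, not measured): 72e9 parameters, 6 bits per
# weight with no scale/zero-point overhead, and a perfectly even split.

GIB = 1024 ** 3


def weight_gib_per_device(num_params: float, bits_per_weight: float,
                          num_devices: int) -> float:
    """Return GiB of quantized weights held by each device under an even split."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / num_devices / GIB


if __name__ == "__main__":
    vram_gib = 16  # Arc A770, 16G per the lspci output in this issue
    per_device = weight_gib_per_device(72e9, 6, 4)
    print(f"FP6 weights per GPU: {per_device:.2f} GiB")
    print(f"Headroom (upper bound): {vram_gib - per_device:.2f} GiB")
```

Under these assumptions each GPU holds roughly 12.57 GiB of weights, leaving at most about 3.4 GiB for everything else, which is consistent with the `PI_ERROR_OUT_OF_RESOURCES` failure at `model.to(f'xpu:{local_rank}')`.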

Taking pipeline parallel as an example, you could set cpu_embedding=True and export IPEX_LLM_LOW_MEM=1 to run 72B with FP6, 1024 input / 128 output tokens, and batch size 1.
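The settings in the reply can be gathered in one place, as in the hypothetical Python helper below. The helper name and return layout are illustrative only; the config keys mirror parameter names visible in the `run.py` traceback (`local_model_hub`, `low_bit`, `in_out_pairs`, `batch_size`, `cpu_embedding`, `repo_id`), but the exact `config.yaml` schema and the name of the pipeline-parallel test API are not given in this thread, so check both against your ipex-llm version.

```python
import os


def pipeline_parallel_settings(local_model_hub: str, repo_id: str):
    """Hypothetical helper: return (env, conf) for a 72B FP6 run with
    1024 input / 128 output tokens and batch size 1.

    The conf keys mirror names from the all-in-one run.py traceback; a
    test_api entry is intentionally omitted because its value for pipeline
    parallel is not stated in this thread.
    """
    env = dict(os.environ)            # copy, so os.environ itself is untouched
    env["IPEX_LLM_LOW_MEM"] = "1"     # same effect as `export IPEX_LLM_LOW_MEM=1`
    conf = {
        "repo_id": [repo_id],
        "local_model_hub": local_model_hub,
        "low_bit": "fp6",
        "in_out_pairs": ["1024-128"],  # 1024 input tokens, 128 output tokens
        "batch_size": 1,
        "cpu_embedding": True,         # keep the embedding layer on the CPU
    }
    return env, conf


if __name__ == "__main__":
    env, conf = pipeline_parallel_settings(
        "/home/llm/local_models", "Qwen/Qwen2-72B-Instruct")
    print(env["IPEX_LLM_LOW_MEM"], conf)
```

The returned `env` can be passed as `env=` to `subprocess.run` when launching the benchmark, so the variable applies only to that process.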