Open oldmikeyang opened 3 months ago
Hi @oldmikeyang, the error is caused by running out of GPU memory. We haven't tested 72B with fp6 through DeepSpeed AutoTP on 4 ARC cards; please try vLLM tensor parallel or pipeline parallel instead.
Taking pipeline parallel as an example, you could set `cpu_embedding=True`
and `export IPEX_LLM_LOW_MEM=1`
to run 72B, fp6, 1024-in/128-out with batch size 1.
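For reference, a minimal sketch of what those two knobs look like in code. It assumes the `ipex_llm.transformers` loading API (`load_in_low_bit`, `cpu_embedding`) rather than the exact pipeline-parallel example script, and it assumes that setting `IPEX_LLM_LOW_MEM` via `os.environ` before `ipex_llm` is imported behaves the same as exporting it in the launch script:

```python
# Sketch only: load Qwen2-72B with fp6 low-bit weights while keeping the embedding
# table on the CPU, as suggested above. The pipeline-parallel launch itself
# (splitting layers across the 4 A770s) is not shown here.
import os

# Equivalent of `export IPEX_LLM_LOW_MEM=1` in the launch script; set it before
# ipex_llm is imported so the low-memory path is picked up (assumption).
os.environ["IPEX_LLM_LOW_MEM"] = "1"

from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "/home/llm/local_models/Qwen/Qwen2-72B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_low_bit="fp6",   # same low-bit format as the failing benchmark run
    optimize_model=True,
    cpu_embedding=True,      # keep the large embedding table in host memory
    trust_remote_code=True,
    use_cache=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
```

With 1024-in/128-out and batch size 1, these settings are what the suggestion above relies on to keep the per-card footprint within the 16 GB A770s.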
(ipex-llm-0812) llm@GPU-Xeon4410Y-ARC770:~/ipex-llm-0812/python/llm/dev/benchmark/all-in-one$ bash run-deepspeed-arc.sh
:: initializing oneAPI environment ... run-deepspeed-arc.sh: BASH_VERSION = 5.1.16(1)-release args: Using "$@" for oneapi-vars.sh arguments: --force :: advisor -- processing etc/advisor/vars.sh :: ccl -- processing etc/ccl/vars.sh :: compiler -- processing etc/compiler/vars.sh :: dal -- processing etc/dal/vars.sh :: debugger -- processing etc/debugger/vars.sh :: dpct -- processing etc/dpct/vars.sh :: dpl -- processing etc/dpl/vars.sh :: ipp -- processing etc/ipp/vars.sh :: ippcp -- processing etc/ippcp/vars.sh :: mkl -- processing etc/mkl/vars.sh :: mpi -- processing etc/mpi/vars.sh :: tbb -- processing etc/tbb/vars.sh :: vtune -- processing etc/vtune/vars.sh :: oneAPI environment initialized ::
[0] /home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations [0] warnings.warn( [2] /home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations [2] warnings.warn( [1] /home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations [1] warnings.warn( [3] /home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations [3] warnings.warn( [0] /home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from
[3] Traceback (most recent call last):
[3] File "/home/llm/ipex-llm-0812/python/llm/dev/benchmark/all-in-one/run.py", line 2002, in <module>
[3] run_model(model, api, in_out_pairs, conf['local_model_hub'], conf['warm_up'], conf['num_trials'], conf['num_beams'],
[3] File "/home/llm/ipex-llm-0812/python/llm/dev/benchmark/all-in-one/run.py", line 170, in run_model
[3] result = run_deepspeed_optimize_model_gpu(repo_id, local_model_hub, in_out_pairs, warm_up, num_trials, num_beams, low_bit, batch_size, cpu_embedding)
[3] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[3] File "/home/llm/ipex-llm-0812/python/llm/dev/benchmark/all-in-one/run.py", line 1689, in run_deepspeed_optimize_model_gpu
[3] model = model.to(f'xpu:{local_rank}')
[3] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[3] File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2597, in to
[3] return super().to(*args, **kwargs)
[3] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[3] File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1160, in to
[3] return self._apply(convert)
[3] ^^^^^^^^^^^^^^^^^^^^
[3] File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/modules/module.py", line 810, in _apply
[3] module._apply(fn)
[3] File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/modules/module.py", line 833, in _apply
[3] param_applied = fn(param)
[3] ^^^^^^^^^
[3] File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1158, in convert
[3] return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
[3] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[3] File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/ipex_llm/transformers/low_bit_linear.py", line 492, in to
[3] new_param = FP4Params(super().to(device=device,
[3] ^^^^^^^^^^^^^^^^^^^^^^^^^
[3] RuntimeError: Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES) -5 (PI_ERROR_OUT_OF_RESOURCES)
[2] Traceback (most recent call last):
[2] File "/home/llm/ipex-llm-0812/python/llm/dev/benchmark/all-in-one/run.py", line 2002, in
[2] run_model(model, api, in_out_pairs, conf['local_model_hub'], conf['warm_up'], conf['num_trials'], conf['num_beams'],
[2] File "/home/llm/ipex-llm-0812/python/llm/dev/benchmark/all-in-one/run.py", line 170, in run_model
[2] result = run_deepspeed_optimize_model_gpu(repo_id, local_model_hub, in_out_pairs, warm_up, num_trials, num_beams, low_bit, batch_size, cpu_embedding)
[2] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2] File "/home/llm/ipex-llm-0812/python/llm/dev/benchmark/all-in-one/run.py", line 1689, in run_deepspeed_optimize_model_gpu
[2] model = model.to(f'xpu:{local_rank}')
[2] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2] File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2597, in to
[2] return super().to(*args, **kwargs)
[2] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2] File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1160, in to
[2] return self._apply(convert)
[2] ^^^^^^^^^^^^^^^^^^^^
[2] File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/modules/module.py", line 810, in _apply
[2] module._apply(fn)
[2] File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/modules/module.py", line 833, in _apply
[2] param_applied = fn(param)
[2] ^^^^^^^^^
[2] File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1158, in convert
[2] return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
[2] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2] File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/ipex_llm/transformers/low_bit_linear.py", line 492, in to
[2] new_param = FP4Params(super().to(device=device,
[2] ^^^^^^^^^^^^^^^^^^^^^^^^^
[2] RuntimeError: Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES) -5 (PI_ERROR_OUT_OF_RESOURCES)
[1] Traceback (most recent call last):
[1] File "/home/llm/ipex-llm-0812/python/llm/dev/benchmark/all-in-one/run.py", line 2002, in
[1] run_model(model, api, in_out_pairs, conf['local_model_hub'], conf['warm_up'], conf['num_trials'], conf['num_beams'],
[1] File "/home/llm/ipex-llm-0812/python/llm/dev/benchmark/all-in-one/run.py", line 170, in run_model
[1] result = run_deepspeed_optimize_model_gpu(repo_id, local_model_hub, in_out_pairs, warm_up, num_trials, num_beams, low_bit, batch_size, cpu_embedding)
[1] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1] File "/home/llm/ipex-llm-0812/python/llm/dev/benchmark/all-in-one/run.py", line 1689, in run_deepspeed_optimize_model_gpu
[1] model = model.to(f'xpu:{local_rank}')
[1] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1] File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2597, in to
[1] return super().to(*args, **kwargs)
[1] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1] File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1160, in to
[1] return self._apply(convert)
[1] ^^^^^^^^^^^^^^^^^^^^
[1] File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/modules/module.py", line 810, in _apply
[1] module._apply(fn)
[1] File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/modules/module.py", line 833, in _apply
[1] param_applied = fn(param)
[1] ^^^^^^^^^
[1] File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1158, in convert
[1] return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
[1] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1] File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/ipex_llm/transformers/low_bit_linear.py", line 492, in to
[1] new_param = FP4Params(super().to(device=device,
[1] ^^^^^^^^^^^^^^^^^^^^^^^^^
[1] RuntimeError: Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES) -5 (PI_ERROR_OUT_OF_RESOURCES)
[0] Traceback (most recent call last):
[0] File "/home/llm/ipex-llm-0812/python/llm/dev/benchmark/all-in-one/run.py", line 2002, in
[0] run_model(model, api, in_out_pairs, conf['local_model_hub'], conf['warm_up'], conf['num_trials'], conf['num_beams'],
[0] File "/home/llm/ipex-llm-0812/python/llm/dev/benchmark/all-in-one/run.py", line 170, in run_model
[0] result = run_deepspeed_optimize_model_gpu(repo_id, local_model_hub, in_out_pairs, warm_up, num_trials, num_beams, low_bit, batch_size, cpu_embedding)
[0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0] File "/home/llm/ipex-llm-0812/python/llm/dev/benchmark/all-in-one/run.py", line 1689, in run_deepspeed_optimize_model_gpu
[0] model = model.to(f'xpu:{local_rank}')
[0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0] File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2597, in to
[0] return super().to(*args, **kwargs)
[0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0] File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1160, in to
[0] return self._apply(convert)
[0] ^^^^^^^^^^^^^^^^^^^^
[0] File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/modules/module.py", line 810, in _apply
[0] module._apply(fn)
[0] File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/modules/module.py", line 833, in _apply
[0] param_applied = fn(param)
[0] ^^^^^^^^^
[0] File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1158, in convert
[0] return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
[0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0] File "/home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/ipex_llm/transformers/low_bit_linear.py", line 492, in to
[0] new_param = FP4Params(super().to(device=device,
[0] ^^^^^^^^^^^^^^^^^^^^^^^^^
[0] RuntimeError: Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES) -5 (PI_ERROR_OUT_OF_RESOURCES)
`torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
[0] warn(
[2] /home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
[2] warn(
[1] /home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
[1] warn(
[3] /home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision`
from source? [3] warn( [0] [2024-08-14 08:46:09,295] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to xpu (auto detect) [2] [2024-08-14 08:46:09,296] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to xpu (auto detect) [1] [2024-08-14 08:46:09,346] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to xpu (auto detect) [0] model_path: /home/llm/local_models/Qwen/Qwen2-72B-Instruct [0] [2024-08-14 08:46:09,451] [INFO] [real_accelerator.py:211:set_accelerator] Setting ds_accelerator to cpu (model specified) [2] model_path: /home/llm/local_models/Qwen/Qwen2-72B-Instruct [2] [2024-08-14 08:46:09,452] [INFO] [real_accelerator.py:211:set_accelerator] Setting ds_accelerator to cpu (model specified) [1] model_path: /home/llm/local_models/Qwen/Qwen2-72B-Instruct [1] [2024-08-14 08:46:09,509] [INFO] [real_accelerator.py:211:set_accelerator] Setting ds_accelerator to cpu (model specified) [3] [2024-08-14 08:46:09,575] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to xpu (auto detect) [3] model_path: /home/llm/local_models/Qwen/Qwen2-72B-Instruct [3] [2024-08-14 08:46:09,735] [INFO] [real_accelerator.py:211:set_accelerator] Setting ds_accelerator to cpu (model specified) Loading checkpoint shards: 100%|██████████| 37/37 [05:21<00:00, 8.69s/it][0] Loading checkpoint shards: 100%|██████████| 37/37 [05:21<00:00, 8.69s/it] Loading checkpoint shards: 100%|██████████| 37/37 [05:21<00:00, 8.69s/it][3] Loading checkpoint shards: 100%|██████████| 37/37 [05:21<00:00, 8.70s/it] [3] Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. [3] [2024-08-14 08:51:33,893] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.14.1+ed8aed57, git-hash=ed8aed57, git-branch=HEAD [3] [2024-08-14 08:51:33,894] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter replace_method is deprecated. This parameter is no longer needed, please remove from your call to DeepSpeed-inference [3] [2024-08-14 08:51:33,894] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead [3] [2024-08-14 08:51:33,894] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1 [0] [2024-08-14 08:51:33,899] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.14.1+ed8aed57, git-hash=ed8aed57, git-branch=HEAD [0] Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. [0] [2024-08-14 08:51:33,900] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter replace_method is deprecated. This parameter is no longer needed, please remove from your call to DeepSpeed-inference [0] [2024-08-14 08:51:33,900] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead [0] [2024-08-14 08:51:33,900] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1 [1] Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. 
[1] [2024-08-14 08:51:33,901] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.14.1+ed8aed57, git-hash=ed8aed57, git-branch=HEAD [1] [2024-08-14 08:51:33,901] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter replace_method is deprecated. This parameter is no longer needed, please remove from your call to DeepSpeed-inference [1] [2024-08-14 08:51:33,902] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead [1] [2024-08-14 08:51:33,902] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1 [2] Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. [2] [2024-08-14 08:51:33,916] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.14.1+ed8aed57, git-hash=ed8aed57, git-branch=HEAD [2] [2024-08-14 08:51:33,917] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter replace_method is deprecated. This parameter is no longer needed, please remove from your call to DeepSpeed-inference [2] [2024-08-14 08:51:33,917] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead [2] [2024-08-14 08:51:33,917] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1 [3] Using /home/llm/.cache/torch_extensions/py311_cpu as PyTorch extensions root... [3] Emitting ninja build file /home/llm/.cache/torch_extensions/py311_cpu/deepspeed_ccl_comm/build.ninja... [3] Building extension module deepspeed_ccl_comm... [3] Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [3] ninja: no work to do. [3] Loading extension module deepspeed_ccl_comm... [2] Using /home/llm/.cache/torch_extensions/py311_cpu as PyTorch extensions root... [2] Emitting ninja build file /home/llm/.cache/torch_extensions/py311_cpu/deepspeed_ccl_comm/build.ninja... [2] Building extension module deepspeed_ccl_comm... [2] Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [0] Using /home/llm/.cache/torch_extensions/py311_cpu as PyTorch extensions root... [2] ninja: no work to do. [2] Loading extension module deepspeed_ccl_comm... [1] Using /home/llm/.cache/torch_extensions/py311_cpu as PyTorch extensions root... [1] Emitting ninja build file /home/llm/.cache/torch_extensions/py311_cpu/deepspeed_ccl_comm/build.ninja... [1] Building extension module deepspeed_ccl_comm... [1] Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [1] ninja: no work to do. [1] Loading extension module deepspeed_ccl_comm... [0] Loading extension module deepspeed_ccl_comm... 
[0] Time to load deepspeed_ccl_comm op: 0.20324254035949707 seconds [0] DeepSpeed deepspeed.ops.comm.deepspeed_ccl_comm_op built successfully [0] [2024-08-14 08:53:14,600] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend [1] Time to load deepspeed_ccl_comm op: 0.08019113540649414 seconds [1] DeepSpeed deepspeed.ops.comm.deepspeed_ccl_comm_op built successfully [1] [2024-08-14 08:53:14,600] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend [2] Time to load deepspeed_ccl_comm op: 0.07995343208312988 seconds [2] DeepSpeed deepspeed.ops.comm.deepspeed_ccl_comm_op built successfully [2] [2024-08-14 08:53:14,600] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend [3] Time to load deepspeed_ccl_comm op: 0.0811920166015625 seconds [3] DeepSpeed deepspeed.ops.comm.deepspeed_ccl_comm_op built successfully [3] [2024-08-14 08:53:14,600] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend [0] [2024-08-14 08:53:14,600] [INFO] [comm.py:637:init_distributed] cdb=<deepspeed.comm.ccl.CCLBackend object at 0x7691cbeba210> [1] [2024-08-14 08:53:14,600] [INFO] [comm.py:637:init_distributed] cdb=<deepspeed.comm.ccl.CCLBackend object at 0x768e6fb41ed0> [0] [2024-08-14 08:53:14,601] [INFO] [comm.py:652:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment... [1] [2024-08-14 08:53:14,601] [INFO] [comm.py:652:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment... [2] [2024-08-14 08:53:14,600] [INFO] [comm.py:637:init_distributed] cdb=<deepspeed.comm.ccl.CCLBackend object at 0x79d917879690> [2] [2024-08-14 08:53:14,601] [INFO] [comm.py:652:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment... [3] [2024-08-14 08:53:14,600] [INFO] [comm.py:637:init_distributed] cdb=<deepspeed.comm.ccl.CCLBackend object at 0x76966c33c790> [3] [2024-08-14 08:53:14,601] [INFO] [comm.py:652:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment... [0] My guessed rank = 0 [1] My guessed rank = 1 [3] My guessed rank = 3 [2] My guessed rank = 2 [0] [2024-08-14 08:53:14,995] [INFO] [comm.py:702:mpi_discovery] Discovered MPI settings of world_rank=0, local_rank=0, world_size=4, master_addr=10.240.108.91, master_port=29500 [0] [2024-08-14 08:53:14,995] [INFO] [comm.py:662:init_distributed] Distributed backend already initialized [2] [2024-08-14 08:53:14,995] [INFO] [comm.py:702:mpi_discovery] Discovered MPI settings of world_rank=2, local_rank=2, world_size=4, master_addr=10.240.108.91, master_port=29500 [3] [2024-08-14 08:53:14,995] [INFO] [comm.py:702:mpi_discovery] Discovered MPI settings of world_rank=3, local_rank=3, world_size=4, master_addr=10.240.108.91, master_port=29500 [1] [2024-08-14 08:53:14,996] [INFO] [comm.py:702:mpi_discovery] Discovered MPI settings of world_rank=1, local_rank=1, world_size=4, master_addr=10.240.108.91, master_port=29500 [0] 2024-08-14 08:53:37,101 - INFO - Converting the current model to fp6 format...... [3] 2024-08-14 08:53:37,101 - INFO - Converting the current model to fp6 format...... [1] 2024-08-14 08:53:37,102 - INFO - Converting the current model to fp6 format...... 
[3] /home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/init.py:412: UserWarning: Initializing zero-element tensors is a no-op [3] warnings.warn("Initializing zero-element tensors is a no-op") [0] /home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/init.py:412: UserWarning: Initializing zero-element tensors is a no-op [0] warnings.warn("Initializing zero-element tensors is a no-op") [1] /home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/init.py:412: UserWarning: Initializing zero-element tensors is a no-op [1] warnings.warn("Initializing zero-element tensors is a no-op") [2] 2024-08-14 08:53:37,139 - INFO - Converting the current model to fp6 format...... [2] /home/llm/venv/ipex-llm-0812/lib/python3.11/site-packages/torch/nn/init.py:412: UserWarning: Initializing zero-element tensors is a no-op [2] warnings.warn("Initializing zero-element tensors is a no-op") [3] AutoTP: [(<class 'transformers.models.qwen2.modeling_qwen2.Qwen2DecoderLayer'>, ['mlp.down_proj', 'self_attn.o_proj'])] [3] >> loading of model costs 447.36217987899727s [3] [2024-08-14 08:55:08,948] [INFO] [real_accelerator.py:211:set_accelerator] Setting ds_accelerator to xpu (model specified) [1] AutoTP: [(<class 'transformers.models.qwen2.modeling_qwen2.Qwen2DecoderLayer'>, ['self_attn.o_proj', 'mlp.down_proj'])] [1] >> loading of model costs 447.5882513519973s [1] [2024-08-14 08:55:08,980] [INFO] [real_accelerator.py:211:set_accelerator] Setting ds_accelerator to xpu (model specified) [2] AutoTP: [(<class 'transformers.models.qwen2.modeling_qwen2.Qwen2DecoderLayer'>, ['mlp.down_proj', 'self_attn.o_proj'])] [2] >> loading of model costs 447.6830987180001s [2] [2024-08-14 08:55:09,438] [INFO] [real_accelerator.py:211:set_accelerator] Setting ds_accelerator to xpu (model specified) [0] AutoTP: [(<class 'transformers.models.qwen2.modeling_qwen2.Qwen2DecoderLayer'>, ['mlp.down_proj', 'self_attn.o_proj'])] [0] >> loading of model costs 447.6456385010024s [0] [2024-08-14 08:55:10,077] [INFO] [real_accelerator.py:211:set_accelerator] Setting ds_accelerator to xpu (model specified) [3] Traceback (most recent call last): [3] File "/home/llm/ipex-llm-0812/python/llm/dev/benchmark/all-in-one/run.py", line 2002, in <module>
(ipex-llm-0812) llm@GPU-Xeon4410Y-ARC770:~/ipex-llm-0812/python/llm/scripts$ bash env-check.sh
PYTHON_VERSION=3.11.9
Transformers is not installed.
PyTorch is not installed.
ipex-llm Version: 2.1.0b20240811
IPEX is not installed.
CPU Information: Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Address sizes: 52 bits physical, 57 bits virtual Byte Order: Little Endian CPU(s): 48 On-line CPU(s) list: 0-47 Vendor ID: GenuineIntel Model name: Intel(R) Xeon(R) Silver 4410Y CPU family: 6 Model: 143 Thread(s) per core: 2 Core(s) per socket: 12 Socket(s): 2 Stepping: 8 CPU max MHz: 3900.0000 CPU min MHz: 800.0000 BogoMIPS: 4000.00
Total CPU Memory: 755.547 GB
Operating System: Ubuntu 22.04.4 LTS \n \l
Linux GPU-Xeon4410Y-ARC770 6.8.0-39-generic #39~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Jul 10 15:35:09 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
CLI: Version: 1.2.27.20240626 Build ID: 7f002d24
Service: Version: 1.2.27.20240626 Build ID: 7f002d24 Level Zero Version: 1.16.0
Driver UUID 32342e31-332e-3239-3133-382e37000000 Driver Version 24.13.29138.7 Driver UUID 32342e31-332e-3239-3133-382e37000000 Driver Version 24.13.29138.7 Driver UUID 32342e31-332e-3239-3133-382e37000000 Driver Version 24.13.29138.7 Driver UUID 32342e31-332e-3239-3133-382e37000000 Driver Version 24.13.29138.7 Driver UUID 32342e31-332e-3239-3133-382e37000000 Driver Version 24.13.29138.7 Driver UUID 32342e31-332e-3239-3133-382e37000000 Driver Version 24.13.29138.7 Driver UUID 32342e31-332e-3239-3133-382e37000000 Driver Version 24.13.29138.7 Driver UUID 32342e31-332e-3239-3133-382e37000000 Driver Version 24.13.29138.7
Driver related package version: ii intel-fw-gpu 2024.17.5-329~22.04 all Firmware package for Intel integrated and discrete GPUs ii intel-i915-dkms 1.24.3.23.240419.26+i30-1 all Out of tree i915 driver. ii intel-level-zero-gpu 1.3.29138.7 amd64 Intel(R) Graphics Compute Runtime for oneAPI Level Zero. ii level-zero-dev 1.16.15-881~22.04 amd64 Intel(R) Graphics Compute Runtime for oneAPI Level Zero.
env-check.sh: line 167: sycl-ls: command not found
igpu not detected
xpu-smi is properly installed.
+-----------+--------------------------------------------------------------------------------------+ | Device ID | Device Information | +-----------+--------------------------------------------------------------------------------------+ | 0 | Device Name: Intel(R) Arc(TM) A770 Graphics | | | Vendor Name: Intel(R) Corporation | | | SOC UUID: 00000000-0000-0019-0000-000856a08086 | | | PCI BDF Address: 0000:19:00.0 | | | DRM Device: /dev/dri/card1 | | | Function Type: physical | +-----------+--------------------------------------------------------------------------------------+ | 1 | Device Name: Intel(R) Arc(TM) A770 Graphics | | | Vendor Name: Intel(R) Corporation | | | SOC UUID: 00000000-0000-002c-0000-000856a08086 | | | PCI BDF Address: 0000:2c:00.0 | | | DRM Device: /dev/dri/card2 | | | Function Type: physical | +-----------+--------------------------------------------------------------------------------------+ | 2 | Device Name: Intel(R) Arc(TM) A770 Graphics | | | Vendor Name: Intel(R) Corporation | | | SOC UUID: 00000000-0000-0052-0000-000856a08086 | | | PCI BDF Address: 0000:52:00.0 | | | DRM Device: /dev/dri/card3 | | | Function Type: physical | +-----------+--------------------------------------------------------------------------------------+ | 3 | Device Name: Intel(R) Arc(TM) A770 Graphics | | | Vendor Name: Intel(R) Corporation | | | SOC UUID: 00000000-0000-0065-0000-000856a08086 | | | PCI BDF Address: 0000:65:00.0 | | | DRM Device: /dev/dri/card4 | | | Function Type: physical | +-----------+--------------------------------------------------------------------------------------+ | 4 | Device Name: Intel(R) Arc(TM) A770 Graphics | | | Vendor Name: Intel(R) Corporation | | | SOC UUID: 00000000-0000-009b-0000-000856a08086 | | | PCI BDF Address: 0000:9b:00.0 | | | DRM Device: /dev/dri/card5 | | | Function Type: physical | +-----------+--------------------------------------------------------------------------------------+ | 5 | Device Name: Intel(R) Arc(TM) A770 Graphics | | | Vendor Name: Intel(R) Corporation | | | SOC UUID: 00000000-0000-00ad-0000-000856a08086 | | | PCI BDF Address: 0000:ad:00.0 | | | DRM Device: /dev/dri/card6 | | | Function Type: physical | +-----------+--------------------------------------------------------------------------------------+ | 6 | Device Name: Intel(R) Arc(TM) A770 Graphics | | | Vendor Name: Intel(R) Corporation | | | SOC UUID: 00000000-0000-00d1-0000-000856a08086 | | | PCI BDF Address: 0000:d1:00.0 | | | DRM Device: /dev/dri/card7 | | | Function Type: physical | +-----------+--------------------------------------------------------------------------------------+ | 7 | Device Name: Intel(R) Arc(TM) A770 Graphics | | | Vendor Name: Intel(R) Corporation | | | SOC UUID: 00000000-0000-00e3-0000-000856a08086 | | | PCI BDF Address: 0000:e3:00.0 | | | DRM Device: /dev/dri/card8 | | | Function Type: physical | +-----------+--------------------------------------------------------------------------------------+ GPU0 Memory size=16M GPU1 Memory size=16G GPU2 Memory size=16G GPU3 Memory size=16G GPU4 Memory size=16G GPU5 Memory size=16G GPU6 Memory size=16G GPU7 Memory size=16G GPU8 Memory size=16G
03:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 52) (prog-if 00 [VGA controller]) DeviceName: Onboard VGA Subsystem: ASPEED Technology, Inc. ASPEED Graphics Family Flags: medium devsel, IRQ 16, NUMA node 0 Memory at 94000000 (32-bit, non-prefetchable) [size=16M] Memory at 95000000 (32-bit, non-prefetchable) [size=256K] I/O ports at 2000 [size=128] Capabilities:
Kernel driver in use: ast
Kernel modules: ast
19:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller]) Subsystem: Shenzhen Gunnir Technology Development Co., Ltd Device 1334 Flags: bus master, fast devsel, latency 0, IRQ 130, NUMA node 0 Memory at 9e000000 (64-bit, non-prefetchable) [size=16M] Memory at 5f800000000 (64-bit, prefetchable) [size=16G] Expansion ROM at 9f000000 [disabled] [size=2M] Capabilities:
Kernel driver in use: i915
Kernel modules: xe, i915
2c:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller]) Subsystem: Shenzhen Gunnir Technology Development Co., Ltd Device 1334 Flags: bus master, fast devsel, latency 0, IRQ 133, NUMA node 0 Memory at a8000000 (64-bit, non-prefetchable) [size=16M] Memory at 6f800000000 (64-bit, prefetchable) [size=16G] Expansion ROM at a9000000 [disabled] [size=2M] Capabilities:
Kernel driver in use: i915
Kernel modules: xe, i915
52:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller]) Subsystem: Shenzhen Gunnir Technology Development Co., Ltd Device 1334 Flags: bus master, fast devsel, latency 0, IRQ 136, NUMA node 0 Memory at bc000000 (64-bit, non-prefetchable) [size=16M] Memory at 8f800000000 (64-bit, prefetchable) [size=16G] Expansion ROM at bd000000 [disabled] [size=2M] Capabilities:
Kernel driver in use: i915
Kernel modules: xe, i915
65:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller]) Subsystem: Shenzhen Gunnir Technology Development Co., Ltd Device 1334 Flags: bus master, fast devsel, latency 0, IRQ 139, NUMA node 0 Memory at c6000000 (64-bit, non-prefetchable) [size=16M] Memory at 9f800000000 (64-bit, prefetchable) [size=16G] Expansion ROM at c7000000 [disabled] [size=2M] Capabilities:
Kernel driver in use: i915
Kernel modules: xe, i915
9b:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller]) Subsystem: Shenzhen Gunnir Technology Development Co., Ltd Device 1334 Flags: bus master, fast devsel, latency 0, IRQ 142, NUMA node 1 Memory at d8000000 (64-bit, non-prefetchable) [size=16M] Memory at cf800000000 (64-bit, prefetchable) [size=16G] Expansion ROM at d9000000 [disabled] [size=2M] Capabilities:
Kernel driver in use: i915
Kernel modules: xe, i915
ad:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller]) Subsystem: Shenzhen Gunnir Technology Development Co., Ltd Device 1334 Flags: bus master, fast devsel, latency 0, IRQ 145, NUMA node 1 Memory at e0000000 (64-bit, non-prefetchable) [size=16M] Memory at df800000000 (64-bit, prefetchable) [size=16G] Expansion ROM at e1000000 [disabled] [size=2M] Capabilities:
Kernel driver in use: i915
Kernel modules: xe, i915
d1:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller]) Subsystem: Shenzhen Gunnir Technology Development Co., Ltd Device 1334 Flags: bus master, fast devsel, latency 0, IRQ 148, NUMA node 1 Memory at f1000000 (64-bit, non-prefetchable) [size=16M] Memory at ff800000000 (64-bit, prefetchable) [size=16G] Expansion ROM at f2000000 [disabled] [size=2M] Capabilities:
Kernel driver in use: i915
Kernel modules: xe, i915
e3:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller]) Subsystem: Intel Corporation Device 1020 Flags: bus master, fast devsel, latency 0, IRQ 151, NUMA node 1 Memory at f9000000 (64-bit, non-prefetchable) [size=16M] Memory at 10f800000000 (64-bit, prefetchable) [size=16G] Expansion ROM at fa000000 [disabled] [size=2M] Capabilities:
Kernel driver in use: i915
Kernel modules: xe, i915
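As a rough sanity check on the PI_ERROR_OUT_OF_RESOURCES failures above, here is my own back-of-envelope estimate, assuming an approximate 72.7B parameter count for Qwen2-72B and an idealized 6 bits per weight that ignores quantization scales and any layers kept at higher precision:

```python
# Back-of-envelope VRAM estimate for Qwen2-72B at fp6 under 4-way tensor parallelism.
params = 72.7e9            # assumed approximate parameter count of Qwen2-72B
bits_per_weight = 6        # idealized fp6 storage, ignoring scale/zero-point overhead
gib = 2**30

weight_bytes = params * bits_per_weight / 8
per_card = weight_bytes / 4            # DeepSpeed AutoTP shards weights over 4 A770s

print(f"weights total : {weight_bytes / gib:.1f} GiB")   # ~50.8 GiB
print(f"per A770 (/4) : {per_card / gib:.1f} GiB")       # ~12.7 GiB of a 16 GiB card

# ~12.7 GiB of sharded weights per card leaves only a few GiB for whatever is not
# sharded (embedding / LM head), activations, the KV cache and runtime buffers, so
# model.to(f'xpu:{local_rank}') can exhaust the card and return -5
# (PI_ERROR_OUT_OF_RESOURCES). cpu_embedding=True and IPEX_LLM_LOW_MEM=1 trim the
# per-card footprint; vLLM TP or pipeline parallelism spreads the model differently.
```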