Open jianweimama opened 3 weeks ago
After rolling back BigDL and IPEX-LLM to version 0619, the problem disappeared.
Hi @jianweimama, we will let you know as soon as the bug is fixed.
Hi @jianweimama, this bug is fixed. You can try a new nightly version of ipex-llm (later than 2.1.0b20240625).
Thanks a lot, I will try it soon.
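To check whether an installed build is newer than the fixed nightly mentioned above, the date suffix of the version string can be compared. This is a minimal sketch, assuming nightly versions end in `bYYYYMMDD` as the ones in this thread do (for example 2.1.0b20240625). `nightly_date` and `is_later_than` are hypothetical helper names, not part of ipex-llm.

```python
import re
from datetime import date

# Assumption: nightly builds carry a trailing "bYYYYMMDD" tag, e.g. "2.1.0b20240625".
_NIGHTLY_SUFFIX = re.compile(r"b(\d{4})(\d{2})(\d{2})$")


def nightly_date(version: str) -> date:
    """Extract the build date from a nightly version string (hypothetical helper)."""
    match = _NIGHTLY_SUFFIX.search(version)
    if match is None:
        raise ValueError(f"no nightly date suffix in {version!r}")
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


def is_later_than(version: str, reference: str = "2.1.0b20240625") -> bool:
    """Return True if `version` was built strictly after `reference`."""
    return nightly_date(version) > nightly_date(reference)


# The version from the pip freeze output in this report predates the fix.
print(is_later_than("2.1.0b20240620"))  # False
print(is_later_than("2.1.0b20240626"))  # True
```

The installed version string can be read with `importlib.metadata.version("ipex-llm")` from the Python standard library and passed to `is_later_than`.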
Installation steps on the host:
conda create -n llm python=3.11
conda activate llm
# the command below installs intel_extension_for_pytorch==2.1.10+xpu by default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install transformers==4.37.0
pip install oneccl_bind_pt==2.1.100 --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
# configure oneAPI environment variables
source /opt/intel/oneapi/setvars.sh
pip install git+https://github.com/microsoft/DeepSpeed.git@ed8aed5
pip install git+https://github.com/intel/intel-extension-for-deepspeed.git@0eb734b
pip install mpi4py
conda install -c conda-forge -y gperftools=2.10 # to enable tcmalloc
Installed pip packages:
(llm-deepspeed) root@test-server:~/test/ipex-llm/python/llm/example/GPU/Deepspeed-AutoTP# pip3 freeze
accelerate==0.23.0
annotated-types==0.7.0
bigdl-core-xe-21==2.5.0b20240620
bigdl-core-xe-addons-21==2.5.0b20240620
bigdl-core-xe-batch-21==2.5.0b20240620
certifi==2024.6.2
charset-normalizer==3.3.2
deepspeed @ git+https://github.com/microsoft/DeepSpeed.git@ed8aed5703d97b6e52d0fca3e4be285e21c005f2
filelock==3.15.3
fsspec==2024.6.0
hjson==3.1.0
huggingface-hub==0.23.4
idna==3.7
intel-cmplr-lib-ur==2024.2.0
intel-extension-for-pytorch==2.1.10+xpu
intel-openmp==2024.2.0
intel_extension_for_deepspeed @ file:///root/intel-extension-for-deepspeed
ipex-llm==2.1.0b20240620
Jinja2==3.1.4
MarkupSafe==2.1.5
mpi4py==3.1.6
mpmath==1.3.0
networkx==3.3
ninja==1.11.1.1
numpy==1.26.4
oneccl-bind-pt==2.1.100+xpu
packaging==24.1
pillow==10.3.0
protobuf==5.27.1
psutil==6.0.0
py-cpuinfo==9.0.0
pydantic==2.7.4
pydantic_core==2.18.4
pynvml==11.5.0
PyYAML==6.0.2rc1
regex==2024.5.15
requests==2.32.3
safetensors==0.4.3
sentencepiece==0.2.0
sympy==1.13.0rc2
tabulate==0.9.0
tokenizers==0.15.2
torch==2.1.0a0+cxx11.abi
torchvision==0.16.0a0+cxx11.abi
tqdm==4.66.4
transformers==4.37.0
typing_extensions==4.12.2
urllib3==2.2.2
(llm-deepspeed) root@test-server:~/test/ipex-llm/python/llm/example/GPU/Deepspeed-AutoTP# bash run_qwen_14b_arc_2_card.sh
:: initializing oneAPI environment ...
run_qwen_14b_arc_2_card.sh: BASH_VERSION = 5.1.16(1)-release
args: Using "$@" for setvars.sh arguments: --force
:: ccl -- latest
:: compiler -- latest
:: dal -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: dnnl -- latest
:: dpcpp-ct -- latest
:: dpl -- latest
:: ipp -- latest
:: ippcp -- latest
:: mkl -- latest
:: mpi -- latest
:: tbb -- latest
:: oneAPI environment initialized ::
[0] /root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
[0] warn(
[1] /root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
[1] warn(
[0] [2024-06-21 23:13:11,872] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[1] [2024-06-21 23:13:11,951] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[0] [2024-06-21 23:13:12,241] [INFO] [real_accelerator.py:211:set_accelerator] Setting ds_accelerator to cpu (model specified)
[1] [2024-06-21 23:13:12,325] [INFO] [real_accelerator.py:211:set_accelerator] Setting ds_accelerator to cpu (model specified)
Loading checkpoint shards: 100%|██████████| 8/8 [00:16<00:00, 2.04s/it][1]
[1] [2024-06-21 23:13:29,421] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.14.1+ed8aed57, git-hash=ed8aed57, git-branch=HEAD
[1] [2024-06-21 23:13:29,422] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter replace_method is deprecated. This parameter is no longer needed, please remove from your call to DeepSpeed-inference
[1] [2024-06-21 23:13:29,422] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[1] [2024-06-21 23:13:29,422] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
Loading checkpoint shards: 100%|██████████| 8/8 [00:17<00:00, 2.21s/it][0]
[0] [2024-06-21 23:13:30,640] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.14.1+ed8aed57, git-hash=ed8aed57, git-branch=HEAD
[0] [2024-06-21 23:13:30,640] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter replace_method is deprecated. This parameter is no longer needed, please remove from your call to DeepSpeed-inference
[0] [2024-06-21 23:13:30,640] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[0] [2024-06-21 23:13:30,640] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[1] Using /root/.cache/torch_extensions/py311_cpu as PyTorch extensions root...
[1] Emitting ninja build file /root/.cache/torch_extensions/py311_cpu/deepspeed_ccl_comm/build.ninja...
[1] Building extension module deepspeed_ccl_comm...
[1] Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1] ninja: no work to do.
[1] Loading extension module deepspeed_ccl_comm...
[0] Using /root/.cache/torch_extensions/py311_cpu as PyTorch extensions root...
[0] Emitting ninja build file /root/.cache/torch_extensions/py311_cpu/deepspeed_ccl_comm/build.ninja...
[0] Building extension module deepspeed_ccl_comm...
[0] Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[0] ninja: no work to do.
[0] Loading extension module deepspeed_ccl_comm...
[0] My guessed rank = 0
[0] 2024:06:21-23:13:40:(1676750) |CCL_WARN| sockets exchange mode is set. It may cause potential problem of 'Too many open file descriptors'
[1] My guessed rank = 1
[1] 2024:06:21-23:13:40:(1676751) |CCL_WARN| sockets exchange mode is set. It may cause potential problem of 'Too many open file descriptors'
[0] Time to load deepspeed_ccl_comm op: 0.11093568801879883 seconds
[0] DeepSpeed deepspeed.ops.comm.deepspeed_ccl_comm_op built successfully
[0] [2024-06-21 23:13:41,150] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[1] Time to load deepspeed_ccl_comm op: 0.10797476768493652 seconds
[1] DeepSpeed deepspeed.ops.comm.deepspeed_ccl_comm_op built successfully
[1] [2024-06-21 23:13:41,150] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[1] [2024-06-21 23:13:41,150] [INFO] [comm.py:637:init_distributed] cdb=<deepspeed.comm.ccl.CCLBackend object at 0x7fa20a3d3d90>
[0] [2024-06-21 23:13:41,150] [INFO] [comm.py:637:init_distributed] cdb=<deepspeed.comm.ccl.CCLBackend object at 0x7b8ba0ce0510>
[1] [2024-06-21 23:13:41,150] [INFO] [comm.py:652:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[0] [2024-06-21 23:13:41,150] [INFO] [comm.py:652:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[1] [2024-06-21 23:13:41,485] [INFO] [comm.py:702:mpi_discovery] Discovered MPI settings of world_rank=1, local_rank=1, world_size=2, master_addr=172.16.182.230, master_port=29500
[0] [2024-06-21 23:13:41,485] [INFO] [comm.py:702:mpi_discovery] Discovered MPI settings of world_rank=0, local_rank=0, world_size=2, master_addr=172.16.182.230, master_port=29500
[0] [2024-06-21 23:13:41,485] [INFO] [comm.py:662:init_distributed] Distributed backend already initialized
[0] 2024-06-21 23:13:44,774 - ipex_llm.transformers.utils - INFO - Converting the current model to sym_int4 format......
[1] 2024-06-21 23:13:44,774 - ipex_llm.transformers.utils - INFO - Converting the current model to sym_int4 format......
[1] /root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/init.py:412: UserWarning: Initializing zero-element tensors is a no-op
[1] warnings.warn("Initializing zero-element tensors is a no-op")
[0] /root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/init.py:412: UserWarning: Initializing zero-element tensors is a no-op
[0] warnings.warn("Initializing zero-element tensors is a no-op")
[1] AutoTP: [(<class 'transformers.models.qwen2.modeling_qwen2.Qwen2DecoderLayer'>, ['self_attn.o_proj', 'mlp.down_proj'])]
[1] Traceback (most recent call last):
[1] File "/root/test/ipex-llm/python/llm/example/GPU/Deepspeed-AutoTP/deepspeed_autotp.py", line 85, in
[1] model = optimize_model(model.module.to(f'cpu'), low_bit=low_bit).to(torch.float16)
[1] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/optimize.py", line 253, in optimize_model
[1] model = ggml_convert_low_bit(model,
[1] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/transformers/convert.py", line 790, in ggml_convert_low_bit
[1] model = _optimize_pre(model)
[1] ^^^^^^^^^^^^^^^^^^^^
[1] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/transformers/convert.py", line 739, in _optimize_pre
[1] model.apply(padding_mlp)
[1] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 897, in apply
[1] module.apply(fn)
[1] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 897, in apply
[1] module.apply(fn)
[1] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 897, in apply
[1] module.apply(fn)
[1] [Previous line repeated 1 more time]
[1] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 898, in apply
[1] fn(self)
[1] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/transformers/models/qwen2.py", line 304, in padding_mlp
[1] new_gate_weight[:intermediate_size, :] = gate_weight
[1] ~~~^^^^^^^^^^^^^^^^^^^^^^^
[1] RuntimeError: The expanded size of the tensor (2560) must match the existing size (5120) at non-singleton dimension 1. Target sizes: [13696, 2560]. Tensor sizes: [6848, 5120]
[0] AutoTP: [(<class 'transformers.models.qwen2.modeling_qwen2.Qwen2DecoderLayer'>, ['self_attn.o_proj', 'mlp.down_proj'])]
[0] Traceback (most recent call last):
[0] File "/root/test/ipex-llm/python/llm/example/GPU/Deepspeed-AutoTP/deepspeed_autotp.py", line 85, in
[0] model = optimize_model(model.module.to(f'cpu'), low_bit=low_bit).to(torch.float16)
[0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/optimize.py", line 253, in optimize_model
[0] model = ggml_convert_low_bit(model,
[0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/transformers/convert.py", line 790, in ggml_convert_low_bit
[0] model = _optimize_pre(model)
[0] ^^^^^^^^^^^^^^^^^^^^
[0] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/transformers/convert.py", line 739, in _optimize_pre
[0] model.apply(padding_mlp)
[0] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 897, in apply
[0] module.apply(fn)
[0] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 897, in apply
[0] module.apply(fn)
[0] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 897, in apply
[0] module.apply(fn)
[0] [Previous line repeated 1 more time]
[0] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 898, in apply
[0] fn(self)
[0] File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/transformers/models/qwen2.py", line 304, in padding_mlp
[0] new_gate_weight[:intermediate_size, :] = gate_weight
[0] ~~~^^^^^^^^^^^^^^^^^^^^^^^
[0] RuntimeError: The expanded size of the tensor (2560) must match the existing size (5120) at non-singleton dimension 1. Target sizes: [13696, 2560]. Tensor sizes: [6848, 5120]
[0] free(): invalid pointer
[0]
[0] LIBXSMM_VERSION: main_stable-1.17-3651 (25693763)
[0] LIBXSMM_TARGET: spr [Intel(R) Xeon(R) Gold 6438N]
[0] Registry and code: 13 MB
[0] Command: python deepspeed_autotp.py --repo-id-or-model-path /root/ipex-llm/Qwen1.5-14B-Chat --low-bit sym_int4
[0] Uptime: 35.240733 s
[1] free(): invalid size
[1]
[1] LIBXSMM_VERSION: main_stable-1.17-3651 (25693763)
[1] LIBXSMM_TARGET: spr [Intel(R) Xeon(R) Gold 6438N]
[1] Registry and code: 13 MB
[1] Command: python deepspeed_autotp.py --repo-id-or-model-path /root/ipex-llm/Qwen1.5-14B-Chat --low-bit sym_int4
[1] Uptime: 35.150173 s
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 1676750 RUNNING AT test-server
= KILLED BY SIGNAL: 6 (Aborted)
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 1 PID 1676751 RUNNING AT test-server
= KILLED BY SIGNAL: 6 (Aborted)
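One way to read the shapes in the RuntimeError: the source tensor [6848, 5120] is exactly half of [13696, 5120] along dimension 0. That would be consistent with the gate projection weight already being split across the 2 ranks (world_size=2 in the log) before padding_mlp runs, while the target slice is still sized from the unsplit value 13696. This is an interpretation of the log, not something confirmed in this thread. The sketch below is plain Python shape arithmetic, not ipex-llm or DeepSpeed code. `shard_rows` and `mismatched_dims` are hypothetical helpers, and the full shape (13696, 5120) is an assumption inferred from the error message.

```python
# Values taken from the log above; FULL_GATE_SHAPE is an inferred assumption.
WORLD_SIZE = 2                  # world_size=2 from the MPI discovery lines
FULL_GATE_SHAPE = (13696, 5120) # assumed unsharded gate weight shape
TARGET_SLICE = (13696, 2560)    # "Target sizes" in the RuntimeError
REPORTED_SOURCE = (6848, 5120)  # "Tensor sizes" in the RuntimeError


def shard_rows(shape, world_size):
    """Split a 2-D weight shape evenly along dimension 0 (hypothetical helper)."""
    rows, cols = shape
    if rows % world_size != 0:
        raise ValueError(f"{rows} rows cannot be split evenly across {world_size} ranks")
    return (rows // world_size, cols)


def mismatched_dims(target, source):
    """List the dimensions where `source` cannot be copied into `target`.

    A source dimension of size 1 is treated as broadcastable, as in slice assignment.
    """
    return [
        dim
        for dim, (t, s) in enumerate(zip(target, source))
        if s != 1 and s != t
    ]


per_rank = shard_rows(FULL_GATE_SHAPE, WORLD_SIZE)
print(per_rank == REPORTED_SOURCE)              # True
print(mismatched_dims(TARGET_SLICE, per_rank))  # [0, 1]
```

The error message names only dimension 1, but under this reading dimension 0 disagrees as well, so the padding step and the tensor-parallel split appear to disagree about both the row and the column size of the weight.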