intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

Non-singleton dimension error when running Deepspeed-AutoTP #11392

Status: Open · jianweimama opened this issue 3 weeks ago

jianweimama commented 3 weeks ago

Installation steps on the host:
conda create -n llm python=3.11
conda activate llm

The command below will install intel_extension_for_pytorch==2.1.10+xpu by default:

pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install transformers==4.37.0
pip install oneccl_bind_pt==2.1.100 --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/

Configure oneAPI environment variables and install DeepSpeed-related packages:

source /opt/intel/oneapi/setvars.sh
pip install git+https://github.com/microsoft/DeepSpeed.git@ed8aed5
pip install git+https://github.com/intel/intel-extension-for-deepspeed.git@0eb734b
pip install mpi4py
conda install -c conda-forge -y gperftools=2.10 # to enable tcmalloc

Installed pip packages:
(llm-deepspeed) root@test-server:~/test/ipex-llm/python/llm/example/GPU/Deepspeed-AutoTP# pip3 freeze
accelerate==0.23.0
annotated-types==0.7.0
bigdl-core-xe-21==2.5.0b20240620
bigdl-core-xe-addons-21==2.5.0b20240620
bigdl-core-xe-batch-21==2.5.0b20240620
certifi==2024.6.2
charset-normalizer==3.3.2
deepspeed @ git+https://github.com/microsoft/DeepSpeed.git@ed8aed5703d97b6e52d0fca3e4be285e21c005f2
filelock==3.15.3
fsspec==2024.6.0
hjson==3.1.0
huggingface-hub==0.23.4
idna==3.7
intel-cmplr-lib-ur==2024.2.0
intel-extension-for-pytorch==2.1.10+xpu
intel-openmp==2024.2.0
intel_extension_for_deepspeed @ file:///root/intel-extension-for-deepspeed
ipex-llm==2.1.0b20240620
Jinja2==3.1.4
MarkupSafe==2.1.5
mpi4py==3.1.6
mpmath==1.3.0
networkx==3.3
ninja==1.11.1.1
numpy==1.26.4
oneccl-bind-pt==2.1.100+xpu
packaging==24.1
pillow==10.3.0
protobuf==5.27.1
psutil==6.0.0
py-cpuinfo==9.0.0
pydantic==2.7.4
pydantic_core==2.18.4
pynvml==11.5.0
PyYAML==6.0.2rc1
regex==2024.5.15
requests==2.32.3
safetensors==0.4.3
sentencepiece==0.2.0
sympy==1.13.0rc2
tabulate==0.9.0
tokenizers==0.15.2
torch==2.1.0a0+cxx11.abi
torchvision==0.16.0a0+cxx11.abi
tqdm==4.66.4
transformers==4.37.0
typing_extensions==4.12.2
urllib3==2.2.2

(llm-deepspeed) root@test-server:~/test/ipex-llm/python/llm/example/GPU/Deepspeed-AutoTP# bash run_qwen_14b_arc_2_card.sh

:: initializing oneAPI environment ...
   run_qwen_14b_arc_2_card.sh: BASH_VERSION = 5.1.16(1)-release
   args: Using "$@" for setvars.sh arguments: --force
:: ccl -- latest
:: compiler -- latest
:: dal -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: dnnl -- latest
:: dpcpp-ct -- latest
:: dpl -- latest
:: ipp -- latest
:: ippcp -- latest
:: mkl -- latest
:: mpi -- latest
:: tbb -- latest
:: oneAPI environment initialized ::

[0] /root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '' If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
[0]   warn(
[1] /root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '' If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
[1]   warn(
[0] [2024-06-21 23:13:11,872] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[1] [2024-06-21 23:13:11,951] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[0] [2024-06-21 23:13:12,241] [INFO] [real_accelerator.py:211:set_accelerator] Setting ds_accelerator to cpu (model specified)
[1] [2024-06-21 23:13:12,325] [INFO] [real_accelerator.py:211:set_accelerator] Setting ds_accelerator to cpu (model specified)
[1] Loading checkpoint shards: 100%|██████████| 8/8 [00:16<00:00, 2.04s/it]
[1] [2024-06-21 23:13:29,421] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.14.1+ed8aed57, git-hash=ed8aed57, git-branch=HEAD
[1] [2024-06-21 23:13:29,422] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter replace_method is deprecated. This parameter is no longer needed, please remove from your call to DeepSpeed-inference
[1] [2024-06-21 23:13:29,422] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[1] [2024-06-21 23:13:29,422] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[0] Loading checkpoint shards: 100%|██████████| 8/8 [00:17<00:00, 2.21s/it]
[0] [2024-06-21 23:13:30,640] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.14.1+ed8aed57, git-hash=ed8aed57, git-branch=HEAD
[0] [2024-06-21 23:13:30,640] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter replace_method is deprecated. This parameter is no longer needed, please remove from your call to DeepSpeed-inference
[0] [2024-06-21 23:13:30,640] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[0] [2024-06-21 23:13:30,640] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[1] Using /root/.cache/torch_extensions/py311_cpu as PyTorch extensions root...
[1] Emitting ninja build file /root/.cache/torch_extensions/py311_cpu/deepspeed_ccl_comm/build.ninja...
[1] Building extension module deepspeed_ccl_comm...
[1] Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1] ninja: no work to do.
[1] Loading extension module deepspeed_ccl_comm...
[0] Using /root/.cache/torch_extensions/py311_cpu as PyTorch extensions root...
[0] Emitting ninja build file /root/.cache/torch_extensions/py311_cpu/deepspeed_ccl_comm/build.ninja...
[0] Building extension module deepspeed_ccl_comm...
[0] Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[0] ninja: no work to do.
[0] Loading extension module deepspeed_ccl_comm...
[0] My guessed rank = 0
[0] 2024:06:21-23:13:40:(1676750) |CCL_WARN| sockets exchange mode is set. It may cause potential problem of 'Too many open file descriptors'
[1] My guessed rank = 1
[1] 2024:06:21-23:13:40:(1676751) |CCL_WARN| sockets exchange mode is set. It may cause potential problem of 'Too many open file descriptors'
[0] Time to load deepspeed_ccl_comm op: 0.11093568801879883 seconds
[0] DeepSpeed deepspeed.ops.comm.deepspeed_ccl_comm_op built successfully
[0] [2024-06-21 23:13:41,150] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[1] Time to load deepspeed_ccl_comm op: 0.10797476768493652 seconds
[1] DeepSpeed deepspeed.ops.comm.deepspeed_ccl_comm_op built successfully
[1] [2024-06-21 23:13:41,150] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[1] [2024-06-21 23:13:41,150] [INFO] [comm.py:637:init_distributed] cdb=<deepspeed.comm.ccl.CCLBackend object at 0x7fa20a3d3d90>
[0] [2024-06-21 23:13:41,150] [INFO] [comm.py:637:init_distributed] cdb=<deepspeed.comm.ccl.CCLBackend object at 0x7b8ba0ce0510>
[1] [2024-06-21 23:13:41,150] [INFO] [comm.py:652:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[0] [2024-06-21 23:13:41,150] [INFO] [comm.py:652:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[1] [2024-06-21 23:13:41,485] [INFO] [comm.py:702:mpi_discovery] Discovered MPI settings of world_rank=1, local_rank=1, world_size=2, master_addr=172.16.182.230, master_port=29500
[0] [2024-06-21 23:13:41,485] [INFO] [comm.py:702:mpi_discovery] Discovered MPI settings of world_rank=0, local_rank=0, world_size=2, master_addr=172.16.182.230, master_port=29500
[0] [2024-06-21 23:13:41,485] [INFO] [comm.py:662:init_distributed] Distributed backend already initialized
[0] 2024-06-21 23:13:44,774 - ipex_llm.transformers.utils - INFO - Converting the current model to sym_int4 format......
[1] 2024-06-21 23:13:44,774 - ipex_llm.transformers.utils - INFO - Converting the current model to sym_int4 format......
[1] /root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/init.py:412: UserWarning: Initializing zero-element tensors is a no-op
[1]   warnings.warn("Initializing zero-element tensors is a no-op")
[0] /root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/init.py:412: UserWarning: Initializing zero-element tensors is a no-op
[0]   warnings.warn("Initializing zero-element tensors is a no-op")
[1] AutoTP: [(<class 'transformers.models.qwen2.modeling_qwen2.Qwen2DecoderLayer'>, ['self_attn.o_proj', 'mlp.down_proj'])]
[1] Traceback (most recent call last):
[1]   File "/root/test/ipex-llm/python/llm/example/GPU/Deepspeed-AutoTP/deepspeed_autotp.py", line 85, in <module>
[1]     model = optimize_model(model.module.to(f'cpu'), low_bit=low_bit).to(torch.float16)
[1]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1]   File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/optimize.py", line 253, in optimize_model
[1]     model = ggml_convert_low_bit(model,
[1]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1]   File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/transformers/convert.py", line 790, in ggml_convert_low_bit
[1]     model = _optimize_pre(model)
[1]             ^^^^^^^^^^^^^^^^^^^^
[1]   File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/transformers/convert.py", line 739, in _optimize_pre
[1]     model.apply(padding_mlp)
[1]   File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 897, in apply
[1]     module.apply(fn)
[1]   File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 897, in apply
[1]     module.apply(fn)
[1]   File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 897, in apply
[1]     module.apply(fn)
[1]   [Previous line repeated 1 more time]
[1]   File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 898, in apply
[1]     fn(self)
[1]   File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/transformers/models/qwen2.py", line 304, in padding_mlp
[1]     new_gate_weight[:intermediate_size, :] = gate_weight
[1]     ~~~^^^^^^^^^^^^^^^^^^^^^^^
[1] RuntimeError: The expanded size of the tensor (2560) must match the existing size (5120) at non-singleton dimension 1. Target sizes: [13696, 2560]. Tensor sizes: [6848, 5120]
[0] AutoTP: [(<class 'transformers.models.qwen2.modeling_qwen2.Qwen2DecoderLayer'>, ['self_attn.o_proj', 'mlp.down_proj'])]
[0] Traceback (most recent call last):
[0]   File "/root/test/ipex-llm/python/llm/example/GPU/Deepspeed-AutoTP/deepspeed_autotp.py", line 85, in <module>
[0]     model = optimize_model(model.module.to(f'cpu'), low_bit=low_bit).to(torch.float16)
[0]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0]   File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/optimize.py", line 253, in optimize_model
[0]     model = ggml_convert_low_bit(model,
[0]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0]   File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/transformers/convert.py", line 790, in ggml_convert_low_bit
[0]     model = _optimize_pre(model)
[0]             ^^^^^^^^^^^^^^^^^^^^
[0]   File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/transformers/convert.py", line 739, in _optimize_pre
[0]     model.apply(padding_mlp)
[0]   File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 897, in apply
[0]     module.apply(fn)
[0]   File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 897, in apply
[0]     module.apply(fn)
[0]   File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 897, in apply
[0]     module.apply(fn)
[0]   [Previous line repeated 1 more time]
[0]   File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/torch/nn/modules/module.py", line 898, in apply
[0]     fn(self)
[0]   File "/root/miniforge3/envs/llm-deepspeed/lib/python3.11/site-packages/ipex_llm/transformers/models/qwen2.py", line 304, in padding_mlp
[0]     new_gate_weight[:intermediate_size, :] = gate_weight
[0]     ~~~^^^^^^^^^^^^^^^^^^^^^^^
[0] RuntimeError: The expanded size of the tensor (2560) must match the existing size (5120) at non-singleton dimension 1. Target sizes: [13696, 2560]. Tensor sizes: [6848, 5120]
[0] free(): invalid pointer
[0]
[0] LIBXSMM_VERSION: main_stable-1.17-3651 (25693763)
[0] LIBXSMM_TARGET: spr [Intel(R) Xeon(R) Gold 6438N]
[0] Registry and code: 13 MB
[0] Command: python deepspeed_autotp.py --repo-id-or-model-path /root/ipex-llm/Qwen1.5-14B-Chat --low-bit sym_int4
[0] Uptime: 35.240733 s
[1] free(): invalid size
[1]
[1] LIBXSMM_VERSION: main_stable-1.17-3651 (25693763)
[1] LIBXSMM_TARGET: spr [Intel(R) Xeon(R) Gold 6438N]
[1] Registry and code: 13 MB
[1] Command: python deepspeed_autotp.py --repo-id-or-model-path /root/ipex-llm/Qwen1.5-14B-Chat --low-bit sym_int4
[1] Uptime: 35.150173 s

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 1676750 RUNNING AT test-server
=   KILLED BY SIGNAL: 6 (Aborted)

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 1 PID 1676751 RUNNING AT test-server
=   KILLED BY SIGNAL: 6 (Aborted)
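
For reference, a minimal standalone sketch (plain PyTorch, using the tensor shapes copied from the traceback above; not the actual ipex-llm code) reproduces the same RuntimeError. The shapes suggest that, with 2-way AutoTP, the padding helper and the already-sharded gate_proj weight disagree on both dimensions (13696 vs 6848 rows, 2560 vs 5120 columns):

# Hypothetical sketch: assigning a sharded weight into a padding buffer whose
# shape does not match raises the same error reported in the traceback.
import torch

intermediate_size = 13696                   # full intermediate_size of Qwen1.5-14B-Chat
new_gate_weight = torch.zeros(13696, 2560)  # buffer shape from "Target sizes: [13696, 2560]"
gate_weight = torch.randn(6848, 5120)       # sharded weight shape from "Tensor sizes: [6848, 5120]"

try:
    new_gate_weight[:intermediate_size, :] = gate_weight
except RuntimeError as e:
    # Prints: "The expanded size of the tensor (2560) must match the existing
    # size (5120) at non-singleton dimension 1. ..."
    print(e)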

jianweimama commented 3 weeks ago

After rolling back BigDL and IPEX-LLM to the 0619 nightly builds, this problem disappeared.
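
For reference, the rollback could look like the following (purely illustrative; the exact 0619 build tags are assumptions inferred from the pip freeze above and may need adjusting to the nightly builds actually published on the index):

pip install --pre ipex-llm[xpu]==2.1.0b20240619 --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install bigdl-core-xe-21==2.5.0b20240619 bigdl-core-xe-addons-21==2.5.0b20240619 bigdl-core-xe-batch-21==2.5.0b20240619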

plusbang commented 3 weeks ago

Hi @jianweimama, we will inform you as soon as the bug is fixed.

plusbang commented 3 weeks ago

Hi @jianweimama, this bug has been fixed; you could try a new nightly version of ipex-llm (later than 2.1.0b20240625).
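
For example, re-running the install command from the setup above should pull the latest nightly build:

pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/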

jianweimama commented 2 weeks ago

thanks a lot, will try it soon.