intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

ubuntu 22.04 MTL 165h benchmark Aborted (core dumped) #11256

Open taotao1-1 opened 1 month ago

taotao1-1 commented 1 month ago

```
(llm) peiyuan@peiyuan:~/ipex-llm/python/llm/dev/benchmark/all-in-one$ python run.py
/home/peiyuan/miniconda3/envs/llm/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
/home/peiyuan/miniconda3/envs/llm/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''
If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
  warn(
ZE_LOADER_DEBUG_TRACE:Loading Driver libze_intel_gpu.so.1
ZE_LOADER_DEBUG_TRACE:Loading Driver libze_intel_vpu.so.1
ZE_LOADER_DEBUG_TRACE:Load Library of libze_intel_vpu.so.1 failed with libze_intel_vpu.so.1: cannot open shared object file: No such file or directory
ZE_LOADER_DEBUG_TRACE:check_drivers(flags=0(ZE_INIT_ALL_DRIVER_TYPES_ENABLED))
ZE_LOADER_DEBUG_TRACE:init driver libze_intel_gpu.so.1 zeInit(0(ZE_INIT_ALL_DRIVER_TYPES_ENABLED)) returning ZE_RESULT_SUCCESS
[the ZE_LOADER_DEBUG_TRACE block above repeats]
2024-06-07 11:29:58,545 - INFO - intel_extension_for_pytorch auto imported
Loading checkpoint shards: 100%|██████████| 8/8 [00:00<00:00, 10.09it/s]
2024-06-07 11:29:59,727 - INFO - Converting the current model to sym_int4 format......
[the ZE_LOADER_DEBUG_TRACE block above repeats]
```
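The `ZE_LOADER_DEBUG_TRACE` lines show that the VPU driver (`libze_intel_vpu.so.1`) cannot be loaded while the GPU driver loads successfully; the VPU failure is harmless for this benchmark. A quick diagnostic sketch (not part of ipex-llm) to check which Level Zero driver libraries are actually resolvable on a machine:

```python
import ctypes

def can_load(libname: str) -> bool:
    """Return True if the shared library can be dlopen'ed on this system."""
    try:
        ctypes.CDLL(libname)
        return True
    except OSError:
        return False

# The GPU driver is the one this benchmark needs; the VPU driver is optional here.
for lib in ("libze_intel_gpu.so.1", "libze_intel_vpu.so.1"):
    print(lib, "->", "found" if can_load(lib) else "missing")
```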

```
loading of model costs 7.7113322770019295s and 6.005859375GB
<class 'transformers_modules.Qwen-7B-Chat.modeling_qwen.QWenLMHeadModel'>
/home/peiyuan/miniconda3/envs/llm/lib/python3.11/site-packages/transformers/generation/configuration_utils.py:394: UserWarning: do_sample is set to False. However, top_p is set to 0.8 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset top_p.
  warnings.warn(
/home/peiyuan/miniconda3/envs/llm/lib/python3.11/site-packages/transformers/generation/configuration_utils.py:404: UserWarning: do_sample is set to False. However, top_k is set to 0 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset top_k.
  warnings.warn(
LLVM ERROR: Diag: aborted

LIBXSMM_VERSION: main_stable-1.17-3651 (25693763)
LIBXSMM_TARGET: adl [Intel(R) Core(TM) Ultra 7 165H]
Registry and code: 13 MB
Command: python run.py
Uptime: 17.947757 s
Aborted (core dumped)
```
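As an aside, the two `UserWarning`s about `top_p`/`top_k` are unrelated to the crash: they only flag that sampling knobs are set while `do_sample=False` (greedy decoding). A minimal, hypothetical helper for dropping sampling-only parameters before calling `generate` (this is a sketch, not an ipex-llm or transformers API) could look like:

```python
# Hypothetical helper: drop sampling-only knobs when greedy decoding is requested,
# which avoids the transformers "do_sample is set to False" warnings.
def clean_generation_kwargs(kwargs: dict) -> dict:
    cleaned = dict(kwargs)
    if not cleaned.get("do_sample", False):
        # These parameters only matter in sample-based generation modes.
        for key in ("top_p", "top_k", "temperature"):
            cleaned.pop(key, None)
    return cleaned

print(clean_generation_kwargs({"do_sample": False, "top_p": 0.8, "top_k": 0}))
# prints {'do_sample': False}
```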

qiuxin2012 commented 1 month ago

Found a lot of build-error files in the all-in-one folder:

```
_ZTSZZN3gpu5xetla4fmha32fmha_forward_causal_strided_implINS0_22fmha_policy_64x256x256EN4sycl3_V16detail9half_impl4halfELb1ELb0ELb1ELb1ELb0EEEvRNS5_5queueEPT0_SC_SC_SC_SC_PhSC_fffjjjjjjjENKUlRNS5_7handlerEE_clESF_EUlNS5_7nd_itemILi3EEEE_.errors.txt
_ZTSZZN3gpu5xetla4fmha32fmha_forward_causal_strided_implINS0_22fmha_policy_64x256x256EN4sycl3_V16detail9half_impl4halfELb1ELb1ELb0ELb0ELb0EEEvRNS5_5queueEPT0_SC_SC_SC_SC_PhSC_fffjjjjjjjENKUlRNS5_7handlerEE_clESF_EUlNS5_7nd_itemILi3EEEE_.errors.txt
_ZTSZZN3gpu5xetla4fmha32fmha_forward_causal_strided_implINS0_22fmha_policy_64x256x256EN4sycl3_V16detail9half_impl4halfELb1ELb1ELb0ELb1ELb0EEEvRNS5_5queueEPT0_SC_SC_SC_SC_PhSC_fffjjjjjjjENKUlRNS5_7handlerEE_clESF_EUlNS5_7nd_itemILi3EEEE_.errors.txt
_ZTSZZN3gpu5xetla4fmha32fmha_forward_causal_strided_implINS0_22fmha_policy_64x256x256EN4sycl3_V16detail9half_impl4halfELb1ELb1ELb1ELb1ELb0EEEvRNS5_5queueEPT0_SC_SC_SC_SC_PhSC_fffjjjjjjjENKUlRNS5_7handlerEE_clESF_EUlNS5_7nd_itemILi3EEEE_.errors.txt
```

The content is:

```
EEvRNS5_5queueEPT0_SC_SC_SC_SC_PhSC_fffjjjjjjjENKUlRNS5_7handlerEE_clESF_EUlNS5_7nd_itemILi3EEEE_.errors.txt
Instruction / Operand / Region Errors:

/--------------------------------------------!!!INSTRUCTION ERROR FOUND!!!---------------------------------------------\
Error in CISA routine with name: _ZTSZZN3gpu5xetla4fmha32fmha_forward_causal_strided_implINS0_22fmha_policy_64x256x256EN4sycl3_V16detail9half_impl4halfELb1ELb1ELb1ELb1ELb0EEEvRNS5_5queueEPT0_SC_SC_SC_SC_PhSC_fffjjjjjjjENKUlRNS5_7handlerEE_clESF_EUlNS5_7nd_itemILi3EEEE_
                  Error Message: vISA instruction not supported on this platform
                    Diagnostics:
   Instruction variables' decls:
                                 .decl V93 v_type=G type=b num_elts=4 align=dword
                                 .decl V93 v_type=G type=b num_elts=4 align=dword

          Violating Instruction:     nbarrier.wait V93(0,0)<0;1,0>                                                /// $83
\----------------------------------------------------------------------------------------------------------------------/
```

This is caused by a wrong result from `has_xetla`: the code takes the xetla branch at https://github.com/intel-analytics/ipex-llm/blob/6f2684e5c900eedfab7a7a3fcb0b1c705b9050cb/python/llm/src/ipex_llm/transformers/models/qwen.py#L252
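The vISA error ("nbarrier.wait ... not supported on this platform") means the xetla fused-attention kernels use a named-barrier instruction that the MTL (Meteor Lake) iGPU does not implement. The kind of capability gate that avoids this can be sketched as follows; the device families and the helper name here are illustrative assumptions, not ipex-llm's real `has_xetla` logic:

```python
# Hypothetical sketch: only take the xetla fused-attention path on GPU families
# whose ISA supports named barriers (nbarrier); fall back elsewhere (e.g. MTL iGPU).
NBARRIER_CAPABLE_FAMILIES = ("arc", "flex", "max", "pvc")  # assumed list

def supports_xetla(device_name: str) -> bool:
    """Crude substring check against the SYCL device name string."""
    name = device_name.lower()
    return any(family in name for family in NBARRIER_CAPABLE_FAMILIES)

print(supports_xetla("Intel(R) Arc(TM) A770 Graphics"))       # True: discrete Arc
print(supports_xetla("Intel(R) Core(TM) Ultra 7 165H iGPU"))  # False: MTL iGPU
```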

qiuxin2012 commented 1 month ago

Fixed in https://github.com/intel-analytics/ipex-llm/pull/11263