intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0
6.44k stars 1.24k forks

Failed to enable AMX #9978

Closed sriraman2020 closed 7 months ago

sriraman2020 commented 7 months ago

Trying out Mixtral on PVC (1 card, 1 tile) with the recipe from https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mixtral

command line: python ./generate.py --prompt 'what is AI?' --repo-id-or-model-path mistralai/Mixtral-8x7B-v0.1

I get the error "Failed to enable AMX". Any suggestions, please?

jason-dai commented 7 months ago

@jenniew please take a look

jenniew commented 7 months ago

@sriraman2020 I tried this on my PVC and couldn't reproduce the issue. I think this may be an environment issue on your side. Can you run source /opt/intel/oneapi/setvars.sh before running this example?
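
Before re-running, it can help to sanity-check whether the oneAPI environment is actually active in the current shell. A minimal illustrative check (the helper name is made up; ONEAPI_ROOT is one of the variables setvars.sh exports):

```python
import os

def oneapi_env_sourced(env=os.environ) -> bool:
    # setvars.sh exports ONEAPI_ROOT (among other variables); its absence
    # is a strong hint the environment was not sourced in this shell.
    return "ONEAPI_ROOT" in env
```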

sriraman2020 commented 7 months ago
(screenshot attached)

@jenniew

hkvision commented 7 months ago

Hi @sriraman2020

We confirmed again in our environment that we can't reproduce this issue. Somehow AMX fails to be enabled on your SPR; is there anything special about your SPR setup?

As a workaround, you may use export BIGDL_LLM_AMX_DISABLED=1 to disable AMX to see if you can proceed.
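
For clarity, the workaround amounts to setting the variable before bigdl-llm is imported, since the flag is read when the library initializes. A minimal Python sketch (the amx_disabled helper is made up for illustration and is not part of bigdl-llm):

```python
import os

# Set the documented kill-switch before importing bigdl-llm.
os.environ["BIGDL_LLM_AMX_DISABLED"] = "1"

def amx_disabled() -> bool:
    # Illustrative helper: report whether the kill-switch is set
    # in the current process environment.
    return os.environ.get("BIGDL_LLM_AMX_DISABLED") == "1"
```

Equivalently, export BIGDL_LLM_AMX_DISABLED=1 in the shell before launching the script.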

sriraman2020 commented 7 months ago

@hkvision I tried with the above option and it moves ahead, but now I'm stuck on the next error: ImportError: libsycl.so.6: cannot open shared object file: No such file or directory

hkvision commented 7 months ago

Check this page for your error: https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#cannot-open-shared-object-file-no-such-file-or-directory This usually happens when the bigdl/ipex and oneAPI versions mismatch; can you confirm your versions?

sriraman2020 commented 7 months ago

@hkvision

Actually, the Torch version is 2.1.0a0 and the oneAPI version is 2024.0.

So it seems to tally.

MeouSker77 commented 7 months ago

ImportError: libsycl.so.6: cannot open shared object file: No such file or directory

Please check the intel_extension_for_pytorch version: pip list | grep intel-extension-for-pytorch

libsycl.so.6 is included in oneapi 2023.2, and it is required by intel_extension_for_pytorch 2.0

intel_extension_for_pytorch 2.1 requires libsycl.so.7, which is included in oneapi 2024.0

If you are using intel_extension_for_pytorch 2.0, please update it to 2.1: pip install --pre --upgrade --force-reinstall bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
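
The compatibility rule above can be summarized in a small illustrative lookup (version pairs taken from this comment; the helper name is made up):

```python
# Which SYCL runtime each intel_extension_for_pytorch major.minor needs,
# and the oneAPI release that ships it (per the comment above).
REQUIRED_RUNTIME = {
    "2.0": ("libsycl.so.6", "oneAPI 2023.2"),
    "2.1": ("libsycl.so.7", "oneAPI 2024.0"),
}

def required_runtime(ipex_version: str):
    # Reduce a full version string like "2.1.10+xpu" to "2.1", then look
    # up which SYCL runtime / oneAPI release it expects.
    major_minor = ".".join(ipex_version.split(".")[:2])
    return REQUIRED_RUNTIME.get(major_minor)
```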

sriraman2020 commented 7 months ago

@MeouSker77 intel_extension_for_pytorch version is 2.1.10+xpu

MeouSker77 commented 7 months ago

Does that error happen when importing linear_q4_0? If so, try:

pip uninstall bigdl-core-xe
pip uninstall bigdl-core-xe-21
pip install --pre --upgrade bigdl-core-xe-21

sriraman2020 commented 7 months ago

@MeouSker77 Cool, That Worked!! Thanks!

sriraman2020 commented 7 months ago

@MeouSker77 does it work for multi-GPU? We have currently only tried it on a single PVC 1T system.

jason-dai commented 7 months ago

@MeouSker77 does it work for multi-GPU? We have currently only tried it on a single PVC 1T system.

Yes - see distributed inference and finetuning examples.

sriraman2020 commented 7 months ago

@MeouSker77 I tried inference on a multi-PVC system with bash run_llama2_70b_pvc_1550_1_card.sh and get this error:

[1] Uptime: 48.521245 s

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 25597 RUNNING AT aia-sdp-pvc-135536
= KILLED BY SIGNAL: 11 (Segmentation fault)

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 1 PID 25598 RUNNING AT aia-sdp-pvc-135536
= KILLED BY SIGNAL: 11 (Segmentation fault)

Any idea what could be going wrong?

sriraman2020 commented 7 months ago

@MeouSker77 Full log:

[0] DeepSpeed deepspeed.ops.comm.deepspeed_ccl_comm_op built successfully
[0] [2024-01-30 08:00:50,077] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[1] Time to load deepspeed_ccl_comm op: 0.7787277698516846 seconds
[1] DeepSpeed deepspeed.ops.comm.deepspeed_ccl_comm_op built successfully
[1] [2024-01-30 08:00:50,077] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[0] [2024-01-30 08:00:50,078] [INFO] [comm.py:637:init_distributed] cdb=<deepspeed.comm.ccl.CCLBackend object at 0x7fc76febd1f0>
[1] [2024-01-30 08:00:50,078] [INFO] [comm.py:637:init_distributed] cdb=<deepspeed.comm.ccl.CCLBackend object at 0x7f98be4dfd30>
[0] [2024-01-30 08:00:50,078] [INFO] [comm.py:652:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[1] [2024-01-30 08:00:50,078] [INFO] [comm.py:652:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[1] [2024-01-30 08:00:50,706] [INFO] [comm.py:702:mpi_discovery] Discovered MPI settings of world_rank=1, local_rank=1, world_size=2, master_addr=10.45.77.106, master_port=29500
[0] [2024-01-30 08:00:50,706] [INFO] [comm.py:702:mpi_discovery] Discovered MPI settings of world_rank=0, local_rank=0, world_size=2, master_addr=10.45.77.106, master_port=29500
[0] [2024-01-30 08:00:50,707] [INFO] [comm.py:662:init_distributed] Distributed backend already initialized
[0] 2024-01-30 08:00:51,486 - bigdl.llm.transformers.utils - INFO - Converting the current model to sym_int4 format......
[1] 2024-01-30 08:00:51,659 - bigdl.llm.transformers.utils - INFO - Converting the current model to sym_int4 format......
[0] 2024-01-30 08:00:52,216 - bigdl.llm.transformers.utils - INFO - BIGDL_OPT_IPEX: False
[0] AutoTP: [(<class 'transformers.models.llama.modeling_llama.LlamaDecoderLayer'>, ['self_attn.o_proj', 'mlp.down_proj'])]
[0] [2024-01-30 08:00:52,242] [INFO] [real_accelerator.py:166:set_accelerator] Setting ds_accelerator to xpu (model specified)
[1] 2024-01-30 08:00:52,268 - bigdl.llm.transformers.utils - INFO - BIGDL_OPT_IPEX: False
[1] AutoTP: [(<class 'transformers.models.llama.modeling_llama.LlamaDecoderLayer'>, ['self_attn.o_proj', 'mlp.down_proj'])]
[1] [2024-01-30 08:00:52,298] [INFO] [real_accelerator.py:166:set_accelerator] Setting ds_accelerator to xpu (model specified)
[0] [2024-01-30 08:00:52,504] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[0] [2024-01-30 08:00:52,504] [INFO] [comm.py:637:init_distributed] cdb=None
[1] [2024-01-30 08:00:52,580] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[1] [2024-01-30 08:00:52,580] [INFO] [comm.py:637:init_distributed] cdb=None
[0] LIBXSMM_VERSION: main_stable-1.17-3651 (25693763)
[0] LIBXSMM WARNING: AMX state allocation in the OS failed!
[0] LIBXSMM_TARGET: clx [Intel(R) Xeon(R) Platinum 8480+]
[0] Registry and code: 13 MB
[0] Command: python deepspeed_autotp.py --repo-id-or-model-path /localdisk/huggingface/llama2
[0] Uptime: 48.461769 s
[1] LIBXSMM_VERSION: main_stable-1.17-3651 (25693763)
[1] LIBXSMM WARNING: AMX state allocation in the OS failed!
[1] LIBXSMM_TARGET: clx [Intel(R) Xeon(R) Platinum 8480+]
[1] Registry and code: 13 MB
[1] Command: python deepspeed_autotp.py --repo-id-or-model-path /localdisk/huggingface/llama2
[1] Uptime: 48.521245 s

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 25597 RUNNING AT aia-sdp-pvc-135536
= KILLED BY SIGNAL: 11 (Segmentation fault)

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 1 PID 25598 RUNNING AT aia-sdp-pvc-135536
= KILLED BY SIGNAL: 11 (Segmentation fault)

sriraman2020 commented 7 months ago

Incidentally, AMX enabling is also failing on the SDP system, so I needed to disable it there too.

plusbang commented 7 months ago

[0] LIBXSMM_VERSION: main_stable-1.17-3651 (25693763)
[0] LIBXSMM WARNING: AMX state allocation in the OS failed!
[0] LIBXSMM_TARGET: clx [Intel(R) Xeon(R) Platinum 8480+]
[0] Registry and code: 13 MB

Hi, if you want to run distributed inference, please first make sure you have prepared the environment with PyTorch 2.1 following the installation instructions. Then please follow the detailed README to run.
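
A quick illustrative way to verify the PyTorch 2.1 requirement from inside the environment (the helper is made up; on a real setup you would pass in torch.__version__):

```python
def is_torch_21(version: str) -> bool:
    # Strip any local build suffix such as "+cxx11.abi", then compare
    # the major.minor pair against the required "2.1".
    major, minor = version.split("+")[0].split(".")[:2]
    return (major, minor) == ("2", "1")
```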

P.S. We could not reproduce your error on our machine.

sriraman2020 commented 7 months ago

The issue is resolved and it works now, thanks for the support. However, AMX still fails to load on the SDP machine as well, so I need to bypass AMX through the export.

hkvision commented 7 months ago

Closing this issue. Feel free to tell us if you have further questions :)