Closed sriraman2020 closed 7 months ago
@jenniew please take a look
@sriraman2020 I tried this on my PVC and couldn't reproduce the issue. It may be a problem with your environment. Can you run `source /opt/intel/oneapi/setvars.sh` before running this example?
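A quick way to sanity-check whether that `setvars.sh` step has actually been applied in the current shell is to look for the environment variables it exports. This is only a heuristic sketch; the marker variable names (`ONEAPI_ROOT`, `SETVARS_COMPLETED`, `CMPLR_ROOT`) are ones `setvars.sh` commonly sets, not something guaranteed by BigDL:

```python
import os

def oneapi_env_active(environ=os.environ):
    """Heuristic: does this process look like it ran
    `source /opt/intel/oneapi/setvars.sh`?"""
    markers = ("ONEAPI_ROOT", "SETVARS_COMPLETED", "CMPLR_ROOT")
    return any(m in environ for m in markers)

if __name__ == "__main__":
    print("oneAPI env active:", oneapi_env_active())
```

If this prints `False`, source the script and re-check before running the example.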
@jenniew
Hi @sriraman2020
We confirmed again in our environment that we can't reproduce this issue. Somehow AMX fails to get enabled on your SPR; is there anything special about your SPR setup?
As a workaround, you may use `export BIGDL_LLM_AMX_DISABLED=1` to disable AMX and see if you can proceed.
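Before falling back to that workaround, it can help to confirm whether the CPU even advertises AMX to the OS. A minimal sketch (Linux-only assumption; `amx_tile`/`amx_int8`/`amx_bf16` are the flag names the kernel exposes in `/proc/cpuinfo`):

```python
def cpu_has_amx(cpuinfo_path="/proc/cpuinfo"):
    """Return True if the kernel reports AMX tile support in cpuinfo flags."""
    try:
        with open(cpuinfo_path) as f:
            flags = f.read()
    except OSError:
        # No /proc/cpuinfo (e.g. non-Linux): assume no AMX
        return False
    return "amx_tile" in flags

if __name__ == "__main__":
    print("AMX advertised by CPU:", cpu_has_amx())
```

If this prints `False` on an SPR machine, the kernel or BIOS is likely masking AMX, which would match the "AMX fails to get enabled" symptom.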
@hkvision I tried with the above option and it moves ahead, but now I'm stuck on the next error: `ImportError: libsycl.so.6: cannot open shared object file: No such file or directory`
Check this page for your error: https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#cannot-open-shared-object-file-no-such-file-or-directory This usually happens when the bigdl/ipex and oneAPI versions mismatch; can you confirm this?
@hkvision
Actually, the Torch version is 2.1.0a0 and the oneAPI version is 2024.0, so they seem to tally.
`ImportError: libsycl.so.6: cannot open shared object file: No such file or directory`

Please check your intel_extension_for_pytorch version: `pip list | grep intel_extension_for_pytorch`

`libsycl.so.6` is included in oneAPI 2023.2 and is required by intel_extension_for_pytorch 2.0. intel_extension_for_pytorch 2.1 requires `libsycl.so.7`, which is included in oneAPI 2024.0.

If you are using intel_extension_for_pytorch 2.0, please update it to 2.1: `pip install --pre --upgrade --force-reinstall bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu`
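A small sketch for checking which `libsycl.so.*` versions are actually reachable on `LD_LIBRARY_PATH`, so you can confirm the runtime matches the ipex version (`.so.6` for ipex 2.0 / oneAPI 2023.2, `.so.7` for ipex 2.1 / oneAPI 2024.0). This is an illustrative helper, not part of BigDL:

```python
import glob
import os

def visible_libsycl(search_path=None):
    """List libsycl.so.* filenames found in LD_LIBRARY_PATH directories."""
    path = search_path if search_path is not None else os.environ.get("LD_LIBRARY_PATH", "")
    found = []
    for d in filter(None, path.split(":")):
        found.extend(os.path.basename(p)
                     for p in glob.glob(os.path.join(d, "libsycl.so.*")))
    return sorted(set(found))

if __name__ == "__main__":
    print(visible_libsycl() or "no libsycl.so.* visible on LD_LIBRARY_PATH")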
@MeouSker77 intel_extension_for_pytorch version is 2.1.10+xpu
Did that error happen when importing `linear_q4_0`? If so, try:

`pip uninstall bigdl-core-xe`
`pip uninstall bigdl-core-xe-21`
`pip install --pre --upgrade bigdl-core-xe-21`
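After the reinstall, you can verify that only the intended core package is present. A sketch using the standard library (package names taken from the commands above; the helper itself is illustrative, not a BigDL utility):

```python
from importlib.metadata import PackageNotFoundError, version

def installed_versions(names=("bigdl-core-xe", "bigdl-core-xe-21")):
    """Map each package name to its installed version, or None if absent."""
    out = {}
    for name in names:
        try:
            out[name] = version(name)
        except PackageNotFoundError:
            out[name] = None
    return out

if __name__ == "__main__":
    print(installed_versions())
```

For ipex 2.1 the expected state is `bigdl-core-xe` reporting `None` and `bigdl-core-xe-21` reporting a version.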
@MeouSker77 Cool, That Worked!! Thanks!
@MeouSker77 does it work for multi-GPU? We have currently tried it on a single PVC 1T system.
Yes - see distributed inference and finetuning examples.
@MeouSker77 I tried inference on a multi-PVC system with `bash run_llama2_70b_pvc_1550_1_card.sh` and the run exits at `[1] Uptime: 48.521245 s`. Any idea what could be going wrong?
@MeouSker77
[0] DeepSpeed deepspeed.ops.comm.deepspeed_ccl_comm_op built successfully
[0] [2024-01-30 08:00:50,077] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[1] Time to load deepspeed_ccl_comm op: 0.7787277698516846 seconds
[1] DeepSpeed deepspeed.ops.comm.deepspeed_ccl_comm_op built successfully
[1] [2024-01-30 08:00:50,077] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[0] [2024-01-30 08:00:50,078] [INFO] [comm.py:637:init_distributed] cdb=<deepspeed.comm.ccl.CCLBackend object at 0x7fc76febd1f0>
[1] [2024-01-30 08:00:50,078] [INFO] [comm.py:637:init_distributed] cdb=<deepspeed.comm.ccl.CCLBackend object at 0x7f98be4dfd30>
[0] [2024-01-30 08:00:50,078] [INFO] [comm.py:652:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[1] [2024-01-30 08:00:50,078] [INFO] [comm.py:652:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[1] [2024-01-30 08:00:50,706] [INFO] [comm.py:702:mpi_discovery] Discovered MPI settings of world_rank=1, local_rank=1, world_size=2, master_addr=10.45.77.106, master_port=29500
[0] [2024-01-30 08:00:50,706] [INFO] [comm.py:702:mpi_discovery] Discovered MPI settings of world_rank=0, local_rank=0, world_size=2, master_addr=10.45.77.106, master_port=29500
[0] [2024-01-30 08:00:50,707] [INFO] [comm.py:662:init_distributed] Distributed backend already initialized
[0] 2024-01-30 08:00:51,486 - bigdl.llm.transformers.utils - INFO - Converting the current model to sym_int4 format......
[1] 2024-01-30 08:00:51,659 - bigdl.llm.transformers.utils - INFO - Converting the current model to sym_int4 format......
[0] 2024-01-30 08:00:52,216 - bigdl.llm.transformers.utils - INFO - BIGDL_OPT_IPEX: False
[0] AutoTP: [(<class 'transformers.models.llama.modeling_llama.LlamaDecoderLayer'>, ['self_attn.o_proj', 'mlp.down_proj'])]
[0] [2024-01-30 08:00:52,242] [INFO] [real_accelerator.py:166:set_accelerator] Setting ds_accelerator to xpu (model specified)
[1] 2024-01-30 08:00:52,268 - bigdl.llm.transformers.utils - INFO - BIGDL_OPT_IPEX: False
[1] AutoTP: [(<class 'transformers.models.llama.modeling_llama.LlamaDecoderLayer'>, ['self_attn.o_proj', 'mlp.down_proj'])]
[1] [2024-01-30 08:00:52,298] [INFO] [real_accelerator.py:166:set_accelerator] Setting ds_accelerator to xpu (model specified)
[0] [2024-01-30 08:00:52,504] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[0] [2024-01-30 08:00:52,504] [INFO] [comm.py:637:init_distributed] cdb=None
[1] [2024-01-30 08:00:52,580] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[1] [2024-01-30 08:00:52,580] [INFO] [comm.py:637:init_distributed] cdb=None
[0] LIBXSMM_VERSION: main_stable-1.17-3651 (25693763)
[0] LIBXSMM WARNING: AMX state allocation in the OS failed!
[0] LIBXSMM_TARGET: clx [Intel(R) Xeon(R) Platinum 8480+]
[0] Registry and code: 13 MB
[0] Command: python deepspeed_autotp.py --repo-id-or-model-path /localdisk/huggingface/llama2
[0] Uptime: 48.461769 s
[1] LIBXSMM_VERSION: main_stable-1.17-3651 (25693763)
[1] LIBXSMM WARNING: AMX state allocation in the OS failed!
[1] LIBXSMM_TARGET: clx [Intel(R) Xeon(R) Platinum 8480+]
[1] Registry and code: 13 MB
[1] Command: python deepspeed_autotp.py --repo-id-or-model-path /localdisk/huggingface/llama2
[1] Uptime: 48.521245 s
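The `mpi_discovery` lines in the log show DeepSpeed inferring each process's rank from MPI-launcher environment variables. A rough sketch of that kind of discovery logic, useful for debugging a launch by printing what each rank would see (the variable names are common Intel MPI/PMI conventions, not copied from DeepSpeed's source):

```python
import os

def discover_ranks(environ=os.environ):
    """Read world_rank/world_size/local_rank from typical MPI launcher env vars."""
    def first(*keys, default="0"):
        for k in keys:
            if k in environ:
                return environ[k]
        return default
    return {
        "world_rank": int(first("PMI_RANK", "RANK")),
        "world_size": int(first("PMI_SIZE", "WORLD_SIZE", default="1")),
        "local_rank": int(first("MPI_LOCALRANKID", "LOCAL_RANK")),
    }

if __name__ == "__main__":
    print(discover_ranks())
```

If both ranks report sensible values (as they do in the log: world_size=2, ranks 0 and 1), the MPI side of the launch is working and the failure lies elsewhere, e.g. in the AMX warning that follows.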
Incidentally, AMX enabling is also failing on the SDP system, so we needed to disable it there too.
[0] LIBXSMM_VERSION: main_stable-1.17-3651 (25693763)
[0] LIBXSMM WARNING: AMX state allocation in the OS failed!
[0] LIBXSMM_TARGET: clx [Intel(R) Xeon(R) Platinum 8480+]
[0] Registry and code: 13 MB
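For context on that warning: on Linux kernels 5.16 and later, a process must explicitly request permission from the kernel before using the AMX tile state; "AMX state allocation in the OS failed" typically means that request was denied. A minimal sketch of the request (x86_64 Linux assumption; the constants `ARCH_REQ_XCOMP_PERM = 0x1023` and `XFEATURE_XTILEDATA = 18` are from the kernel ABI, and this is an illustration, not BigDL's or LIBXSMM's code):

```python
import ctypes
import ctypes.util
import platform

ARCH_REQ_XCOMP_PERM = 0x1023  # arch_prctl request code (Linux kernel ABI)
XFEATURE_XTILEDATA = 18       # xstate feature bit for AMX tile data

def request_amx_permission():
    """Ask the kernel for AMX tile-state permission.
    Returns 0 on success, an errno on failure, -1 if not on x86_64 Linux."""
    if platform.system() != "Linux" or platform.machine() != "x86_64":
        return -1
    libc = ctypes.CDLL(ctypes.util.find_library("c") or "libc.so.6",
                       use_errno=True)
    # arch_prctl has no libc wrapper; SYS_arch_prctl is 158 on x86_64
    ret = libc.syscall(158, ARCH_REQ_XCOMP_PERM, XFEATURE_XTILEDATA)
    return 0 if ret == 0 else ctypes.get_errno()

if __name__ == "__main__":
    err = request_amx_permission()
    print("AMX permission granted" if err == 0 else f"denied (code {err})")
```

A nonzero result on an SPR box suggests the kernel is refusing AMX (old kernel, or the feature masked at boot), which would explain why disabling AMX via the export is needed.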
Hi, if you want to run distributed inference, please first make sure you've prepared the environment with PyTorch 2.1 following the installation instructions, then follow the detailed README to run.
P.S. We could not reproduce your error on our machine.
The issue is resolved; it works now. Thanks for the support. However, AMX is still not loading on the SDP machine either, so we need to bypass AMX via the export.
Closing this issue. Feel free to tell us if you have further questions :)
Trying out Mixtral on PVC (1 card, 1 tile) with the recipe from https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mixtral
Command line: `python ./generate.py --prompt 'what is AI?' --repo-id-or-model-path mistralai/Mixtral-8x7B-v0.1`
Getting the error "Failed to enable AMX". Any suggestions, please?