intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc.
Apache License 2.0

Can bigdl-llm support deepspeed on 2 Arc DGPU? #9231

Open biyuehuang opened 1 year ago

biyuehuang commented 1 year ago

Hello, can bigdl-llm support distributed inference with DeepSpeed on 2 Arc dGPUs?

$ sycl-ls
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2 [2023.16.6.0.22_223734]
[opencl:cpu:1] Intel(R) OpenCL, Intel(R) Xeon(R) Gold 5420+ 3.0 [2023.16.6.0.22_223734]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics 3.0 [23.26.26690.36]
[opencl:gpu:3] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics 3.0 [23.26.26690.36]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26918]
[ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26918]
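The sycl-ls listing most likely shows two physical A770 cards, each exposed once through the OpenCL backend and once through Level Zero, rather than four GPUs. A small parser makes that explicit. This is a minimal sketch: `level_zero_gpus` is a hypothetical helper written for this thread, not part of bigdl-llm or sycl-ls, and it assumes the one-device-per-line format shown in the listing.

```python
from typing import List

# The listing from this issue, one device per line (copied from the sycl-ls output).
SYCL_LS_OUTPUT = """\
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2 [2023.16.6.0.22_223734]
[opencl:cpu:1] Intel(R) OpenCL, Intel(R) Xeon(R) Gold 5420+ 3.0 [2023.16.6.0.22_223734]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics 3.0 [23.26.26690.36]
[opencl:gpu:3] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics 3.0 [23.26.26690.36]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26918]
[ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26918]
"""


def level_zero_gpus(listing: str) -> List[str]:
    """Return the device name of every Level-Zero GPU entry in a sycl-ls listing."""
    gpus = []
    for line in listing.splitlines():
        line = line.strip()
        if line.startswith("[ext_oneapi_level_zero:gpu:"):
            # Drop the "[backend:type:index] " prefix.
            _, _, rest = line.partition("] ")
            # Drop the platform name before the first ", " (e.g. "Intel(R) Level-Zero").
            gpus.append(rest.split(", ", 1)[-1])
    return gpus


if __name__ == "__main__":
    for name in level_zero_gpus(SYCL_LS_OUTPUT):
        print(name)
```

Counting only the Level-Zero entries avoids double-counting cards that appear under both backends.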

jason-dai commented 1 year ago

It's currently a work in progress: https://github.com/intel-analytics/BigDL/pull/9230

KiwiHana commented 1 year ago

Hi, I noticed the PR has been merged. Can I try it with bigdl20231025? Is there an installation guide for BigDL with DeepSpeed?

https://github.com/intel-analytics/BigDL/pull/9289

KiwiHana commented 1 year ago

https://github.com/intel-analytics/BigDL/tree/a96f3053cdf5e50456913b3001cefe312e3e7eb0/python/llm/example/GPU/Deepspeed-AutoTP

bigdl-core-xe 2.4.0b20231026
bigdl-core-xe-esimd 2.4.0b20231026
bigdl-llm 2.4.0b20231026
intel-extension-for-pytorch 2.0.110+xpu

$ ./run.sh
found intel-openmp in /home/adc-a770/miniconda3/envs/llm-test/lib/libiomp5.so
found oneapi in /opt/intel/oneapi/setvars.sh

:: initializing oneAPI environment ...
   run.sh: BASH_VERSION = 5.1.16(1)-release
   args: Using "$@" for setvars.sh arguments:
:: ccl -- latest
:: compiler -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: dnnl -- latest
:: dpcpp-ct -- latest
:: dpl -- latest
:: mkl -- latest
:: mpi -- latest
:: tbb -- latest
:: oneAPI environment initialized ::

+++++ Env Variables +++++
LD_PRELOAD            = /home/adc-a770/miniconda3/envs/llm-test/lib/libiomp5.so
OMP_NUM_THREADS       = 28
USE_XETLA             = OFF
ENABLE_SDP_FUSION     = 1
SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS = 1
+++++++++++++++++++++++++
Complete.
master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified.
/home/adc-a770/miniconda3/envs/llm-test/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
My guessed rank = 2
My guessed rank = 1
My guessed rank = 0
My guessed rank = 3
Traceback (most recent call last):
  File "/home/adc-a770/llm/bigdl/deepspeed/deepspeed_autotp.py", line 20, in <module>
    import deepspeed
  File "/home/adc-a770/miniconda3/envs/llm-test/lib/python3.9/site-packages/deepspeed/__init__.py", line 21, in <module>
    from . import ops
  File "/home/adc-a770/miniconda3/envs/llm-test/lib/python3.9/site-packages/deepspeed/ops/__init__.py", line 6, in <module>
    from . import adam
  File "/home/adc-a770/miniconda3/envs/llm-test/lib/python3.9/site-packages/deepspeed/ops/adam/__init__.py", line 6, in <module>
    from .cpu_adam import DeepSpeedCPUAdam
  File "/home/adc-a770/miniconda3/envs/llm-test/lib/python3.9/site-packages/deepspeed/ops/adam/cpu_adam.py", line 8, in <module>
    from deepspeed.utils import logger
  File "/home/adc-a770/miniconda3/envs/llm-test/lib/python3.9/site-packages/deepspeed/utils/__init__.py", line 10, in <module>
    from .groups import *
  File "/home/adc-a770/miniconda3/envs/llm-test/lib/python3.9/site-packages/deepspeed/utils/groups.py", line 28, in <module>
    from deepspeed import comm as dist
  File "/home/adc-a770/miniconda3/envs/llm-test/lib/python3.9/site-packages/deepspeed/comm/__init__.py", line 7, in <module>
    from .comm import *
  File "/home/adc-a770/miniconda3/envs/llm-test/lib/python3.9/site-packages/deepspeed/comm/comm.py", line 34, in <module>
    from deepspeed.utils import timer, get_caller_func
  File "/home/adc-a770/miniconda3/envs/llm-test/lib/python3.9/site-packages/deepspeed/utils/timer.py", line 31, in <module>
    class CudaEventTimer(object):
  File "/home/adc-a770/miniconda3/envs/llm-test/lib/python3.9/site-packages/deepspeed/utils/timer.py", line 33, in CudaEventTimer
    def __init__(self, start_event: get_accelerator().Event, end_event: get_accelerator().Event):
  File "/home/adc-a770/miniconda3/envs/llm-test/lib/python3.9/site-packages/deepspeed/accelerator/real_accelerator.py", line 142, in get_accelerator
    from .cpu_accelerator import CPU_Accelerator
  File "/home/adc-a770/miniconda3/envs/llm-test/lib/python3.9/site-packages/deepspeed/accelerator/cpu_accelerator.py", line 8, in <module>
    import oneccl_bindings_for_pytorch  # noqa: F401 # type: ignore
ModuleNotFoundError: No module named 'oneccl_bindings_for_pytorch'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 58011) of binary: /home/adc-a770/miniconda3/envs/llm-test/bin/python
Traceback (most recent call last):
  File "/home/adc-a770/miniconda3/envs/llm-test/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/adc-a770/miniconda3/envs/llm-test/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/adc-a770/miniconda3/envs/llm-test/lib/python3.9/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/home/adc-a770/miniconda3/envs/llm-test/lib/python3.9/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/adc-a770/miniconda3/envs/llm-test/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/adc-a770/miniconda3/envs/llm-test/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
deepspeed_autotp.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2023-10-27_09:34:30
  host      : adc-a770-0
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 58012)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
  time      : 2023-10-27_09:34:30
  host      : adc-a770-0
  rank      : 2 (local_rank: 2)
  exitcode  : 1 (pid: 58013)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
  time      : 2023-10-27_09:34:30
  host      : adc-a770-0
  rank      : 3 (local_rank: 3)
  exitcode  : 1 (pid: 58014)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-10-27_09:34:30
  host      : adc-a770-0
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 58011)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
yangw1234 commented 1 year ago

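Hi, it seems that you did not install the necessary packages: every rank fails with "ModuleNotFoundError: No module named 'oneccl_bindings_for_pytorch'". Would you mind checking that the following packages are correctly installed?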

See https://github.com/intel-analytics/BigDL/pull/9289/files#diff-7698ea41f124a8b3313c530f9636378facdb3e73f9cc6b3f6e59eb0b5fc5d143R17-R21
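A quick way to confirm the environment before launching torchrun is to check that each module can be found on the import path. This is a minimal sketch: the module list is an assumption assembled from this thread (the traceback fails on `oneccl_bindings_for_pytorch`, which is the import name of the `oneccl-bind-pt` pip package), not the official list from PR 9289, and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Assumed list, based on the traceback and pip list in this thread;
# check PR 9289 for the authoritative requirements.
REQUIRED_MODULES = [
    "torch",
    "intel_extension_for_pytorch",
    "oneccl_bindings_for_pytorch",
    "deepspeed",
]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found.

    importlib.util.find_spec only searches the import path; it does not
    execute the module, so the check is fast and side-effect free.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Run it with the same Python interpreter that torchrun uses (here, the conda env's python), since a module installed into a different environment will not be found.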

KiwiHana commented 1 year ago

@yangw1234 Hi Yang, I have installed all the packages from pull/9289, but I still get an error on 2 Arc GPUs:

$ cat run.sh
source bigdl-llm-init -t -g
export MASTER_ADDR=127.0.0.1
export CCL_ZE_IPC_EXCHANGE=sockets
NUM_GPUS=2
if [[ -n $OMP_NUM_THREADS ]]; then
    export OMP_NUM_THREADS=$(($OMP_NUM_THREADS / $NUM_GPUS))
else
    export OMP_NUM_THREADS=$(($(nproc) / $NUM_GPUS))
fi
torchrun --standalone \
         --nnodes=1 \
         --nproc-per-node $NUM_GPUS \
         deepspeed_autotp.py --repo-id-or-model-path "/home/adc-a770/data/Llama-2-7b-chat-hf"
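run.sh launches one torchrun worker per GPU (--nproc-per-node $NUM_GPUS) and divides the CPU threads evenly between them. The same arithmetic, sketched in Python for clarity; `threads_per_process` is a hypothetical helper, not part of bigdl-llm.

```python
import os
from typing import Optional


def threads_per_process(num_gpus: int,
                        omp_num_threads: Optional[str] = None,
                        nproc: Optional[int] = None) -> int:
    """Mirror run.sh: split OMP_NUM_THREADS (or the CPU count) across GPU workers."""
    if omp_num_threads:
        # Equivalent of the [[ -n $OMP_NUM_THREADS ]] branch: divide the existing value.
        return int(omp_num_threads) // num_gpus
    if nproc is None:
        # os.cpu_count() approximates `nproc`; nproc also honours CPU affinity,
        # so the two can differ inside a restricted cgroup or taskset.
        nproc = os.cpu_count() or 1
    # Bash $(( a / b )) truncates; for positive values that matches //.
    return nproc // num_gpus


if __name__ == "__main__":
    print(threads_per_process(2, os.environ.get("OMP_NUM_THREADS")))
```

For example, with OMP_NUM_THREADS=28 and NUM_GPUS=2 each worker gets 14 threads.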

Error Log:

$ ./run.sh
found oneapi in /opt/intel/oneapi/setvars.sh

:: WARNING: setvars.sh has already been run. Skipping re-execution.
   To force a re-execution of setvars.sh, use the '--force' option.
   Using '--force' can result in excessive use of your environment variables.

usage: source setvars.sh [--force] [--config=file] [--help] [...]
  --force        Force setvars.sh to re-run, doing so may overload environment.
  --config=file  Customize env vars using a setvars.sh configuration file.
  --help         Display this help message and exit.
  ...            Additional args are passed to individual env/vars.sh scripts
                 and should follow this script's arguments.

  Some POSIX shells do not accept command-line options. In that case, you can pass
  command-line options via the SETVARS_ARGS environment variable. For example:

  $ SETVARS_ARGS="ia32 --config=config.txt" ; export SETVARS_ARGS
  $ . path/to/setvars.sh

  The SETVARS_ARGS environment variable is cleared on exiting setvars.sh.

+++++ Env Variables +++++
LD_PRELOAD            =
OMP_NUM_THREADS       =
USE_XETLA             = OFF
ENABLE_SDP_FUSION     = 1
SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS = 1
+++++++++++++++++++++++++
Complete.
master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified.
/home/adc-a770/miniconda3/envs/bigdl-deepspeed/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
My guessed rank = 0
My guessed rank = 1
[2023-10-27 16:16:26,467] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cpu (auto detect)
[2023-10-27 16:16:26,479] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cpu (auto detect)
Loading checkpoint shards: 100%|██████████████████████████████████████████████████| 2/2 [00:00<00:00, 18.90it/s]
[2023-10-27 16:16:27,507] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.11.2+78c518ed, git-hash=78c518ed, git-branch=HEAD
[2023-10-27 16:16:27,507] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter replace_method is deprecated. This parameter is no longer needed, please remove from your call to DeepSpeed-inference
[2023-10-27 16:16:27,507] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2023-10-27 16:16:27,508] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
Using /home/adc-a770/.cache/torch_extensions/py39_cpu as PyTorch extensions root...
Emitting ninja build file /home/adc-a770/.cache/torch_extensions/py39_cpu/deepspeed_ccl_comm/build.ninja...
Building extension module deepspeed_ccl_comm...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module deepspeed_ccl_comm...
Time to load deepspeed_ccl_comm op: 0.08489537239074707 seconds
DeepSpeed deepspeed.ops.comm.deepspeed_ccl_comm_op built successfully
Loading checkpoint shards: 100%|██████████████████████████████████████████████████| 2/2 [00:00<00:00, 18.25it/s]
[2023-10-27 16:16:27,721] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.11.2+78c518ed, git-hash=78c518ed, git-branch=HEAD
[2023-10-27 16:16:27,721] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter replace_method is deprecated. This parameter is no longer needed, please remove from your call to DeepSpeed-inference
[2023-10-27 16:16:27,722] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2023-10-27 16:16:27,722] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
Using /home/adc-a770/.cache/torch_extensions/py39_cpu as PyTorch extensions root...
Emitting ninja build file /home/adc-a770/.cache/torch_extensions/py39_cpu/deepspeed_ccl_comm/build.ninja...
Building extension module deepspeed_ccl_comm...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module deepspeed_ccl_comm...
Time to load deepspeed_ccl_comm op: 0.1180427074432373 seconds
DeepSpeed deepspeed.ops.comm.deepspeed_ccl_comm_op built successfully
2023-10-27 16:16:28,598 - torch.distributed.distributed_c10d - INFO - Added key: store_based_barrier_key:1 to store for rank: 1
2023-10-27 16:16:28,606 - torch.distributed.distributed_c10d - INFO - Added key: store_based_barrier_key:1 to store for rank: 0
2023-10-27 16:16:28,606 - torch.distributed.distributed_c10d - INFO - Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2023:10:27-16:16:28:(126995) |CCL_WARN| did not find MPI-launcher specific variables, switch to ATL/OFI, to force enable ATL/MPI set CCL_ATL_TRANSPORT=mpi
2023:10:27-16:16:28:(126995) |CCL_WARN| could not get local_idx/count from environment variables, trying to get them from ATL
2023-10-27 16:16:28,608 - torch.distributed.distributed_c10d - INFO - Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2023:10:27-16:16:28:(126995) |CCL_WARN| sockets exchange mode is set. It may cause potential problem of 'Too many open file descriptors'
2023:10:27-16:16:28:(126996) |CCL_WARN| did not find MPI-launcher specific variables, switch to ATL/OFI, to force enable ATL/MPI set CCL_ATL_TRANSPORT=mpi
2023:10:27-16:16:28:(126996) |CCL_WARN| could not get local_idx/count from environment variables, trying to get them from ATL
2023:10:27-16:16:28:(126996) |CCL_WARN| sockets exchange mode is set. It may cause potential problem of 'Too many open file descriptors'
[2023-10-27 16:16:29,588] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2023-10-27 16:16:29,588] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2023-10-27 16:16:29,588] [INFO] [comm.py:637:init_distributed] cdb=<deepspeed.comm.ccl.CCLBackend object at 0x7fd32bafe910>
[2023-10-27 16:16:29,588] [INFO] [comm.py:637:init_distributed] cdb=<deepspeed.comm.ccl.CCLBackend object at 0x7f2bc92bf9d0>
[2023-10-27 16:16:29,589] [INFO] [comm.py:662:init_distributed] Distributed backend already initialized
2023-10-27 16:16:29,590 - torch.distributed.distributed_c10d - INFO - Added key: store_based_barrier_key:2 to store for rank: 0
2023-10-27 16:16:29,590 - torch.distributed.distributed_c10d - INFO - Added key: store_based_barrier_key:2 to store for rank: 1
2023-10-27 16:16:29,591 - torch.distributed.distributed_c10d - INFO - Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes.
2023-10-27 16:16:29,591 - torch.distributed.distributed_c10d - INFO - Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes.
AutoTP: [(<class 'transformers.models.llama.modeling_llama.LlamaDecoderLayer'>, ['mlp.down_proj', 'self_attn.o_proj'])]
2023-10-27 16:16:30,052 - bigdl.llm.transformers.utils - INFO - Converting the current model to sym_int4 format......
2023-10-27 16:16:30,056 - bigdl.llm.transformers.utils - INFO - Converting the current model to sym_int4 format......
LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 4096, padding_idx=0)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): LowBitLinear(in_features=4096, out_features=2048, bias=False)
          (k_proj): LowBitLinear(in_features=4096, out_features=2048, bias=False)
          (v_proj): LowBitLinear(in_features=4096, out_features=2048, bias=False)
          (o_proj): LowBitLinear(in_features=2048, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): LowBitLinear(in_features=4096, out_features=5504, bias=False)
          (up_proj): LowBitLinear(in_features=4096, out_features=5504, bias=False)
          (down_proj): LowBitLinear(in_features=5504, out_features=4096, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): LowBitLinear(in_features=4096, out_features=32000, bias=False)
)
/home/adc-a770/miniconda3/envs/bigdl-deepspeed/lib/python3.9/site-packages/transformers/generation/utils.py:1270: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation )
  warnings.warn(
Traceback (most recent call last):
  File "/home/adc-a770/llm/bigdl/deepspeed/deepspeed_autotp.py", line 81, in <module>
    output = model.generate(input_ids,
  File "/home/adc-a770/miniconda3/envs/bigdl-deepspeed/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/adc-a770/miniconda3/envs/bigdl-deepspeed/lib/python3.9/site-packages/transformers/generation/utils.py", line 1538, in generate
    return self.greedy_search(
  File "/home/adc-a770/miniconda3/envs/bigdl-deepspeed/lib/python3.9/site-packages/transformers/generation/utils.py", line 2362, in greedy_search
    outputs = self(
  File "/home/adc-a770/miniconda3/envs/bigdl-deepspeed/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/adc-a770/miniconda3/envs/bigdl-deepspeed/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 806, in forward
    outputs = self.model(
  File "/home/adc-a770/miniconda3/envs/bigdl-deepspeed/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/adc-a770/miniconda3/envs/bigdl-deepspeed/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 693, in forward
    layer_outputs = decoder_layer(
  File "/home/adc-a770/miniconda3/envs/bigdl-deepspeed/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/adc-a770/miniconda3/envs/bigdl-deepspeed/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 408, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/adc-a770/miniconda3/envs/bigdl-deepspeed/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/adc-a770/miniconda3/envs/bigdl-deepspeed/lib/python3.9/site-packages/bigdl/llm/transformers/models/llama.py", line 126, in llama_attention_forward_4_31
    query_states = self.q_proj(hidden_states)
  File "/home/adc-a770/miniconda3/envs/bigdl-deepspeed/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/adc-a770/miniconda3/envs/bigdl-deepspeed/lib/python3.9/site-packages/bigdl/llm/transformers/low_bit_linear.py", line 375, in forward
    result = linear_q4_0.forward_new(x_2d, self.weight.data, self.weight.qtype,
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, xpu:1 and xpu:0! (when checking argument for argument mat2 in method wrapper_XPU__mm)
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 126995 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -11) local_rank: 1 (pid: 126996) of binary: /home/adc-a770/miniconda3/envs/bigdl-deepspeed/bin/python
Traceback (most recent call last):
  File "/home/adc-a770/miniconda3/envs/bigdl-deepspeed/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/adc-a770/miniconda3/envs/bigdl-deepspeed/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/adc-a770/miniconda3/envs/bigdl-deepspeed/lib/python3.9/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/home/adc-a770/miniconda3/envs/bigdl-deepspeed/lib/python3.9/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/adc-a770/miniconda3/envs/bigdl-deepspeed/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/adc-a770/miniconda3/envs/bigdl-deepspeed/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
========================================================
deepspeed_autotp.py FAILED
--------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
--------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-10-27_16:16:39
  host      : adc-a770-0
  rank      : 1 (local_rank: 1)
  exitcode  : -11 (pid: 126996)
  error_file: <N/A>
  traceback : Signal 11 (SIGSEGV) received by PID 126996
========================================================
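The first failure above is a device mismatch. Inside `LowBitLinear.forward`, the `q_proj` matmul received one operand on `xpu:1` and the other on `xpu:0`, and worker rank 1 then died with SIGSEGV. In a two-card AutoTP run, each worker would normally be expected to keep both its weight shard and its inputs on one card. The sketch below is a hypothetical helper, not part of bigdl-llm or the Deepspeed-AutoTP example. It assumes only that `torchrun` exports `LOCAL_RANK` to each worker, and it maps rank N to `xpu:N`. Whether the example script already does this, and where the stray `xpu:0` tensor comes from, is not established in this thread.

```python
import os


def xpu_device_for_rank(default_rank: int = 0) -> str:
    """Return the XPU device string this torchrun worker should use.

    torchrun sets LOCAL_RANK for every worker it spawns. Mapping
    LOCAL_RANK=N to "xpu:N" gives each worker its own card, so the
    model shard and the inputs can be moved to the same device.
    """
    local_rank = int(os.environ.get("LOCAL_RANK", default_rank))
    return f"xpu:{local_rank}"


if __name__ == "__main__":
    device = xpu_device_for_rank()
    print(f"LOCAL_RANK={os.environ.get('LOCAL_RANK')} -> {device}")
    # Assumption: with intel-extension-for-pytorch installed, a script could
    # then call torch.xpu.set_device(device) and use .to(device) for both the
    # model and the tokenized inputs, so no tensor is left on another card.
```

Launched with `torchrun --nproc_per_node 2`, the two workers would print `xpu:0` and `xpu:1` respectively.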
biyuehuang commented 1 year ago

bigdl20231029


$ pip list
Package                       Version
----------------------------- ------------------
accelerate                    0.21.0
annotated-types               0.6.0
bigdl-core-xe                 2.4.0b20231029
bigdl-core-xe-esimd           2.4.0b20231029
bigdl-llm                     2.4.0b20231029
certifi                       2023.7.22
charset-normalizer            3.3.1
deepspeed                     0.11.2+78c518ed
filelock                      3.12.4
fsspec                        2023.10.0
hjson                         3.1.0
huggingface-hub               0.18.0
idna                          3.4
intel-extension-for-deepspeed 0.9.4+ec33277
intel-extension-for-pytorch   2.0.110+xpu
Jinja2                        3.1.2
MarkupSafe                    2.1.3
mpi4py                        3.1.5
mpmath                        1.3.0
networkx                      3.2
ninja                         1.11.1.1
numpy                         1.26.1
oneccl-bind-pt                2.0.100+gpu
packaging                     23.2
Pillow                        10.1.0
pip                           23.3
protobuf                      4.25.0rc2
psutil                        5.9.6
py-cpuinfo                    9.0.0
pydantic                      2.4.2
pydantic_core                 2.10.1
PyYAML                        6.0.1
regex                         2023.10.3
requests                      2.31.0
safetensors                   0.4.0
sentencepiece                 0.1.99
setuptools                    68.0.0
sympy                         1.12
tabulate                      0.9.0
tokenizers                    0.13.3
torch                         2.0.1a0+cxx11.abi
torchvision                   0.15.2a0+cxx11.abi
tqdm                          4.66.1
transformers                  4.31.0
typing_extensions             4.8.0
urllib3                       2.0.7
wheel                         0.41.2
$ ./run.sh
found oneapi in /opt/intel/oneapi/setvars.sh

:: WARNING: setvars.sh has already been run. Skipping re-execution.
   To force a re-execution of setvars.sh, use the '--force' option.
   Using '--force' can result in excessive use of your environment variables.

usage: source setvars.sh [--force] [--config=file] [--help] [...]
  --force        Force setvars.sh to re-run, doing so may overload environment.
  --config=file  Customize env vars using a setvars.sh configuration file.
  --help         Display this help message and exit.
  ...            Additional args are passed to individual env/vars.sh scripts
                 and should follow this script's arguments.

  Some POSIX shells do not accept command-line options. In that case, you can pass
  command-line options via the SETVARS_ARGS environment variable. For example:

  $ SETVARS_ARGS="ia32 --config=config.txt" ; export SETVARS_ARGS
  $ . path/to/setvars.sh

  The SETVARS_ARGS environment variable is cleared on exiting setvars.sh.

+++++ Env Variables +++++
LD_PRELOAD            =
OMP_NUM_THREADS       =
USE_XETLA             = OFF
ENABLE_SDP_FUSION     = 1
SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS = 1
+++++++++++++++++++++++++
Complete.
master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified.
/home/adc-a770/miniconda3/envs/bigdl-deepspeed/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
/home/adc-a770/miniconda3/envs/bigdl-deepspeed/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
My guessed rank = 1
My guessed rank = 0
[2023-10-30 17:05:15,386] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to xpu (auto detect)
[2023-10-30 17:05:15,567] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to xpu (auto detect)
Loading checkpoint shards: 100%|██████████████████████████████████████████████████| 2/2 [00:00<00:00, 17.35it/s]
[2023-10-30 17:05:16,422] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.11.2+78c518ed, git-hash=78c518ed, git-branch=HEAD
[2023-10-30 17:05:16,423] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter replace_method is deprecated. This parameter is no longer needed, please remove from your call to DeepSpeed-inference
[2023-10-30 17:05:16,423] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2023-10-30 17:05:16,423] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[2023-10-30 17:05:16,425] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2023-10-30 17:05:16,425] [INFO] [comm.py:637:init_distributed] cdb=None
Loading checkpoint shards: 100%|██████████████████████████████████████████████████| 2/2 [00:00<00:00, 20.19it/s]
[2023-10-30 17:05:16,590] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.11.2+78c518ed, git-hash=78c518ed, git-branch=HEAD
[2023-10-30 17:05:16,591] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter replace_method is deprecated. This parameter is no longer needed, please remove from your call to DeepSpeed-inference
[2023-10-30 17:05:16,591] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2023-10-30 17:05:16,591] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[2023-10-30 17:05:16,592] [INFO] [comm.py:161:init_deepspeed_backend] Initialize ccl backend
[2023-10-30 17:05:16,592] [INFO] [comm.py:637:init_distributed] cdb=None
[2023-10-30 17:05:16,592] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend ccl
2023-10-30 17:05:17,427 - torch.distributed.distributed_c10d - INFO - Added key: store_based_barrier_key:1 to store for rank: 1
2023-10-30 17:05:17,428 - torch.distributed.distributed_c10d - INFO - Added key: store_based_barrier_key:1 to store for rank: 0
2023-10-30 17:05:17,428 - torch.distributed.distributed_c10d - INFO - Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2023-10-30 17:05:17,429 - torch.distributed.distributed_c10d - INFO - Added key: store_based_barrier_key:2 to store for rank: 0
2023-10-30 17:05:17,437 - torch.distributed.distributed_c10d - INFO - Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2023-10-30 17:05:17,438 - torch.distributed.distributed_c10d - INFO - Added key: store_based_barrier_key:2 to store for rank: 1
2023-10-30 17:05:17,438 - torch.distributed.distributed_c10d - INFO - Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes.
2023-10-30 17:05:17,440 - torch.distributed.distributed_c10d - INFO - Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes.
AutoTP:  [(<class 'transformers.models.llama.modeling_llama.LlamaDecoderLayer'>, ['mlp.down_proj', 'self_attn.o_proj'])]
AutoTP:  [(<class 'transformers.models.llama.modeling_llama.LlamaDecoderLayer'>, ['self_attn.o_proj', 'mlp.down_proj'])]
2023:10:30-17:05:20:(1498874) |CCL_WARN| did not find MPI-launcher specific variables, switch to ATL/OFI, to force enable ATL/MPI set CCL_ATL_TRANSPORT=mpi
2023:10:30-17:05:20:(1498874) |CCL_WARN| could not get local_idx/count from environment variables, trying to get them from ATL
2023:10:30-17:05:20:(1498874) |CCL_WARN| sockets exchange mode is set. It may cause potential problem of 'Too many open file descriptors'
2023:10:30-17:05:20:(1498875) |CCL_WARN| did not find MPI-launcher specific variables, switch to ATL/OFI, to force enable ATL/MPI set CCL_ATL_TRANSPORT=mpi
2023:10:30-17:05:20:(1498875) |CCL_WARN| could not get local_idx/count from environment variables, trying to get them from ATL
2023:10:30-17:05:20:(1498875) |CCL_WARN| sockets exchange mode is set. It may cause potential problem of 'Too many open file descriptors'
2023-10-30 17:05:22,876 - bigdl.llm.transformers.utils - INFO - Converting the current model to sym_int4 format......
2023-10-30 17:05:23,234 - bigdl.llm.transformers.utils - INFO - Converting the current model to sym_int4 format......
LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 4096, padding_idx=0)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): LowBitLinear(in_features=4096, out_features=2048, bias=False)
          (k_proj): LowBitLinear(in_features=4096, out_features=2048, bias=False)
          (v_proj): LowBitLinear(in_features=4096, out_features=2048, bias=False)
          (o_proj): LowBitLinear(in_features=2048, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): LowBitLinear(in_features=4096, out_features=5504, bias=False)
          (up_proj): LowBitLinear(in_features=4096, out_features=5504, bias=False)
          (down_proj): LowBitLinear(in_features=5504, out_features=4096, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): LowBitLinear(in_features=4096, out_features=32000, bias=False)
)
/home/adc-a770/miniconda3/envs/bigdl-deepspeed/lib/python3.9/site-packages/transformers/generation/utils.py:1270: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation )
  warnings.warn(
LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 4096, padding_idx=0)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): LowBitLinear(in_features=4096, out_features=2048, bias=False)
          (k_proj): LowBitLinear(in_features=4096, out_features=2048, bias=False)
          (v_proj): LowBitLinear(in_features=4096, out_features=2048, bias=False)
          (o_proj): LowBitLinear(in_features=2048, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): LowBitLinear(in_features=4096, out_features=5504, bias=False)
          (up_proj): LowBitLinear(in_features=4096, out_features=5504, bias=False)
          (down_proj): LowBitLinear(in_features=5504, out_features=4096, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): LowBitLinear(in_features=4096, out_features=32000, bias=False)
)
/home/adc-a770/miniconda3/envs/bigdl-deepspeed/lib/python3.9/site-packages/transformers/generation/utils.py:1270: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation )
  warnings.warn(
adc-a770-0:pid1498874.python: Reading from remote process' memory failed. Disabling CMA support
adc-a770-0:pid1498875.python: Reading from remote process' memory failed. Disabling CMA support
adc-a770-0:pid1498874: Assertion failure at psm3/ptl_am/ptl.c:195: nbytes == req->req_data.recv_msglen
adc-a770-0:pid1498875: Assertion failure at psm3/ptl_am/ptl.c:195: nbytes == req->req_data.recv_msglen
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 0 (pid: 1498874) of binary: /home/adc-a770/miniconda3/envs/bigdl-deepspeed/bin/python
Traceback (most recent call last):
  File "/home/adc-a770/miniconda3/envs/bigdl-deepspeed/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/adc-a770/miniconda3/envs/bigdl-deepspeed/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/adc-a770/miniconda3/envs/bigdl-deepspeed/lib/python3.9/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/home/adc-a770/miniconda3/envs/bigdl-deepspeed/lib/python3.9/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/adc-a770/miniconda3/envs/bigdl-deepspeed/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/adc-a770/miniconda3/envs/bigdl-deepspeed/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
========================================================
deepspeed_autotp.py FAILED
--------------------------------------------------------
Failures:
[1]:
  time      : 2023-10-30_17:05:33
  host      : adc-a770-0
  rank      : 1 (local_rank: 1)
  exitcode  : -6 (pid: 1498875)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 1498875
--------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-10-30_17:05:33
  host      : adc-a770-0
  rank      : 0 (local_rank: 0)
  exitcode  : -6 (pid: 1498874)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 1498874
========================================================
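torchrun reports a worker killed by a signal as a negative exit code, so the two runs in this thread fail in different ways. The earlier run ends with exitcode -11 (SIGSEGV) on rank 1 after the device-mismatch `RuntimeError`. This run ends with exitcode -6 (SIGABRT) on both ranks, right after the `psm3/ptl_am/ptl.c:195` assertion failures (a failed C assertion calls `abort()`). The helper below is a small illustrative sketch that decodes such codes using only the standard `signal` module; it is not part of torch or bigdl-llm.

```python
import signal


def describe_exit_code(exitcode: int) -> str:
    """Describe a worker exit code as reported by torchrun/torchelastic.

    A negative value -N means the process was terminated by signal N,
    the same convention as subprocess.Popen.returncode on POSIX.
    """
    if exitcode >= 0:
        return f"exited with status {exitcode}"
    signum = -exitcode
    try:
        name = signal.Signals(signum).name
    except ValueError:  # signal number not defined on this platform
        name = "unknown signal"
    return f"killed by {name} (signal {signum})"


if __name__ == "__main__":
    # Exit codes taken from the two torchrun failure summaries in this thread.
    for code in (-11, -6):
        print(code, "->", describe_exit_code(code))
```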
yangw1234 commented 1 year ago

I could not reproduce the issue. Here is my configuration:

[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2 [2023.16.7.0.21_160000]
[opencl:cpu:1] Intel(R) OpenCL, 13th Gen Intel(R) Core(TM) i9-13900K 3.0 [2023.16.7.0.21_160000]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics 3.0 [23.17.26241.33]
[opencl:gpu:3] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics 3.0 [23.17.26241.33]
[opencl:gpu:4] Intel(R) OpenCL Graphics, Intel(R) UHD Graphics 770 3.0 [23.17.26241.33]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26241]
[ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26241]
[ext_oneapi_level_zero:gpu:2] Intel(R) Level-Zero, Intel(R) UHD Graphics 770 1.3 [1.3.26241]

bigdl-core-xe                 2.4.0b20231030
bigdl-core-xe-esimd           2.4.0b20231030
bigdl-llm                     2.4.0b20231030
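When one machine fails and another does not, diffing the two environments narrows the search. Both `sycl-ls` listings in this thread show two A770 cards, but the reporter's driver stack (OpenCL 23.26.26690.36, Level-Zero 1.3.26918) differs from the non-failing machine's (23.17.26241.33, Level-Zero 1.3.26241), which also has a UHD 770 iGPU and runs bigdl-llm 2.4.0b20231030. The helper below is a hypothetical sketch that pulls the Level-Zero GPU names and driver versions out of `sycl-ls` text and lists the mismatches. It assumes the line format shown in this thread, and it does not establish that the driver difference causes the crash.

```python
import re

# Matches Level-Zero GPU lines printed by sycl-ls, for example:
# [ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26241]
# Groups: device index, device name, driver version in brackets.
LZ_GPU_LINE = re.compile(
    r"^\[ext_oneapi_level_zero:gpu:(\d+)\][^,]*,\s*(.+?)\s+\d+\.\d+\s+\[([\d.]+)\]\s*$"
)


def level_zero_gpus(sycl_ls_output: str) -> dict:
    """Map each Level-Zero GPU index to (device name, driver version)."""
    gpus = {}
    for line in sycl_ls_output.splitlines():
        match = LZ_GPU_LINE.match(line.strip())
        if match:
            index, name, driver = match.groups()
            gpus[int(index)] = (name, driver)
    return gpus


def driver_differences(a: dict, b: dict) -> list:
    """Return (index, driver_a, driver_b) for shared GPU indices whose drivers differ."""
    return [
        (i, a[i][1], b[i][1])
        for i in sorted(a.keys() & b.keys())
        if a[i][1] != b[i][1]
    ]


# Level-Zero lines copied from the two sycl-ls listings in this thread.
REPORTER = """\
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26918]
[ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26918]
"""
MAINTAINER = """\
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26241]
[ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26241]
[ext_oneapi_level_zero:gpu:2] Intel(R) Level-Zero, Intel(R) UHD Graphics 770 1.3 [1.3.26241]
"""

if __name__ == "__main__":
    reporter, maintainer = level_zero_gpus(REPORTER), level_zero_gpus(MAINTAINER)
    for index, ours, theirs in driver_differences(reporter, maintainer):
        print(f"gpu:{index}: reporter {ours} vs maintainer {theirs}")
```

The same approach extends to the `pip list` output, for instance to compare the `bigdl-core-xe` and `bigdl-llm` nightly versions on both machines.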
biyuehuang commented 1 year ago

I get the same error with both bigdl20231029 and bigdl20231030: torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 0 (pid: 1758856) of binary: /home/adc-a770/miniconda3/envs/bigdl-deepspeed/bin/python