microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

Error Inference via command line for CPU #4190

Open Dayananda-Akaike-Tech opened 1 year ago

Dayananda-Akaike-Tech commented 1 year ago

Describe the bug
I am trying to run inference on Colab using only the CPU.

For inference I am using the DeepSpeed CLI with my transcribe.py file and the saved model folder, via the command below:

!deepspeed --include= localhost:0 "/content/vistaar/transcribe.py" "/content/manifest.json" "/content/model_folder/" "Hindi" 1 "/content/output_path.txt"

Required output
Which parameter should I specify so that inference uses only the CPU: "--include", "--exclude", or should I install intel-extension-for-deepspeed?

Existing output

[INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-08-22 11:30:51,120] [WARNING] [runner.py:201:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
Traceback (most recent call last):
  File "/usr/local/bin/deepspeed", line 6, in <module>
    main()
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/launcher/runner.py", line 420, in main
    raise RuntimeError("Unable to proceed, no GPU resources available")
RuntimeError: Unable to proceed, no GPU resources available
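
The failure sequence above (the accelerator auto-detected as cuda, but the runner then finding no GPUs) can be sketched roughly like this. This is a simplified, stdlib-only illustration, not DeepSpeed's actual real_accelerator.py logic; the point is that on Colab a CUDA build of torch is installed even on CPU-only runtimes, so a naive probe picks cuda and the launcher then fails:

```python
import importlib.util

def guess_accelerator():
    """Rough sketch of accelerator auto-detection (illustrative only)."""
    if importlib.util.find_spec("intel_extension_for_pytorch") is not None:
        return "cpu"   # CPU backend installed -> prefer it
    if importlib.util.find_spec("torch") is not None:
        return "cuda"  # assume the default CUDA build of torch
    return "cpu"

def launch(accelerator, gpu_count):
    # Mirrors the RuntimeError raised in launcher/runner.py
    if accelerator == "cuda" and gpu_count == 0:
        raise RuntimeError("Unable to proceed, no GPU resources available")
    return f"launching on {accelerator}"
```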
mrwyattii commented 1 year ago

Take a look at our CPU inference workflow for an idea of how to set up your environment for CPU inference. For example, you will need to build and install intel-extension-for-pytorch: https://github.com/microsoft/DeepSpeed/blob/5e16eb2c939707d0d0062a458d77998fccb3afad/.github/workflows/cpu-inference.yml#L25
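
Before launching, the prerequisites installed by that workflow can be sanity-checked with a short stdlib snippet. This is a hedged sketch, not official DeepSpeed tooling; the import names are assumptions based on the linked workflow file (`oneccl_bindings_for_pytorch` is the import name of the `oneccl_bind_pt` wheel):

```python
import importlib.util

# Modules the cpu-inference workflow installs (assumed import names)
required = [
    "torch",
    "intel_extension_for_pytorch",
    "oneccl_bindings_for_pytorch",  # wheel: oneccl_bind_pt
    "deepspeed",
]

# Report which prerequisites are missing from the current environment
missing = [name for name in required if importlib.util.find_spec(name) is None]
if missing:
    print("Missing CPU-inference prerequisites:", ", ".join(missing))
else:
    print("All CPU-inference prerequisites found")
```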

Dayananda-Akaike-Tech commented 1 year ago

I installed the necessary packages and libraries from the CPU workflow YAML file and tried to run inference again using the command below, but got another error. Kindly help me resolve the issue.
!deepspeed "/content/vistaar/transcribe.py" "/content/manifest.json" "/content/model_folder/" "Hindi" 1 "/content/output_path.txt"

2023-08-23 13:00:11,110 - torch.distributed.nn.jit.instantiator - INFO - Created a temporary directory at /tmp/tmp5ptsxdl0
2023-08-23 13:00:11,111 - torch.distributed.nn.jit.instantiator - INFO - Writing /tmp/tmp5ptsxdl0/_remote_module_non_scriptable.py
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
[2023-08-23 13:00:14,466] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cpu (auto detect)
2023-08-23 13:00:18,374 - numexpr.utils - INFO - NumExpr defaulting to 2 threads.
2023-08-23 13:00:21.267462: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[2023-08-23 13:00:22,427] [WARNING] [runner.py:201:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-08-23 13:00:22,429] [INFO] [runner.py:567:main] cmd = /usr/bin/python3 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None /content/vistaar/transcribe.py /content/manifest.json /content/drive/MyDrive/Indic_Whisper (Vistar Bench_mark)/whisper-medium-hi_alldata_multigpu/ Hindi 1 /content/output_path.txt
2023-08-23 13:00:27,818 - torch.distributed.nn.jit.instantiator - INFO - Created a temporary directory at /tmp/tmphi1k5nd4
2023-08-23 13:00:27,819 - torch.distributed.nn.jit.instantiator - INFO - Writing /tmp/tmphi1k5nd4/_remote_module_non_scriptable.py
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
[2023-08-23 13:00:30,811] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cpu (auto detect)
2023-08-23 13:00:37,253 - numexpr.utils - INFO - NumExpr defaulting to 2 threads.
2023-08-23 13:00:40.022821: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[2023-08-23 13:00:41,016] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE=libnccl-dev=2.15.5-1+cuda11.8
[2023-08-23 13:00:41,017] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE_VERSION=2.15.5-1
[2023-08-23 13:00:41,017] [INFO] [launch.py:138:main] 0 NCCL_VERSION=2.15.5-1
[2023-08-23 13:00:41,017] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE_NAME=libnccl-dev
[2023-08-23 13:00:41,017] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_PACKAGE=libnccl2=2.15.5-1+cuda11.8
[2023-08-23 13:00:41,017] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_PACKAGE_NAME=libnccl2
[2023-08-23 13:00:41,017] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_PACKAGE_VERSION=2.15.5-1
[2023-08-23 13:00:41,017] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0]}
[2023-08-23 13:00:41,017] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=1, node_rank=0
[2023-08-23 13:00:41,017] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2023-08-23 13:00:41,017] [INFO] [launch.py:163:main] dist_world_size=1
[2023-08-23 13:00:41,017] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0
2023-08-23 13:00:45.781653: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
My guessed rank = 0
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
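
As an aside, the `--world_info` argument visible in the launcher command above is just base64-encoded JSON mapping hostnames to device slot lists, and can be decoded with the standard library:

```python
import base64
import json

# world_info blob copied from the launcher command in the log above
world_info_b64 = "eyJsb2NhbGhvc3QiOiBbMF19"

# Decode base64, then parse the JSON payload
world_info = json.loads(base64.urlsafe_b64decode(world_info_b64))
print(world_info)  # {'localhost': [0]}
```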
mrwyattii commented 1 year ago

@Dayananda-Akaike-Tech what does ds_report output when you run it?

Also, can you share the output of numactl --hardware? Thanks