huggingface / transformers-bloom-inference

Fast Inference Solutions for BLOOM

stuck when inferring #63

Open raihan0824 opened 1 year ago

raihan0824 commented 1 year ago
[Screenshot: Screen Shot 2023-03-14 at 12:31:46, showing the stuck process]

I ran the script with `deepspeed --num_gpus 1 bloom-inference-scripts/bloom-ds-inference.py --name bigscience/bloomz-7b1 --batch_size 8` and it gets stuck, as shown in the screenshot above.

Log:

(base) raihanafiandi@instance-1:~/playground/transformers-bloom-inference$ deepspeed --num_gpus 1 bloom-inference-scripts/bloom-ds-inference.py --name bigscience/bloomz-7b1 --batch_size 8
[2023-03-14 05:30:02,152] [WARNING] [runner.py:186:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-03-14 05:30:02,965] [INFO] [runner.py:550:main] cmd = /opt/conda/bin/python3.7 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None bloom-inference-scripts/bloom-ds-inference.py --name bigscience/bloomz-7b1 --batch_size 8
[2023-03-14 05:30:04,255] [INFO] [launch.py:142:main] WORLD INFO DICT: {'localhost': [0]}
[2023-03-14 05:30:04,255] [INFO] [launch.py:149:main] nnodes=1, num_local_procs=1, node_rank=0
[2023-03-14 05:30:04,255] [INFO] [launch.py:161:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2023-03-14 05:30:04,255] [INFO] [launch.py:162:main] dist_world_size=1
[2023-03-14 05:30:04,255] [INFO] [launch.py:164:main] Setting CUDA_VISIBLE_DEVICES=0
[2023-03-14 05:30:05,816] [INFO] [comm.py:663:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
*** Loading the model bigscience/bloomz-7b1
Fetching 8 files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 104857.60it/s]
Fetching 8 files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 113359.57it/s]
Fetching 8 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 99864.38it/s]
Fetching 8 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 32832.13it/s]
[2023-03-14 05:30:13,610] [INFO] [logging.py:77:log_dist] [Rank 0] DeepSpeed info: version=0.8.2, git-hash=unknown, git-branch=unknown
[2023-03-14 05:30:13,611] [WARNING] [config_utils.py:77:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2023-03-14 05:30:13,611] [INFO] [logging.py:77:log_dist] [Rank 0] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
Installed CUDA version 11.0 does not match the version torch was compiled with 11.1 but since the APIs are compatible, accepting this combination
Using /home/raihanafiandi/.cache/torch_extensions/py37_cu111 as PyTorch extensions root...

Is it because I'm running a different BLOOM model (bloomz-7b1)? Please help me with this. Thank you.
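
[Editor's note: the last log line ("Using ... as PyTorch extensions root...") is the point where DeepSpeed JIT-compiles its CUDA inference kernels with ninja, which can take several minutes on first run while printing nothing. If it never finishes, one common cause is a stale lock left by an interrupted earlier build in that cache directory. A minimal recovery sketch, with the cache path taken from the log above (verify it on your own machine):]

```
# Remove any half-built or locked JIT extensions so DeepSpeed rebuilds them
# cleanly, then relaunch the same command.
rm -rf /home/raihanafiandi/.cache/torch_extensions/py37_cu111
deepspeed --num_gpus 1 bloom-inference-scripts/bloom-ds-inference.py \
    --name bigscience/bloomz-7b1 --batch_size 8
```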

mayank31398 commented 1 year ago

I don't think that's the case. I will try to run this on my end.
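
[Editor's note: a way to sidestep the runtime JIT build entirely (not suggested in this thread, but DeepSpeed documents `DS_BUILD_*` flags for pre-compiling ops at install time) is to reinstall DeepSpeed with the inference kernel built ahead of time, so the launch never pauses at the extensions step. This requires a local CUDA toolkit compatible with the installed torch build:]

```
# Reinstall DeepSpeed with the transformer-inference op compiled at install
# time instead of JIT-compiled on first launch.
DS_BUILD_TRANSFORMER_INFERENCE=1 pip install --no-cache-dir --force-reinstall deepspeed
```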