hpcaitech / EnergonAI

Large-scale model inference.
Apache License 2.0
630 stars 90 forks

Can not start the Bloom server #191

Open SAI990323 opened 1 year ago

SAI990323 commented 1 year ago

Information
V100, CUDA 11.3
transformers==4.23.1
torch==1.12.0
colossalai==0.2.5
energonai==0.0.1+torch1.12cu11.3
Running bloom-560m and bloom-7b1

Question
When I try to start the Bloom server using the examples in this link, it stops at the point shown here. [image] I do not see any errors, and I cannot send requests to http://[ip]:[host]//generation.
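When a server prints no errors but requests fail, one quick check is whether anything is listening on the port at all. The snippet below is only a minimal sketch of that check using the Python standard library. The host and port values are placeholders, not values from this issue, and `port_is_open` is a hypothetical helper, not part of EnergonAI.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Placeholder address: replace with the host and port passed to the server.
    host, port = "127.0.0.1", 8020
    if port_is_open(host, port):
        print("Something is listening; check the request path and payload.")
    else:
        print("Nothing is listening; the server has not finished starting.")
```

If the port is closed, the process is most likely still stuck in model setup rather than rejecting the request.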

cauyxy commented 1 year ago

Is there any other information? A normal startup should look like the figure below. [image]

baibaiw5 commented 1 year ago

I have hit the same problem. I started bloom540 with docker: hpcaitech/energon-ai:latest

Information
4090, CUDA 11.3
transformers 4.24.0
colossalai 0.2.0+torch1.12cu11.3
energonai 0.0.1+torch1.12cu11.3
torch 1.12.1
Running bloom-560m and bloom-7b1

The application hangs. No other logs are printed. [image]

[image]

baibaiw5 commented 1 year ago

After commenting out random_init in run.sh, the server can be started:
python server.py --tp ${GPU_NUM} --name ${DATASET} --dtype "int8" --max_batch_size 4 --random_model_size "560m" #--random_init False
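One possible reason why dropping the flag helps is sketched below. This is an assumption: the thread does not show how server.py parses its arguments. If `--random_init` were declared with argparse's `type=bool`, the string "False" would be converted with `bool("False")`, which is `True` because any non-empty string is truthy. Passing `--random_init False` would then enable random initialization instead of disabling it. The parser here is hypothetical and only illustrates the Python behaviour.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; the real server.py may define this flag differently.
    parser = argparse.ArgumentParser()
    parser.add_argument("--random_init", type=bool, default=False)
    return parser


parser = build_parser()

# bool("False") is True, so the flag ends up enabled.
with_flag = parser.parse_args(["--random_init", "False"])
# Omitting the flag falls back to the default, which is False.
without_flag = parser.parse_args([])

print(with_flag.random_init)     # True
print(without_flag.random_init)  # False
```

If that is what happens, omitting the flag (as in the command above) is the reliable way to get the default. A parser that uses `action="store_true"` would avoid the ambiguity.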