pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/
Apache License 2.0

torchserve bloom7b1 demo Load model failed #3202

Open zqc2011hy opened 1 week ago

zqc2011hy commented 1 week ago

🐛 Describe the bug

2024-06-22T03:41:52,860 [ERROR] W-9000-bloom7b1_1.0 org.pytorch.serve.wlm.WorkerThread - Number or consecutive unsuccessful inference 2
2024-06-22T03:41:52,861 [ERROR] W-9000-bloom7b1_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker error
org.pytorch.serve.wlm.WorkerInitializationException: Backend worker did not respond in given time
        at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:242) [model-server.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
        at java.lang.Thread.run(Thread.java:833) [?:?]
2024-06-22T03:41:52,863 [WARN ] W-9000-bloom7b1_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: bloom7b1, error: Worker died.
2024-06-22T03:41:52,863 [DEBUG] W-9000-bloom7b1_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-bloom7b1_1.0 State change WORKER_STARTED -> WORKER_STOPPED
2024-06-22T03:41:52,863 [WARN ] W-9000-bloom7b1_1.0 org.pytorch.serve.wlm.WorkerThread - Auto recovery failed again
2024-06-22T03:41:52,864 [INFO ] epollEventLoopGroup-5-2 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_STOPPED

Error logs

(Same logs as in the bug description above.)

Installation instructions

https://kserve.github.io/website/latest/modelserving/v1beta1/llm/torchserve/accelerate/

Model Packaging

gs://kfserving-examples/models/torchserve/llm/Huggingface_accelerate/bloom

config.properties

No response

Versions

torchserve --start --model-store=/mnt/models/model-store --ts-config=/mnt/models/config/config.properties

Repro instructions

gs://kfserving-examples/models/torchserve/llm/Huggingface_accelerate/bloom

Possible Solution

No response

agunapal commented 6 days ago

Hi @zqc2011hy. Looking at this log line: "Backend worker did not respond in given time", it seems you need to increase the default_response_timeout value in config.properties. The appropriate value depends on the hardware you are using.
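For reference, a minimal config.properties sketch showing the timeout override. default_response_timeout is specified in seconds and defaults to 120; the 1200 here is an assumed value to tune for your hardware, and the other keys are illustrative placeholders, not the issue author's actual config:

```properties
# Raise the backend worker response timeout (seconds, default 120).
# 1200 is an assumption; large models such as bloom7b1 can take several
# minutes to load, so tune this for your hardware.
default_response_timeout=1200

# Illustrative placeholders for a typical TorchServe config:
inference_address=http://0.0.0.0:8085
management_address=http://0.0.0.0:8085
model_store=/mnt/models/model-store
```

After changing the file, restart TorchServe so the new timeout takes effect.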

zqc2011hy commented 6 days ago

SERVICE_HOSTNAME=$(kubectl get inferenceservice bloom7b1 -o jsonpath='{.status.url}' | cut -d "/" -f 3)

curl -v \
  -H "Host: ${SERVICE_HOSTNAME}" \
  -H "Content-Type: application/json" \
  -d @./text.json \
  http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/bloom7b1:predict

{"predictions":["My dog is cute.\nNice.\n- Hey, Mom.\n- Yeah?\nWhat color's your dog?\n- It's gray.\n- Gray?\nYeah.\nIt looks gray to me.\n- Where'd you get it?\n- Well, Dad says it's kind of...\n- Gray?\n- Gray.\nYou got a gray dog?\n- It's gray.\n- Gray.\nIs your dog gray?\nAre you sure?\nNo.\nYou sure"]}

Could you please provide the specific parameters expected in text.json?
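For context, the :predict endpoint used above follows the KServe V1 inference protocol, where the request body is an "instances" array. A sketch of what text.json might look like under that protocol; the prompt string and the "data" field shape are assumptions inferred from the response above, not the issue author's actual file:

```json
{
  "instances": [
    {
      "data": "My dog is cute."
    }
  ]
}
```

The model's reply in the response begins with the same text, which is consistent with the prompt being echoed back by the text-generation handler.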