pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/
Apache License 2.0

Worker died and model cannot be loaded #2102

Open 00zahra000 opened 1 year ago

00zahra000 commented 1 year ago

```
2023-02-01T17:45:26,287 [INFO ] W-9001-viton_1.0-stdout MODEL_LOG - File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
2023-02-01T17:45:26,286 [INFO ] epollEventLoopGroup-5-13 org.pytorch.serve.wlm.WorkerThread - 9001 Worker disconnected. WORKER_STARTED
2023-02-01T17:45:26,287 [INFO ] W-9001-viton_1.0-stdout MODEL_LOG - File "/tmp/models/9841fb5af94348d5ad9921db1b53efe4/test_handler.py", line 19, in <module>
2023-02-01T17:45:26,287 [WARN ] W-9001-viton_1.0-stderr MODEL_LOG - warn(f"Failed to load image Python extension: {e}")
2023-02-01T17:45:26,287 [INFO ] W-9001-viton_1.0-stdout MODEL_LOG - from cloths_segmentation.pre_trained_models import create_model
2023-02-01T17:45:26,287 [DEBUG] W-9001-viton_1.0 org.pytorch.serve.wlm.WorkerThread - System state is : WORKER_STARTED
2023-02-01T17:45:26,287 [INFO ] W-9001-viton_1.0-stdout MODEL_LOG - File "/home/tookai/.local/lib/python3.8/site-packages/cloths_segmentation/pre_trained_models.py", line 6, in <module>
2023-02-01T17:45:26,287 [INFO ] W-9001-viton_1.0-stdout MODEL_LOG - from segmentation_models_pytorch import Unet
2023-02-01T17:45:26,287 [INFO ] W-9001-viton_1.0-stdout MODEL_LOG - File "/home/tookai/.local/lib/python3.8/site-packages/segmentation_models_pytorch/__init__.py", line 2, in <module>
2023-02-01T17:45:26,287 [DEBUG] W-9001-viton_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died.
java.lang.InterruptedException: null
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2056) ~[?:?]
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2133) ~[?:?]
    at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:432) ~[?:?]
    at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:189) [model-server.jar:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:829) [?:?]
2023-02-01T17:45:26,287 [INFO ] W-9001-viton_1.0-stdout MODEL_LOG - from . import encoders
2023-02-01T17:45:26,288 [INFO ] W-9001-viton_1.0-stdout MODEL_LOG - File "/home/tookai/.local/lib/python3.8/site-packages/segmentation_models_pytorch/encoders/__init__.py", line 1, in <module>
2023-02-01T17:45:26,288 [WARN ] W-9001-viton_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: viton, error: Worker died.
2023-02-01T17:45:26,288 [INFO ] W-9001-viton_1.0-stdout MODEL_LOG - import timm
2023-02-01T17:45:26,288 [DEBUG] W-9001-viton_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-viton_1.0 State change WORKER_STARTED -> WORKER_STOPPED
2023-02-01T17:45:26,288 [INFO ] W-9001-viton_1.0-stdout MODEL_LOG - File "/home/tookai/.local/lib/python3.8/site-packages/timm/__init__.py", line 2, in <module>
2023-02-01T17:45:26,288 [INFO ] W-9001-viton_1.0-stdout MODEL_LOG - from .models import create_model, list_models, is_model, list_modules, model_entrypoint, \
2023-02-01T17:45:26,288 [WARN ] W-9001-viton_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9001-viton_1.0-stderr
2023-02-01T17:45:26,288 [INFO ] W-9001-viton_1.0-stdout MODEL_LOG - File "/home/tookai/.local/lib/python3.8/site-packages/timm/models/__init__.py", line 1, in <module>
2023-02-01T17:45:26,288 [WARN ] W-9001-viton_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9001-viton_1.0-stdout
2023-02-01T17:45:26,288 [INFO ] W-9001-viton_1.0-stdout MODEL_LOG - from .beit import *
2023-02-01T17:45:26,288 [INFO ] W-9001-viton_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9001 in 89 seconds.
2023-02-01T17:45:26,289 [INFO ] W-9001-viton_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9001-viton_1.0-stdout
2023-02-01T17:45:26,304 [INFO ] W-9001-viton_1.0-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9001-viton_1.0-stderr
```

We get a "Worker died" error and the model cannot be loaded. Which part of our code has the problem?

namannandan commented 1 year ago

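@00zahra000 Looking at the log, it seems there may be some unmet dependencies. The worker dies while `test_handler.py` is importing `cloths_segmentation`, which in turn imports `segmentation_models_pytorch` and `timm`, and the log also contains this warning: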

2023-02-01T17:45:26,287 [WARN ] W-9001-viton_1.0-stderr MODEL_LOG - warn(f"Failed to load image Python extension: {e}")
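The TorchServe log stops partway through the traceback, so the exception that actually kills the worker is never shown. One way to surface it is to run the same import chain directly with the Python interpreter that TorchServe's workers use. The sketch below is not part of TorchServe: `find_failing_imports` is a hypothetical helper, and the module list is assumed from the packages named in the traceback.

```python
import importlib
import traceback


def find_failing_imports(module_names):
    """Import each module in turn and collect the full traceback of any failure.

    Run this with the same Python interpreter TorchServe uses for its backend
    workers, so the result matches what the worker process sees.
    """
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception:
            # Keep the complete traceback, not just the last line, because the
            # TorchServe log is cut off before the final error is printed.
            failures[name] = traceback.format_exc()
    return failures


if __name__ == "__main__":
    # Assumed list: the packages test_handler.py pulls in, per the traceback.
    modules = [
        "torch",
        "torchvision",
        "timm",
        "segmentation_models_pytorch",
        "cloths_segmentation.pre_trained_models",
    ]
    for name, tb in find_failing_imports(modules).items():
        print(f"--- {name} failed to import ---")
        print(tb)
```

An empty result means every module imported cleanly in that environment; any entry shows the full error that the worker would hit when loading the handler.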

All dependencies required by the custom handler need to be either included in the model archive or listed in the model archive's custom Python requirements file.
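As a sketch of the second option, the helper below writes a `requirements.txt` and builds a `torch-model-archiver` command that bundles it through the archiver's `--requirements-file` option. The helper names, the PyPI package names, and the serialized file `model.pth` are assumptions for illustration; the model name `viton`, version `1.0`, and handler `test_handler.py` are taken from the log.

```python
import shlex
from pathlib import Path


def write_requirements(path, packages):
    """Write one pip requirement per line, the format `pip install -r` expects."""
    path = Path(path)
    path.write_text("\n".join(packages) + "\n")
    return path


def archiver_command(model_name, version, serialized_file, handler, requirements_file):
    """Build a torch-model-archiver invocation that bundles a requirements file."""
    return [
        "torch-model-archiver",
        "--model-name", model_name,
        "--version", version,
        "--serialized-file", serialized_file,
        "--handler", handler,
        "--requirements-file", requirements_file,
    ]


if __name__ == "__main__":
    # Assumed PyPI names for the packages seen in the traceback; pin them to
    # the versions that work in your development environment.
    req = write_requirements("requirements.txt", [
        "timm",
        "segmentation-models-pytorch",
        "cloths-segmentation",
    ])
    # "model.pth" is a hypothetical serialized-weights file name.
    cmd = archiver_command("viton", "1.0", "model.pth", "test_handler.py", str(req))
    print(shlex.join(cmd))
```

TorchServe only installs a bundled requirements file when `install_py_dep_per_model=true` is set in its `config.properties`; with the default setting the packages must already be present in the worker's Python environment.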