pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/
Apache License 2.0

java.io.IOException: Cannot run program "/home/venv/bin/python" when running pytorch/torchserve image #2504

Open lz-chen opened 1 year ago

lz-chen commented 1 year ago

🐛 Describe the bug

I'm following this tutorial, but the container did not start properly with docker run.

Error logs

WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
This command is not for general use and should only be run as the result of a call to ProcessBuilder.start() or Runtime.exec() in a java application
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2023-07-28T12:40:14,354 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2023-07-28T12:40:14,658 [INFO ] main org.pytorch.serve.metrics.configuration.MetricConfiguration - Successfully loaded metrics configuration from /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml
2023-07-28T12:40:15,063 [INFO ] main org.pytorch.serve.ModelServer -
Torchserve version: 0.8.1
TS Home: /home/venv/lib/python3.9/site-packages
Current directory: /home/model-server
Temp directory: /home/model-server/tmp
Metrics config path: /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml
Number of GPUs: 0
Number of CPUs: 4
Max heap size: 982 M
Python executable: /home/venv/bin/python
Config file: /home/model-server/config.properties
Inference address: http://0.0.0.0:8080
Management address: http://0.0.0.0:8081
Metrics address: http://0.0.0.0:8082
Model Store: /home/model-server/model-store
Initial Models: N/A
Log dir: /home/model-server/logs
Metrics dir: /home/model-server/logs
Netty threads: 32
Netty client threads: 0
Default workers per model: 4
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.|http(s)?://.]
Custom python dependency for model allowed: false
Enable metrics API: true
Metrics mode: log
Disable system metrics: false
Workflow Store: /home/model-server/model-store
Model config: N/A
2023-07-28T12:40:15,097 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Loading snapshot serializer plugin...
2023-07-28T12:40:15,181 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2023-07-28T12:40:15,471 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://0.0.0.0:8080
2023-07-28T12:40:15,472 [INFO ] main org.pytorch.serve.ModelServer - Initialize Management server with: EpollServerSocketChannel.
2023-07-28T12:40:15,477 [INFO ] main org.pytorch.serve.ModelServer - Management API bind to: http://0.0.0.0:8081
2023-07-28T12:40:15,480 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2023-07-28T12:40:15,482 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://0.0.0.0:8082
Model server started.
This command is not for general use and should only be run as the result of a call to ProcessBuilder.start() or Runtime.exec() in a java application
2023-07-28T12:40:16,282 [ERROR] pool-3-thread-1 org.pytorch.serve.metrics.MetricCollector -
java.io.IOException: Cannot run program "/home/venv/bin/python" (in directory "/home/venv/lib/python3.9/site-packages"): error=0, Failed to exec spawn helper: pid: 47, exit value: 1
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1143) ~[?:?]
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1073) ~[?:?]
	at java.lang.Runtime.exec(Runtime.java:594) ~[?:?]
	at org.pytorch.serve.metrics.MetricCollector.run(MetricCollector.java:44) [model-server.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) [?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.io.IOException: error=0, Failed to exec spawn helper: pid: 47, exit value: 1
	at java.lang.ProcessImpl.forkAndExec(Native Method) ~[?:?]
	at java.lang.ProcessImpl.<init>(ProcessImpl.java:314) ~[?:?]
	at java.lang.ProcessImpl.start(ProcessImpl.java:244) ~[?:?]
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1110) ~[?:?]
	... 9 more
2023-07-28T12:40:33,226 [DEBUG] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model mnist
2023-07-28T12:40:33,229 [DEBUG] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model mnist
2023-07-28T12:40:33,230 [INFO ] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.ModelManager - Model mnist loaded.
2023-07-28T12:40:33,239 [DEBUG] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.ModelManager - updateModel: mnist, count: 4
2023-07-28T12:40:33,262 [DEBUG] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9000, --metrics-config, /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2023-07-28T12:40:33,272 [DEBUG] W-9001-mnist_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9001, --metrics-config, /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml]
This command is not for general use and should only be run as the result of a call to ProcessBuilder.start() or Runtime.exec() in a java application
2023-07-28T12:40:33,285 [DEBUG] W-9003-mnist_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9003, --metrics-config, /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml]
This command is not for general use and should only be run as the result of a call to ProcessBuilder.start() or Runtime.exec() in a java application
2023-07-28T12:40:33,300 [DEBUG] W-9002-mnist_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9002, --metrics-config, /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2023-07-28T12:40:33,296 [ERROR] W-9001-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker error
org.pytorch.serve.wlm.WorkerInitializationException: Failed start worker process
	at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:179) ~[model-server.jar:?]
	at org.pytorch.serve.wlm.WorkerThread.connect(WorkerThread.java:339) ~[model-server.jar:?]
	at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:183) [model-server.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.io.IOException: Cannot run program "/home/venv/bin/python" (in directory "/home/model-server/tmp/models/df29a96231534b05945fa892495ecf6c"): error=0, Failed to exec spawn helper: pid: 59, exit value: 1
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1143) ~[?:?]
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1073) ~[?:?]
	at java.lang.Runtime.exec(Runtime.java:594) ~[?:?]
	at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:161) ~[model-server.jar:?]
	... 7 more
Caused by: java.io.IOException: error=0, Failed to exec spawn helper: pid: 59, exit value: 1
	at java.lang.ProcessImpl.forkAndExec(Native Method) ~[?:?]
	at java.lang.ProcessImpl.<init>(ProcessImpl.java:314) ~[?:?]
	at java.lang.ProcessImpl.start(ProcessImpl.java:244) ~[?:?]
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1110) ~[?:?]
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1073) ~[?:?]
	at java.lang.Runtime.exec(Runtime.java:594) ~[?:?]
	at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:161) ~[model-server.jar:?]
	... 7 more
This command is not for general use and should only be run as the result of a call to ProcessBuilder.start() or Runtime.exec() in a java application
2023-07-28T12:40:33,314 [ERROR] W-9003-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker error
org.pytorch.serve.wlm.WorkerInitializationException: Failed start worker process
	at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:179) ~[model-server.jar:?]
	at org.pytorch.serve.wlm.WorkerThread.connect(WorkerThread.java:339) ~[model-server.jar:?]
	at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:183) [model-server.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.io.IOException: Cannot run program "/home/venv/bin/python" (in directory "/home/model-server/tmp/models/df29a96231534b05945fa892495ecf6c"): error=0, Failed to exec spawn helper: pid: 62, exit value: 1
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1143) ~[?:?]
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1073) ~[?:?]
	at java.lang.Runtime.exec(Runtime.java:594) ~[?:?]
	at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:161) ~[model-server.jar:?]
	... 7 more
Caused by: java.io.IOException: error=0, Failed to exec spawn helper: pid: 62, exit value: 1
	at java.lang.ProcessImpl.forkAndExec(Native Method) ~[?:?]
	at java.lang.ProcessImpl.<init>(ProcessImpl.java:314) ~[?:?]
	at java.lang.ProcessImpl.start(ProcessImpl.java:244) ~[?:?]
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1110) ~[?:?]
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1073) ~[?:?]
	at java.lang.Runtime.exec(Runtime.java:594) ~[?:?]
	at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:161) ~[model-server.jar:?]
	... 7 more
2023-07-28T12:40:33,318 [DEBUG] W-9003-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - W-9003-mnist_1.0 State change null -> WORKER_STOPPED
2023-07-28T12:40:33,319 [INFO ] W-9003-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Auto recovery start timestamp: 1690548033319
2023-07-28T12:40:33,322 [DEBUG] W-9001-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-mnist_1.0 State change null -> WORKER_STOPPED
2023-07-28T12:40:33,286 [ERROR] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker error
org.pytorch.serve.wlm.WorkerInitializationException: Failed start worker process
	at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:179) ~[model-server.jar:?]
	at org.pytorch.serve.wlm.WorkerThread.connect(WorkerThread.java:339) ~[model-server.jar:?]
	at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:183) [model-server.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.io.IOException: Cannot run program "/home/venv/bin/python" (in directory "/home/model-server/tmp/models/df29a96231534b05945fa892495ecf6c"): error=0, Failed to exec spawn helper: pid: 54, exit value: 1
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1143) ~[?:?]
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1073) ~[?:?]
	at java.lang.Runtime.exec(Runtime.java:594) ~[?:?]
	at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:161) ~[model-server.jar:?]
	... 7 more
Caused by: java.io.IOException: error=0, Failed to exec spawn helper: pid: 54, exit value: 1
	at java.lang.ProcessImpl.forkAndExec(Native Method) ~[?:?]
	at java.lang.ProcessImpl.<init>(ProcessImpl.java:314) ~[?:?]
	at java.lang.ProcessImpl.start(ProcessImpl.java:244) ~[?:?]
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1110) ~[?:?]
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1073) ~[?:?]
	at java.lang.Runtime.exec(Runtime.java:594) ~[?:?]
	at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:161) ~[model-server.jar:?]
	... 7 more
2023-07-28T12:40:33,324 [INFO ] W-9003-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9003 in 1 seconds.
2023-07-28T12:40:33,325 [INFO ] W-9001-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Auto recovery start timestamp: 1690548033325
2023-07-28T12:40:33,326 [INFO ] W-9001-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9001 in 1 seconds.
2023-07-28T12:40:33,328 [DEBUG] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-mnist_1.0 State change null -> WORKER_STOPPED
2023-07-28T12:40:33,334 [INFO ] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Auto recovery start timestamp: 1690548033334
2023-07-28T12:40:33,335 [INFO ] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9000 in 1 seconds.
This command is not for general use and should only be run as the result of a call to ProcessBuilder.start() or Runtime.exec() in a java application
2023-07-28T12:40:33,338 [ERROR] W-9002-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker error
org.pytorch.serve.wlm.WorkerInitializationException: Failed start worker process
	at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:179) ~[model-server.jar:?]
	at org.pytorch.serve.wlm.WorkerThread.connect(WorkerThread.java:339) ~[model-server.jar:?]
	at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:183) [model-server.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.io.IOException: Cannot run program "/home/venv/bin/python" (in directory "/home/model-server/tmp/models/df29a96231534b05945fa892495ecf6c"): error=0, Failed to exec spawn helper: pid: 65, exit value: 1
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1143) ~[?:?]
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1073) ~[?:?]
	at java.lang.Runtime.exec(Runtime.java:594) ~[?:?]
	at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:161) ~[model-server.jar:?]
	... 7 more
Caused by: java.io.IOException: error=0, Failed to exec spawn helper: pid: 65, exit value: 1
	at java.lang.ProcessImpl.forkAndExec(Native Method) ~[?:?]
	at java.lang.ProcessImpl.<init>(ProcessImpl.java:314) ~[?:?]
	at java.lang.ProcessImpl.start(ProcessImpl.java:244) ~[?:?]
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1110) ~[?:?]
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1073) ~[?:?]
	at java.lang.Runtime.exec(Runtime.java:594) ~[?:?]
	at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:161) ~[model-server.jar:?]
	... 7 more
2023-07-28T12:40:33,344 [DEBUG] W-9002-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - W-9002-mnist_1.0 State change null -> WORKER_STOPPED
2023-07-28T12:40:33,345 [INFO ] W-9002-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Auto recovery start timestamp: 1690548033344
2023-07-28T12:40:33,345 [INFO ] W-9002-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9002 in 1 seconds.
2023-07-28T12:40:33,345 [DEBUG] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.ModelVersionedRefs - Removed model: mnist version: 1.0
2023-07-28T12:40:33,347 [DEBUG] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.WorkerThread - W-9003-mnist_1.0 State change WORKER_STOPPED -> WORKER_SCALED_DOWN
2023-07-28T12:40:33,348 [DEBUG] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.WorkerThread - W-9002-mnist_1.0 State change WORKER_STOPPED -> WORKER_SCALED_DOWN
2023-07-28T12:40:33,349 [DEBUG] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.WorkerThread - W-9001-mnist_1.0 State change WORKER_STOPPED -> WORKER_SCALED_DOWN
2023-07-28T12:40:33,349 [DEBUG] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.WorkerThread - W-9000-mnist_1.0 State change WORKER_STOPPED -> WORKER_SCALED_DOWN
2023-07-28T12:40:33,356 [INFO ] epollEventLoopGroup-3-1 org.pytorch.serve.wlm.ModelManager - Model mnist unregistered.
2023-07-28T12:40:33,371 [INFO ] epollEventLoopGroup-3-1 ACCESS_LOG - /172.17.0.1:38924 "POST /models?model_name=mnist&url=mnist.mar&initial_workers=4 HTTP/1.1" 500 763
2023-07-28T12:40:33,374 [INFO ] epollEventLoopGroup-3-1 TS_METRICS - Requests5XX.Count:1.0|#Level:Host|#hostname:b68f79c2508d,timestamp:1690548033
2023-07-28T12:40:34,367 [DEBUG] W-9001-mnist_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9001, --metrics-config, /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2023-07-28T12:40:34,366 [DEBUG] W-9002-mnist_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9002, --metrics-config, /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2023-07-28T12:40:34,366 [DEBUG] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9000, --metrics-config, /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2023-07-28T12:40:34,366 [DEBUG] W-9003-mnist_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9003, --metrics-config, /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml]
This command is not for general use and should only be run as the result of a call to ProcessBuilder.start() or Runtime.exec() in a java application
This command is not for general use and should only be run as the result of a call to ProcessBuilder.start() or Runtime.exec() in a java application
This command is not for general use and should only be run as the result of a call to ProcessBuilder.start() or Runtime.exec() in a java application
This command is not for general use and should only be run as the result of a call to ProcessBuilder.start() or Runtime.exec() in a java application
2023-07-28T12:40:34,422 [ERROR] W-9001-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker error
org.pytorch.serve.wlm.WorkerInitializationException: Failed start worker process
	at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:179) ~[model-server.jar:?]
	at org.pytorch.serve.wlm.WorkerThread.connect(WorkerThread.java:339) ~[model-server.jar:?]
	at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:183) [model-server.jar:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.io.IOException: Cannot run program "/home/venv/bin/python" (in directory "/home/model-server/tmp/models/df29a96231534b05945fa892495ecf6c"): error=0, Failed to exec spawn helper: pid: 75, exit value: 1
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1143) ~[?:?]
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1073) ~[?:?]
	at java.lang.Runtime.exec(Runtime.java:594) ~[?:?]
	at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:161) ~[model-server.jar:?]
	... 5 more
Caused by: java.io.IOException: error=0, Failed to exec spawn helper: pid: 75, exit value: 1
	at java.lang.ProcessImpl.forkAndExec(Native Method) ~[?:?]
	at java.lang.ProcessImpl.<init>(ProcessImpl.java:314) ~[?:?]
	at java.lang.ProcessImpl.start(ProcessImpl.java:244) ~[?:?]
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1110) ~[?:?]
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1073) ~[?:?]
	at java.lang.Runtime.exec(Runtime.java:594) ~[?:?]
	at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:161) ~[model-server.jar:?]
	... 5 more
2023-07-28T12:40:34,422 [ERROR] W-9002-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker error
org.pytorch.serve.wlm.WorkerInitializationException: Failed start worker process
	at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:179) ~[model-server.jar:?]
	at org.pytorch.serve.wlm.WorkerThread.connect(WorkerThread.java:339) ~[model-server.jar:?]
	at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:183) [model-server.jar:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.io.IOException: Cannot run program "/home/venv/bin/python" (in directory "/home/model-server/tmp/models/df29a96231534b05945fa892495ecf6c"): error=0, Failed to exec spawn helper: pid: 72, exit value: 1
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1143) ~[?:?]
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1073) ~[?:?]
	at java.lang.Runtime.exec(Runtime.java:594) ~[?:?]
	at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:161) ~[model-server.jar:?]
	... 5 more
Caused by: java.io.IOException: error=0, Failed to exec spawn helper: pid: 72, exit value: 1
	at java.lang.ProcessImpl.forkAndExec(Native Method) ~[?:?]
	at java.lang.ProcessImpl.<init>(ProcessImpl.java:314) ~[?:?]
	at java.lang.ProcessImpl.start(ProcessImpl.java:244) ~[?:?]
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1110) ~[?:?]
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1073) ~[?:?]
	at java.lang.Runtime.exec(Runtime.java:594) ~[?:?]
	at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:161) ~[model-server.jar:?]
	... 5 more
2023-07-28T12:40:34,422 [ERROR] W-9003-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker error
org.pytorch.serve.wlm.WorkerInitializationException: Failed start worker process
	at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:179) ~[model-server.jar:?]
	at org.pytorch.serve.wlm.WorkerThread.connect(WorkerThread.java:339) ~[model-server.jar:?]
	at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:183) [model-server.jar:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.io.IOException: Cannot run program "/home/venv/bin/python" (in directory "/home/model-server/tmp/models/df29a96231534b05945fa892495ecf6c"): error=0, Failed to exec spawn helper: pid: 70, exit value: 1
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1143) ~[?:?]
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1073) ~[?:?]
	at java.lang.Runtime.exec(Runtime.java:594) ~[?:?]
	at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:161) ~[model-server.jar:?]
	... 5 more
Caused by: java.io.IOException: error=0, Failed to exec spawn helper: pid: 70, exit value: 1
	at java.lang.ProcessImpl.forkAndExec(Native Method) ~[?:?]
	at java.lang.ProcessImpl.<init>(ProcessImpl.java:314) ~[?:?]
	at java.lang.ProcessImpl.start(ProcessImpl.java:244) ~[?:?]
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1110) ~[?:?]
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1073) ~[?:?]
	at java.lang.Runtime.exec(Runtime.java:594) ~[?:?]
	at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:161) ~[model-server.jar:?]
	... 5 more

Installation instructions

I have torchserve 0.8.1 installed but this error occurs when I run with the latest docker image pytorch/torchserve:0.8.1-cpu

Model Packaging

torch-model-archiver --model-name mnist --version 1.0 --model-file examples/image_classifier/mnist/mnist.py --serialized-file examples/image_classifier/mnist/mnist_cnn.pt --handler examples/image_classifier/mnist/mnist_handler.py

config.properties

I didn't pass a custom config.properties.

Versions


Environment headers

Torchserve branch:

torchserve==0.8.1
torch-model-archiver==0.6.0

Python version: 3.8 (64-bit runtime)
Python executable: /Users/lzchen/miniforge3/envs/tjc-main/bin/python

Versions of relevant python libraries:
captum==0.6.0
numpy==1.20.3
psutil==5.9.4
requests==2.28.1
sentence-transformers==2.2.2
sentencepiece==0.1.95
torch==1.10.2
torch-model-archiver==0.6.0
torchserve==0.8.1
torchvision==0.9.0a0
transformers==4.24.0
wheel==0.37.1
torch==1.10.2
Warning: torchtext not present ..
torchvision==0.9.0a0
Warning: torchaudio not present ..

Java Version:

OS: Mac OSX 13.4.1 (arm64)
GCC version: N/A
Clang version: 14.0.3 (clang-1403.0.22.14.1)
CMake version: N/A

Versions of npm installed packages:
Warning: newman, newman-reporter-html markdown-link-check not installed...

Repro instructions

torch-model-archiver --model-name mnist --version 1.0 --model-file examples/image_classifier/mnist/mnist.py --serialized-file examples/image_classifier/mnist/mnist_cnn.pt --handler examples/image_classifier/mnist/mnist_handler.py
mkdir model_store
mv mnist.mar model_store/
docker run --rm -it -p 8080:8080 -p 8081:8081 -p 8082:8082 -v $(pwd)/model_store:/home/model-server/model-store pytorch/torchserve:latest-cpu
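
The access log in the error output shows the model being registered through the management API (`POST /models?model_name=mnist&url=mnist.mar&initial_workers=4`), a step the commands above do not list. A minimal sketch of that step, assuming the container from the docker run command is up and its ports are published on localhost as shown. The `/ping` call is TorchServe's inference API health check, and the variable names are illustrative:

```shell
#!/bin/sh
# Addresses assume the -p 8080:8080 -p 8081:8081 mappings from the docker run command.
INFERENCE_API="http://localhost:8080"
MANAGEMENT_API="http://localhost:8081"

# Query string copied from the ACCESS_LOG line in the error logs.
register_url="${MANAGEMENT_API}/models?model_name=mnist&url=mnist.mar&initial_workers=4"

# Only attempt registration if the server answers its health check.
if curl -s -f --max-time 2 "${INFERENCE_API}/ping" >/dev/null 2>&1; then
  # Register mnist.mar from the mounted model store with 4 initial workers.
  curl -s -X POST "${register_url}"
  # List registered models to confirm the result.
  curl -s "${MANAGEMENT_API}/models"
else
  echo "TorchServe is not reachable at ${INFERENCE_API}"
fi
```

In the logs above this request returned HTTP 500, and the model was unregistered again after every worker failed to start.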

Possible Solution

No response

agunapal commented 1 year ago

Hi @lz-chen I tried the example again. It works fine for me.

I noticed this message in your logs. Can you please confirm your machine and OS? Is it an M1 Mac?

WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
This command is not for general use and should only be run as the result of a call to ProcessBuilder.start() or Runtime.exec() in a java application

lz-chen commented 1 year ago

Hi @agunapal, yes, it's an M1 Mac. I just tried on an Ubuntu VM, and it works fine there. 👐
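
The platform mismatch from the warning can be checked directly by comparing the host architecture with the architecture recorded in the image. A minimal sketch, assuming Docker is installed and the image has already been pulled; `normalize_arch` is a hypothetical helper that maps `uname -m` output to Docker's architecture names:

```shell
#!/bin/sh
IMAGE="pytorch/torchserve:latest-cpu"

# Hypothetical helper: map machine names from uname to Docker's naming.
normalize_arch() {
  case "$1" in
    x86_64|amd64) echo "amd64" ;;
    aarch64|arm64) echo "arm64" ;;
    *) echo "$1" ;;
  esac
}

host_arch="$(normalize_arch "$(uname -m)")"
echo "host architecture:  ${host_arch}"

# Query the image metadata only when the docker CLI is available.
if command -v docker >/dev/null 2>&1; then
  image_arch="$(docker image inspect --format '{{.Architecture}}' "${IMAGE}" 2>/dev/null || true)"
  if [ -z "${image_arch}" ]; then
    echo "image ${IMAGE} not found locally (pull it first)"
  elif [ "${image_arch}" = "${host_arch}" ]; then
    echo "image architecture: ${image_arch} (matches host)"
  else
    echo "image architecture: ${image_arch} (differs from host)"
  fi
else
  echo "docker not found on PATH"
fi
```

On an M1 Mac the host reports arm64, while the warning in the logs indicates that the image used here is linux/amd64.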

msaroufim commented 1 year ago

You can get M1 support by following the instructions here: https://github.com/pytorch/serve/issues/1363#issuecomment-1112769676