huggingface pippy example fails to run. #2588

Open consciousgaze opened 1 year ago

consciousgaze commented 1 year ago

πŸ› Describe the bug

I used the Docker image of the latest TorchServe (https://hub.docker.com/layers/pytorch/torchserve/0.8.2-gpu/images/sha256-563e3d46b33091cdf1751e56387dfcc07fe8a8360343235d13489eb60c41f1f5?context=explore) to run the Hugging Face PiPPy OPT large-model example. I followed exactly the process described in the example, but I got a 'LOCAL_RANK' not found error.

Error logs

model-server@9b26f632c3a1:~$ torchserve --ncs --start --model-store model_store --models opt.tar.gz --foreground
Removing orphan pid file.
--model-store directory not found: model_store
model-server@9b26f632c3a1:~$ torchserve --ncs --start --model-store model-store --models opt.tar.gz --foreground
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2023-09-12T09:55:11,260 [WARN ] main org.pytorch.serve.util.ConfigManager - Your torchserve instance can access any URL to load models. When deploying to production, make sure to limit the set of allowed_urls in config.properties
2023-09-12T09:55:11,263 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2023-09-12T09:55:11,317 [INFO ] main org.pytorch.serve.metrics.configuration.MetricConfiguration - Successfully loaded metrics configuration from /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml
2023-09-12T09:55:11,415 [INFO ] main org.pytorch.serve.ModelServer -
Torchserve version: 0.8.2
TS Home: /home/venv/lib/python3.9/site-packages
Current directory: /home/model-server
Temp directory: /home/model-server/tmp
Metrics config path: /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml
Number of GPUs: 1
Number of CPUs: 12
Max heap size: 16048 M
Python executable: /home/venv/bin/python
Config file: config.properties
Inference address: http://0.0.0.0:8080
Management address: http://0.0.0.0:8081
Metrics address: http://0.0.0.0:8082
Model Store: /home/model-server/model-store
Initial Models: opt.tar.gz
Log dir: /home/model-server/logs
Metrics dir: /home/model-server/logs
Netty threads: 32
Netty client threads: 0
Default workers per model: 1
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: false
Enable metrics API: true
Metrics mode: log
Disable system metrics: false
Workflow Store: /home/model-server/model-store
Model config: N/A
2023-09-12T09:55:11,421 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager -  Loading snapshot serializer plugin...
2023-09-12T09:55:11,439 [INFO ] main org.pytorch.serve.ModelServer - Loading initial models: opt.tar.gz
2023-09-12T09:55:11,466 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model opt
2023-09-12T09:55:11,466 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model opt
2023-09-12T09:55:11,466 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model opt loaded.
2023-09-12T09:55:11,466 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: opt, count: 1
2023-09-12T09:55:11,474 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2023-09-12T09:55:11,473 [DEBUG] W-9000-opt_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9000, --metrics-config, /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2023-09-12T09:55:11,524 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://0.0.0.0:8080
2023-09-12T09:55:11,524 [INFO ] main org.pytorch.serve.ModelServer - Initialize Management server with: EpollServerSocketChannel.
2023-09-12T09:55:11,525 [INFO ] main org.pytorch.serve.ModelServer - Management API bind to: http://0.0.0.0:8081
2023-09-12T09:55:11,526 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2023-09-12T09:55:11,526 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://0.0.0.0:8082
Model server started.
2023-09-12T09:55:11,699 [WARN ] pool-3-thread-1 org.pytorch.serve.metrics.MetricCollector - worker pid is not available yet.
2023-09-12T09:55:12,176 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:9b26f632c3a1,timestamp:1694512512
2023-09-12T09:55:12,177 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:818.1998100280762|#Level:Host|#hostname:9b26f632c3a1,timestamp:1694512512
2023-09-12T09:55:12,178 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:50.556339263916016|#Level:Host|#hostname:9b26f632c3a1,timestamp:1694512512
2023-09-12T09:55:12,178 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:5.8|#Level:Host|#hostname:9b26f632c3a1,timestamp:1694512512
2023-09-12T09:55:12,179 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.05292297671389025|#Level:Host,DeviceId:0|#hostname:9b26f632c3a1,timestamp:1694512512
2023-09-12T09:55:12,179 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:13.0|#Level:Host,DeviceId:0|#hostname:9b26f632c3a1,timestamp:1694512512
2023-09-12T09:55:12,180 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:0|#hostname:9b26f632c3a1,timestamp:1694512512
2023-09-12T09:55:12,180 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:61622.94921875|#Level:Host|#hostname:9b26f632c3a1,timestamp:1694512512
2023-09-12T09:55:12,180 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:1848.62890625|#Level:Host|#hostname:9b26f632c3a1,timestamp:1694512512
2023-09-12T09:55:12,181 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:4.0|#Level:Host|#hostname:9b26f632c3a1,timestamp:1694512512
2023-09-12T09:55:12,982 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG - s_name_part0=/home/model-server/tmp/.ts.sock, s_name_part1=9000, pid=1550
2023-09-12T09:55:12,983 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9000
2023-09-12T09:55:12,989 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG - Successfully loaded /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml.
2023-09-12T09:55:12,990 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG - [PID]1550
2023-09-12T09:55:12,990 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG - Torch worker started.
2023-09-12T09:55:12,991 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG - Python runtime: 3.9.18
2023-09-12T09:55:12,991 [DEBUG] W-9000-opt_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-opt_1.0 State change null -> WORKER_STARTED
2023-09-12T09:55:12,997 [INFO ] W-9000-opt_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9000
2023-09-12T09:55:13,007 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9000.
2023-09-12T09:55:13,010 [INFO ] W-9000-opt_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD to backend at: 1694512513010
2023-09-12T09:55:13,052 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG - model_name: opt, batchSize: 1
2023-09-12T09:55:13,491 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG - Enabled tensor cores
2023-09-12T09:55:13,492 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG - proceeding without onnxruntime
2023-09-12T09:55:13,492 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG - Torch TensorRT not enabled
2023-09-12T09:55:13,493 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG - Transformers version 4.33.1
2023-09-12T09:55:13,493 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG - PyTorch version is 2.0.0 or greater
2023-09-12T09:55:13,494 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG - Backend worker process died.
2023-09-12T09:55:13,495 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2023-09-12T09:55:13,495 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_service_worker.py", line 253, in <module>
2023-09-12T09:55:13,496 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG -     worker.run_server()
2023-09-12T09:55:13,496 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_service_worker.py", line 221, in run_server
2023-09-12T09:55:13,496 [INFO ] epollEventLoopGroup-5-1 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_STARTED
2023-09-12T09:55:13,497 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG -     self.handle_connection(cl_socket)
2023-09-12T09:55:13,497 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_service_worker.py", line 184, in handle_connection
2023-09-12T09:55:13,497 [DEBUG] W-9000-opt_1.0 org.pytorch.serve.wlm.WorkerThread - System state is : WORKER_STARTED
2023-09-12T09:55:13,498 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG -     service, result, code = self.load_model(msg)
2023-09-12T09:55:13,498 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_service_worker.py", line 131, in load_model
2023-09-12T09:55:13,499 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG -     service = model_loader.load(
2023-09-12T09:55:13,499 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_loader.py", line 135, in load
2023-09-12T09:55:13,499 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG -     initialize_fn(service.context)
2023-09-12T09:55:13,500 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG -   File "/home/model-server/tmp/models/e9357307535945f5a054e947d1f88bda/pippy_handler.py", line 40, in initialize
2023-09-12T09:55:13,500 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG -     super().initialize(ctx)
2023-09-12T09:55:13,500 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/torch_handler/distributed/base_pippy_handler.py", line 19, in initialize
2023-09-12T09:55:13,501 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG -     self.local_rank = int(os.environ["LOCAL_RANK"])
2023-09-12T09:55:13,501 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG -   File "/usr/lib/python3.9/os.py", line 679, in __getitem__
2023-09-12T09:55:13,502 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG -     raise KeyError(key) from None
2023-09-12T09:55:13,502 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG - KeyError: 'LOCAL_RANK'
2023-09-12T09:55:13,498 [DEBUG] W-9000-opt_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died.
java.lang.InterruptedException: null
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1679) ~[?:?]
    at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:435) ~[?:?]
    at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:213) [model-server.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
    at java.lang.Thread.run(Thread.java:833) [?:?]
2023-09-12T09:55:13,526 [WARN ] W-9000-opt_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: opt, error: Worker died.
2023-09-12T09:55:13,527 [DEBUG] W-9000-opt_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-opt_1.0 State change WORKER_STARTED -> WORKER_STOPPED
2023-09-12T09:55:13,527 [INFO ] W-9000-opt_1.0 org.pytorch.serve.wlm.WorkerThread - Auto recovery start timestamp: 1694512513527
2023-09-12T09:55:13,528 [WARN ] W-9000-opt_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-opt_1.0-stderr
2023-09-12T09:55:13,528 [WARN ] W-9000-opt_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-opt_1.0-stdout
2023-09-12T09:55:13,528 [INFO ] W-9000-opt_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9000 in 1 seconds.
2023-09-12T09:55:13,549 [INFO ] W-9000-opt_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-opt_1.0-stdout
2023-09-12T09:55:13,549 [INFO ] W-9000-opt_1.0-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-opt_1.0-stderr
2023-09-12T09:55:14,529 [DEBUG] W-9000-opt_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9000, --metrics-config, /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2023-09-12T09:55:15,747 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG - s_name_part0=/home/model-server/tmp/.ts.sock, s_name_part1=9000, pid=1598
2023-09-12T09:55:15,752 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9000
2023-09-12T09:55:15,756 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG - Successfully loaded /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml.
2023-09-12T09:55:15,756 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG - [PID]1598
2023-09-12T09:55:15,756 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG - Torch worker started.
2023-09-12T09:55:15,756 [DEBUG] W-9000-opt_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-opt_1.0 State change WORKER_STOPPED -> WORKER_STARTED
2023-09-12T09:55:15,757 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG - Python runtime: 3.9.18
2023-09-12T09:55:15,757 [INFO ] W-9000-opt_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9000
2023-09-12T09:55:15,759 [INFO ] W-9000-opt_1.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9000.
2023-09-12T09:55:15,759 [INFO ] epollEventLoopGroup-5-1 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_STARTED
2023-09-12T09:55:15,759 [DEBUG] W-9000-opt_1.0 org.pytorch.serve.wlm.WorkerThread - System state is : WORKER_STARTED
2023-09-12T09:55:15,760 [DEBUG] W-9000-opt_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died.
java.lang.InterruptedException: null
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1679) ~[?:?]
    at java.util.concurrent.LinkedBlockingDeque.pollFirst(LinkedBlockingDeque.java:515) ~[?:?]
    at java.util.concurrent.LinkedBlockingDeque.poll(LinkedBlockingDeque.java:677) ~[?:?]
    at org.pytorch.serve.wlm.Model.pollBatch(Model.java:276) ~[model-server.jar:?]
    at org.pytorch.serve.wlm.BatchAggregator.getRequest(BatchAggregator.java:34) ~[model-server.jar:?]
    at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:186) [model-server.jar:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
    at java.lang.Thread.run(Thread.java:833) [?:?]
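
For context on the trace above: torchrun exports LOCAL_RANK (along with RANK, WORLD_SIZE, etc.) into each worker process it launches, so a worker started with plain python has no LOCAL_RANK and the bare dictionary lookup in base_pippy_handler raises KeyError. A minimal sketch of the failing pattern (illustration only; the guarded read is a hypothetical workaround, not the handler's actual code):

import os

# What the handler effectively does, per the traceback above; this raises
# KeyError: 'LOCAL_RANK' when the process was not launched by torchrun.
# local_rank = int(os.environ["LOCAL_RANK"])

# Guarded read: falls back to rank 0 when torchrun did not set the variable.
local_rank = int(os.environ.get("LOCAL_RANK", 0))
print(f"local_rank={local_rank}")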

Installation instructions

I am using Docker. My Dockerfile looks like:

FROM pytorch/torchserve:0.8.2-gpu

USER root
RUN apt update && apt install -y zip vim
RUN pip3 install torchpippy transformers fairscale sentencepiece
USER model-server

VOLUME /app
VOLUME /model

# WORKDIR /app

CMD /bin/bash

Model Packaging

I used the command in the example to package the model: torch-model-archiver --model-name opt --version 1.0 --handler pippy_handler.py -r requirements.txt --config-file model-config.yaml --archive-format tgz

The only thing that may be worth mentioning is that I set nproc-per-node: 1 in model-config.yaml since I only want to use one GPU.

config.properties

I didn't specify config.properties; it's the default one in the Docker image. It looks like:

metrics_address=http://0.0.0.0:8082
number_of_netty_threads=32
job_queue_size=1000
model_store=/home/model-server/model-store
workflow_store=/home/model-server/wf-store

Versions

model-server@0d0720355665:/app$ python serve/ts_scripts/print_env_info.py
------------------------------------------------------------------------------------------
Environment headers
------------------------------------------------------------------------------------------
Torchserve branch:

torchserve==0.8.2
torch-model-archiver==0.8.2

Python version: 3.9 (64-bit runtime)
Python executable: /home/venv/bin/python

Versions of relevant python libraries:
captum==0.6.0
numpy==1.24.3
nvgpu==0.10.0
psutil==5.9.5
requests==2.31.0
sentencepiece==0.1.99
torch==2.0.1+cu118
torch-model-archiver==0.8.2
torch-workflow-archiver==0.2.10
torchaudio==2.0.2+cu118
torchdata==0.6.1
torchpippy==0.1.1
torchserve==0.8.2
torchtext==0.15.2+cpu
torchvision==0.15.2+cu118
transformers==4.33.1
wheel==0.40.0
torch==2.0.1+cu118
torchtext==0.15.2+cpu
torchvision==0.15.2+cu118
torchaudio==2.0.2+cu118

Java Version:

OS: Ubuntu 20.04.6 LTS
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: N/A
CMake version: version 3.27.2

Is CUDA available: Yes
CUDA runtime version: N/A
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 2060
GPU 1: NVIDIA GeForce RTX 4090
Nvidia driver version: 535.86.05
cuDNN version: None

Repro instructions

git clone https://github.com/pytorch/serve.git
cd serve/examples/large_models/Huggingface_pippy/
python ../utils/Download_model.py --model_name facebook/opt-30b
torch-model-archiver --model-name opt --version 1.0 --handler pippy_handler.py  -r requirements.txt --config-file model-config.yaml --archive-format tgz
cp opt.tar.gz ~/model-store
cd ~
torchserve --ncs --start --model-store model-store --models opt.tar.gz

Possible Solution

No response

agunapal commented 1 year ago

@HamidShojanazeri Have we tried PiPPy in a Docker container? I am wondering whether it initializes the local rank at all.
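
One way to check (a hypothetical debugging snippet, not part of the example handler) is to dump the torchrun-related environment variables from inside the backend worker, for example at the top of initialize():

import os

# Variables torchrun normally exports; if all of them are unset,
# the backend worker was not launched through torchrun.
for key in ("LOCAL_RANK", "RANK", "WORLD_SIZE", "MASTER_ADDR", "MASTER_PORT"):
    print(f"{key}={os.environ.get(key, '<not set>')}")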

consciousgaze commented 1 year ago

Is there any update?

lxning commented 1 year ago

@consciousgaze According to the log, torchrun was not started.

consciousgaze commented 1 year ago

Hi, I have updated the model-config.yaml and tried again. The model-config.yaml I use is:

#frontend settings
minWorkers: 1
maxWorkers: 1
maxBatchDelay: 200
responseTimeout: 300
parallelType: "pp"
deviceType: "gpu"
torchrun:
    nproc-per-node: 1

#backend settings
pippy:
    rpc_timeout: 1800
    model_type: "HF"
    chunks: 1
    input_names: ["input_ids"]
    num_worker_threads: 128

handler:
    model_path: "/app/serve/examples/large_models/Huggingface_pippy/model/models--facebook--opt-30b/snapshots/ceea0a90ac0f6fae7c2c34bcb40477438c152546"
    index_filename: 'pytorch_model.bin.index.json'
    max_length: 50
    max_new_tokens: 60
    manual_seed: 40
    dtype: fp16

It still fails with KeyError: 'LOCAL_RANK'. But I found a way to get it to go through: if I change nproc-per-node: 1 to nproc-per-node: 2, the model preparation finishes. Does nproc-per-node also control whether torchrun is used?
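
As a side note, a quick sanity check independent of TorchServe (a hypothetical check_rank.py launched directly with torchrun): torchrun sets LOCAL_RANK=0 even for a single process, so a missing LOCAL_RANK suggests the worker was not launched through torchrun at all, rather than being an effect of the process count itself.

import os

# Save as check_rank.py and run:  torchrun --nproc_per_node=1 check_rank.py
# torchrun exports LOCAL_RANK even when only one process is started.
print("LOCAL_RANK =", os.environ.get("LOCAL_RANK", "<not set>"))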

shotarok commented 9 months ago

But I found a way to get it to go through: if I change nproc-per-node: 1 to nproc-per-node: 2, the model preparation finishes. Does nproc-per-node also control whether torchrun is used?

@consciousgaze I'm also looking into a similar issue where LOCAL_RANK is unavailable in a worker, and I happened to find this PR: https://github.com/pytorch/serve/pull/2608, which looks related to this behavior. v0.9.0 contains the PR's change. Did you try that version?

[UPDATED] I could get LOCAL_RANK with v0.9.0. In my case, the config.yaml was missing parallelType, so parallelLevel was not set (code).