pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/
Apache License 2.0

Model with name <name> is not ready. #2993

Closed tylertitsworth closed 8 months ago

tylertitsworth commented 8 months ago

πŸ› Describe the bug

When creating a new KServe (0.11.2) endpoint that uses the TorchServe predictor from this example, the model(s) register without issue; however, when I attempt an inference request I receive the following: {"error":"Model with name bert is not ready."}. I have tried this process with both the mnist and bert examples, and I first had to modify the provided JSON input to make it valid:

{
    "id": "d3b15cad-50a2-4eaf-80ce-8b0a428bd298",
    "inputs": [
        {
            "name": "4b7c7d4a-51e4-43c8-af61-04639f6ef4bc",
            "shape": [-1],
            "datatype": "BYTES",
            "data": ["{\"text\":\"Bloomberg has decided to publish a new report on the global economy.\", \"target\":1}"]
        }
    ]
}

Error logs

2024-03-01 17:32:23.261 10 kserve ERROR [model_not_ready_handler():105] Exception: 
 Traceback (most recent call last): 
   File "/home/venv/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in __call__ 
     await self.app(scope, receive, sender) 
   File "/home/venv/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__ 
     raise e 
   File "/home/venv/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__ 
     await self.app(scope, receive, send) 
   File "/home/venv/lib/python3.10/site-packages/starlette/routing.py", line 718, in __call__ 
     await route.handle(scope, receive, send) 
   File "/home/venv/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle 
     await self.app(scope, receive, send) 
   File "/home/venv/lib/python3.10/site-packages/starlette/routing.py", line 66, in app 
     response = await func(request) 
   File "/home/venv/lib/python3.10/site-packages/fastapi/routing.py", line 237, in app 
     raw_response = await run_endpoint_function( 
   File "/home/venv/lib/python3.10/site-packages/fastapi/routing.py", line 163, in run_endpoint_function 
     return await dependant.call(**values) 
   File "/home/venv/lib/python3.10/site-packages/kserve/protocol/rest/v2_endpoints.py", line 138, in infer 
     raise ModelNotReady(model_name) 
 kserve.errors.ModelNotReady: Model with name bert is not ready.

Installation instructions

Install torchserve from source: No
Using Docker: Yes (k8s)

Model Packaging

Packaged using the provided example instructions, then copied with kubectl cp to a PVC in my cluster with the correct permissions.
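
For reference, after the copy the PVC layout looks roughly like this (the model-store path matches model_store in the config.properties below, and the mar names match the model_snapshot):

my-pvc
├── config
│   └── config.properties
└── model-store
    ├── bert.mar
    └── mnist.mar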

config.properties

inference_address=http://0.0.0.0:8085
management_address=http://0.0.0.0:8085
metrics_address=http://0.0.0.0:8082
grpc_inference_port=7070
grpc_management_port=7071
enable_envvars_config=true
install_py_dep_per_model=true
enable_metrics_api=true
metrics_mode=prometheus
NUM_WORKERS=1
number_of_netty_threads=4
job_queue_size=10
model_store=/mnt/models/model-store
model_snapshot={"name":"startup.cfg","modelCount":2,"models":{"mnist":{"1.0":{"defaultVersion":true,"marName":"mnist.mar","minWorkers":1,"maxWorkers":5,"batchSize":1,"maxBatchDelay":5000,"responseTimeout":120}},"bert":{"1.0":{"defaultVersion":true,"marName":"bert.mar","minWorkers":1,"maxWorkers":5,"batchSize":2,"maxBatchDelay":500,"responseTimeout":60}}}}
allowed_urls=https://s3.amazonaws.com/.*,https://torchserve.pytorch.org/.*
ipex_enable=true
cpu_launcher_enable=true
cpu_launcher_args=--use_logical_core

Versions

From the intel/intel-extension-for-pytorch:2.2.0-serving-cpu-kserve image:

------------------------------------------------------------------------------------------
Environment headers
------------------------------------------------------------------------------------------
Torchserve branch: 

torchserve==0.9.0
torch-model-archiver==0.9.0

Python version: 3.10 (64-bit runtime)
Python executable: /home/venv/bin/python

Versions of relevant python libraries:
captum==0.7.0
intel-extension-for-pytorch==2.2.0+cpu
numpy==1.26.4
pillow==10.2.0
psutil==5.9.8
requests==2.31.0
requests-oauthlib==1.3.1
torch==2.2.0+cpu
torch-model-archiver==0.9.0
torch-workflow-archiver==0.2.11
torchaudio==2.2.0+cpu
torchdata==0.7.1
torchserve==0.9.0
torchtext==0.17.0+cpu
torchvision==0.17.0+cpu
transformers==4.38.1
wheel==0.42.0

Java Version:

OS: N/A
GCC version: N/A
Clang version: N/A
CMake version: N/A

Repro instructions

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: "my-endpoint"
spec:
  predictor:
    model:
      modelFormat:
        name: pytorch
      protocolVersion: v2
      storageUri: pvc://my-pvc
$ kubectl cp config.properties <pvcviewer-pod>:config/config.properties
$ kubectl cp model-store <pvcviewer-pod>:model-store
$ kubectl apply -f endpoint.yaml
$ curl -s -X POST --location "http://$INGRESS_HOST:$INGRESS_PORT/v2/models/bert/infer" \
--header "Host: $SERVICE_HOSTNAME" \
--header 'Content-Type: application/json' \
--header "Cookie: authservice_session=$AUTHSERVICE_TOKEN" -d @./bytes.json

Possible Solution

The issue seems to be in the kserve wrapper: it does not occur with the standard torchserve container, and the kserve container uses that torchserve container as its base.

I am also seeing the same issue as #2968 with the bert model; however, the result is the same for the mnist model.

If the worker died and this was not reported in the logs, then I'm not clear on how to get the status of the predictor's workers via the predict v2 API.
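
The closest thing I can find is the v2 readiness endpoint, which only reports ready/not-ready rather than any per-worker state (same host and headers as the infer call in the repro above):

$ curl -s "http://$INGRESS_HOST:$INGRESS_PORT/v2/models/bert/ready" \
--header "Host: $SERVICE_HOSTNAME" \
--header "Cookie: authservice_session=$AUTHSERVICE_TOKEN"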

agunapal commented 8 months ago

Hi @tylertitsworth if you have access to the pod, you can check the ts_log.log to see if the worker died. You could also try increasing the response timeout.

Are you able to check the logs of the pod?
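
For example, something along these lines (assuming the kserve-container name from the serving runtime and the default log directory in the image):

$ kubectl exec -it <torchserve-pod> -c kserve-container -- tail -n 200 /home/model-server/logs/ts_log.log

The response timeout is the per-model responseTimeout in the model_snapshot of config.properties (currently 60s for bert).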

We do run this script nightly https://github.com/pytorch/serve/blob/master/kubernetes/kserve/tests/scripts/test_mnist.sh and you can see it passing.

tylertitsworth commented 8 months ago

ts_log.log:

2024-03-03T23:37:55,038 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2024-03-03T23:37:55,145 [INFO ] main org.pytorch.serve.metrics.configuration.MetricConfiguration - Successfully loaded metrics configuration from /home/venv/lib/python3.10/site-packages/ts/configs/metrics.yaml
2024-03-03T23:37:55,344 [INFO ] main org.pytorch.serve.ModelServer - 
Torchserve version: 0.9.0
TS Home: /home/venv/lib/python3.10/site-packages
Current directory: /home/model-server
Temp directory: /home/model-server/tmp
Metrics config path: /home/venv/lib/python3.10/site-packages/ts/configs/metrics.yaml
Number of GPUs: 0
Number of CPUs: 1
Max heap size: 494 M
Python executable: /home/venv/bin/python
Config file: /mnt/models/config/config.properties
Inference address: http://0.0.0.0:8085
Management address: http://0.0.0.0:8085
Metrics address: http://0.0.0.0:8082
Model Store: /mnt/pvc/model-store
Initial Models: N/A
Log dir: /home/model-server/logs
Metrics dir: /home/model-server/logs
Netty threads: 4
Netty client threads: 0
Default workers per model: 1
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [https://s3.amazonaws.com/.*, https://torchserve.pytorch.org/.*]
Custom python dependency for model allowed: true
Enable metrics API: true
Metrics mode: prometheus
Disable system metrics: false
Workflow Store: /mnt/pvc/model-store
Model config: N/A
2024-03-03T23:37:55,350 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager -  Loading snapshot serializer plugin...
2024-03-03T23:37:55,427 [INFO ] main org.pytorch.serve.snapshot.SnapshotManager - Started restoring models from snapshot {"name":"startup.cfg","modelCount":2,"models":{"mnist":{"1.0":{"defaultVersion":true,"marName":"mnist.mar","minWorkers":1,"maxWorkers":5,"batchSize":1,"maxBatchDelay":5000,"responseTimeout":120}},"bert":{"1.0":{"defaultVersion":true,"marName":"bert.mar","minWorkers":1,"maxWorkers":5,"batchSize":2,"maxBatchDelay":500,"responseTimeout":60}}}}
2024-03-03T23:37:55,434 [INFO ] main org.pytorch.serve.snapshot.SnapshotManager - Validating snapshot startup.cfg
2024-03-03T23:37:55,439 [INFO ] main org.pytorch.serve.snapshot.SnapshotManager - Snapshot startup.cfg validated successfully
2024-03-03T23:37:55,859 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model mnist
2024-03-03T23:37:55,859 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model mnist
2024-03-03T23:37:55,859 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model mnist loaded.
2024-03-03T23:37:55,859 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: mnist, count: 1
2024-03-03T23:37:55,932 [DEBUG] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - launcherAvailable cmdline: [python, -m, torch.backends.xeon.run_cpu, --use_logical_core, --no_python, hostname]
2024-03-03T23:38:00,831 [DEBUG] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, -m, torch.backends.xeon.run_cpu, --use_logical_core, /home/venv/lib/python3.10/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9000, --metrics-config, /home/venv/lib/python3.10/site-packages/ts/configs/metrics.yaml]
2024-03-03T23:38:03,092 [WARN ] W-9000-mnist_1.0-stderr MODEL_LOG - 2024-03-03 23:38:03,091 - __main__ - WARNING - Neither TCMalloc nor JeMalloc is found in $CONDA_PREFIX/lib or $VIRTUAL_ENV/lib
2024-03-03T23:38:03,093 [WARN ] W-9000-mnist_1.0-stderr MODEL_LOG -                             or /.local/lib/ or /usr/local/lib/ or /usr/local/lib64/ or /usr/lib or /usr/lib64 or
2024-03-03T23:38:03,126 [WARN ] W-9000-mnist_1.0-stderr MODEL_LOG -                            /home/model-server/.local/lib/ so the LD_PRELOAD environment variable will not be set.
2024-03-03T23:38:03,126 [WARN ] W-9000-mnist_1.0-stderr MODEL_LOG -                            This may drop the performance
2024-03-03T23:38:03,126 [WARN ] W-9000-mnist_1.0-stderr MODEL_LOG - 2024-03-03 23:38:03,091 - __main__ - INFO - OMP_NUM_THREADS=96
2024-03-03T23:38:03,127 [WARN ] W-9000-mnist_1.0-stderr MODEL_LOG - 2024-03-03 23:38:03,091 - __main__ - WARNING - Unable to find the iomp library file libiomp5.so in $CONDA_PREFIX/lib or $VIRTUAL_ENV/lib or /.local/lib/ or /usr/local/lib/ or /usr/local/lib64/ or /usr/lib or /usr/lib64 or /home/model-server/.local/lib/ so the LD_PRELOAD environment variable will not be set. you can use "conda install mkl" to install iomp
2024-03-03T23:38:03,127 [WARN ] W-9000-mnist_1.0-stderr MODEL_LOG - 2024-03-03 23:38:03,091 - __main__ - WARNING - Numa Aware: cores:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95] on different NUMA nodes:[0, 1, 2, 3]. To avoid this behavior, please use --ncores-per-instance knob to make sure number of cores is divisible by --ncores-per-instance. Alternatively, please use --skip-cross-node-cores knob.
2024-03-03T23:38:03,127 [WARN ] W-9000-mnist_1.0-stderr MODEL_LOG - 2024-03-03 23:38:03,091 - __main__ - INFO - numactl -C 0-95 -m 0,1,2,3 /home/venv/bin/python -u /home/venv/lib/python3.10/site-packages/ts/model_service_worker.py --sock-type unix --sock-name /home/model-server/tmp/.ts.sock.9000 --metrics-config /home/venv/lib/python3.10/site-packages/ts/configs/metrics.yaml
2024-03-03T23:38:05,445 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - s_name_part0=/home/model-server/tmp/.ts.sock, s_name_part1=9000, pid=264
2024-03-03T23:38:05,447 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9000
2024-03-03T23:38:05,454 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - Successfully loaded /home/venv/lib/python3.10/site-packages/ts/configs/metrics.yaml.
2024-03-03T23:38:05,455 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - [PID]264
2024-03-03T23:38:05,455 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - Torch worker started.
2024-03-03T23:38:05,455 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - Python runtime: 3.10.12
2024-03-03T23:38:05,456 [DEBUG] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-mnist_1.0 State change null -> WORKER_STARTED
2024-03-03T23:38:05,477 [INFO ] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9000
2024-03-03T23:38:05,567 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9000.
2024-03-03T23:38:05,569 [INFO ] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD to backend at: 1709509085569
2024-03-03T23:38:05,671 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - model_name: mnist, batchSize: 1
2024-03-03T23:38:07,644 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - generated new fontManager
2024-03-03T23:38:09,231 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - proceeding without onnxruntime
2024-03-03T23:38:09,232 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - Torch TensorRT not enabled
2024-03-03T23:38:10,427 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - Compiled model with ipex
2024-03-03T23:38:10,427 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - '/home/model-server/tmp/models/25d16de473c84f3fa33a87456a56f1be/index_to_name.json' is missing. Inference output will not include class name.
2024-03-03T23:38:10,436 [DEBUG] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - sent a reply, jobdone: true
2024-03-03T23:38:10,436 [INFO ] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 4765
2024-03-03T23:38:10,437 [DEBUG] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-mnist_1.0 State change WORKER_STARTED -> WORKER_MODEL_LOADED
2024-03-03T23:38:11,491 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model bert
2024-03-03T23:38:11,491 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model bert
2024-03-03T23:38:11,491 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model bert loaded.
2024-03-03T23:38:11,491 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: bert, count: 1
2024-03-03T23:38:11,493 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - launcherAvailable cmdline: [python, -m, torch.backends.xeon.run_cpu, --use_logical_core, --no_python, hostname]
2024-03-03T23:38:11,493 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2024-03-03T23:38:11,504 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://0.0.0.0:8085
2024-03-03T23:38:11,504 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2024-03-03T23:38:11,504 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://0.0.0.0:8082
2024-03-03T23:38:11,979 [WARN ] pool-3-thread-1 org.pytorch.serve.metrics.MetricCollector - worker pid is not available yet.
2024-03-03T23:38:14,567 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, -m, torch.backends.xeon.run_cpu, --use_logical_core, /home/venv/lib/python3.10/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9001, --metrics-config, /home/venv/lib/python3.10/site-packages/ts/configs/metrics.yaml]
2024-03-03T23:38:15,970 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-03 23:38:15,969 - __main__ - WARNING - Neither TCMalloc nor JeMalloc is found in $CONDA_PREFIX/lib or $VIRTUAL_ENV/lib
2024-03-03T23:38:15,971 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG -                             or /.local/lib/ or /usr/local/lib/ or /usr/local/lib64/ or /usr/lib or /usr/lib64 or
2024-03-03T23:38:15,972 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG -                            /home/model-server/.local/lib/ so the LD_PRELOAD environment variable will not be set.
2024-03-03T23:38:15,972 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG -                            This may drop the performance
2024-03-03T23:38:15,972 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-03 23:38:15,969 - __main__ - INFO - OMP_NUM_THREADS=96
2024-03-03T23:38:15,972 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-03 23:38:15,970 - __main__ - WARNING - Unable to find the iomp library file libiomp5.so in $CONDA_PREFIX/lib or $VIRTUAL_ENV/lib or /.local/lib/ or /usr/local/lib/ or /usr/local/lib64/ or /usr/lib or /usr/lib64 or /home/model-server/.local/lib/ so the LD_PRELOAD environment variable will not be set. you can use "conda install mkl" to install iomp
2024-03-03T23:38:15,972 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-03 23:38:15,970 - __main__ - WARNING - Numa Aware: cores:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95] on different NUMA nodes:[0, 1, 2, 3]. To avoid this behavior, please use --ncores-per-instance knob to make sure number of cores is divisible by --ncores-per-instance. Alternatively, please use --skip-cross-node-cores knob.
2024-03-03T23:38:15,973 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-03 23:38:15,970 - __main__ - INFO - numactl -C 0-95 -m 0,1,2,3 /home/venv/bin/python -u /home/venv/lib/python3.10/site-packages/ts/model_service_worker.py --sock-type unix --sock-name /home/model-server/tmp/.ts.sock.9001 --metrics-config /home/venv/lib/python3.10/site-packages/ts/configs/metrics.yaml
2024-03-03T23:38:17,351 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - s_name_part0=/home/model-server/tmp/.ts.sock, s_name_part1=9001, pid=529
2024-03-03T23:38:17,352 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9001
2024-03-03T23:38:17,361 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Successfully loaded /home/venv/lib/python3.10/site-packages/ts/configs/metrics.yaml.
2024-03-03T23:38:17,362 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - [PID]529
2024-03-03T23:38:17,362 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Torch worker started.
2024-03-03T23:38:17,362 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Python runtime: 3.10.12
2024-03-03T23:38:17,362 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-bert_1.0 State change null -> WORKER_STARTED
2024-03-03T23:38:17,363 [INFO ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9001
2024-03-03T23:38:17,364 [INFO ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD to backend at: 1709509097364
2024-03-03T23:38:17,364 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9001.
2024-03-03T23:38:17,399 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - model_name: bert, batchSize: 2
2024-03-03T23:38:19,272 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - proceeding without onnxruntime
2024-03-03T23:38:19,272 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Torch TensorRT not enabled
2024-03-03T23:38:19,273 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Transformers version 4.38.1
2024-03-03T23:38:36,043 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 
2024-03-03T23:38:36,044 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - tokenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]
2024-03-03T23:38:36,044 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - tokenizer_config.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 48.0/48.0 [00:00<00:00, 360kB/s]
2024-03-03T23:38:36,230 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 
2024-03-03T23:38:36,231 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - config.json:   0%|          | 0.00/570 [00:00<?, ?B/s]
2024-03-03T23:38:36,231 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - config.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 570/570 [00:00<00:00, 5.08MB/s]
2024-03-03T23:38:36,437 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 
2024-03-03T23:38:36,444 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]
2024-03-03T23:38:36,445 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - vocab.txt: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 232k/232k [00:00<00:00, 33.1MB/s]
2024-03-03T23:38:36,691 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 
2024-03-03T23:38:36,701 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]
2024-03-03T23:38:36,702 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - tokenizer.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 466k/466k [00:00<00:00, 49.1MB/s]
2024-03-03T23:38:36,949 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Transformer model from path /home/model-server/tmp/models/22ceddb935e04bb7bcb6f76ce49bb533 loaded successfully
2024-03-03T23:38:36,951 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - sent a reply, jobdone: true
2024-03-03T23:38:36,951 [INFO ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 19551
2024-03-03T23:38:36,951 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-bert_1.0 State change WORKER_STARTED -> WORKER_MODEL_LOADED

In the main log output, with the bert model:

 2024-03-04T19:03:39,575 [DEBUG] W-9002-bert_1.0 org.pytorch.serve.wlm.WorkerThread - W-9002-bert_1.0 State change WORKER_STARTED -> WORKER_STOPPED 
 2024-03-04T19:03:39,575 [WARN ] W-9002-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Auto recovery failed again 
 2024-03-04T19:03:39,575 [WARN ] W-9002-bert_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9002-bert_1.0-stderr 
 2024-03-04T19:03:39,575 [WARN ] W-9002-bert_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9002-bert_1.0-stdout 
 2024-03-04T19:03:39,575 [INFO ] W-9002-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9002 in 1 seconds. 

Same process with mnist:

 2024-03-04T19:04:51,345 [DEBUG] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-mnist_1.0 State change WORKER_STOPPED -> WORKER_STARTED 
 2024-03-04T19:04:51,346 [INFO ] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9000 
 2024-03-04T19:04:51,346 [INFO ] epollEventLoopGroup-5-1 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_STARTED 
 2024-03-04T19:04:51,346 [DEBUG] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - System state is : WORKER_STARTED 
 2024-03-04T19:04:51,346 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9000. 
 2024-03-04T19:04:51,346 [DEBUG] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died.
 java.lang.InterruptedException: null 
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1679) ~[?:?] 
    at java.util.concurrent.LinkedBlockingDeque.pollFirst(LinkedBlockingDeque.java:515) ~[?:?] 
    at java.util.concurrent.LinkedBlockingDeque.poll(LinkedBlockingDeque.java:677) ~[?:?] 
    at org.pytorch.serve.wlm.Model.pollBatch(Model.java:367) ~[model-server.jar:?] 
    at org.pytorch.serve.wlm.BatchAggregator.getRequest(BatchAggregator.java:36) ~[model-server.jar:?] 
    at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:194) [model-server.jar:?] 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?] 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?] 
    at java.lang.Thread.run(Thread.java:840) [?:?] 
 2024-03-04T19:04:51,347 [DEBUG] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-mnist_1.0 State change WORKER_STARTED -> WORKER_STOPPED 

We actually get an error this time.
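
For completeness, the TorchServe management API (port 8085 per config.properties) should also report per-worker state if queried from inside the pod, assuming curl is available in the container:

$ kubectl exec -it <torchserve-pod> -c kserve-container -- curl -s http://localhost:8085/models/bert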

agunapal commented 8 months ago

I wonder if you are hitting this https://github.com/pytorch/serve/issues/2870

Is it possible for you to try with the nightly kserve image? We will be making a new release shortly.

tylertitsworth commented 8 months ago

I modified the ClusterServingRuntime to use the nightly TorchServe image:

# Source: kserve/templates/clusterservingruntimes.yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ClusterServingRuntime
metadata:
  name: kserve-torchserve
spec:
  annotations:
    prometheus.kserve.io/port: '8082'
    prometheus.kserve.io/path: "/metrics"
  supportedModelFormats:
    - name: pytorch
      version: "1"
      autoSelect: true
  protocolVersions:
    - v1
    - v2
    - grpc-v1
  containers:
    - name: kserve-container
      image: "pytorch/torchserve-nightly:latest-cpu"
      args:
        - torchserve
        - --start
        - --model-store=/mnt/models/model-store
        - --ts-config=/mnt/models/config/config.properties
      env:
        - name: "TS_SERVICE_ENVELOPE"
          value: "{{.Labels.serviceEnvelope}}"
...

I'm getting the following output when I deploy:

2024-03-04T23:03:45,937 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap_external>", line 850, in exec_module
2024-03-04T23:03:45,937 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
2024-03-04T23:03:45,937 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/model-server/tmp/models/1c029c68cc9d45bd9ed5cce3528928b4/Transformer_kserve_handler.py", line 3, in <module>
2024-03-04T23:03:45,937 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     from Transformer_handler_generalized import (
2024-03-04T23:03:45,937 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/model-server/tmp/models/1c029c68cc9d45bd9ed5cce3528928b4/Transformer_handler_generalized.py", line 8, in <module>
2024-03-04T23:03:45,937 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - System state is : WORKER_STARTED
2024-03-04T23:03:45,937 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     import transformers
2024-03-04T23:03:45,937 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - ModuleNotFoundError: No module named 'transformers'
2024-03-04T23:03:45,937 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - 
2024-03-04T23:03:45,937 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - During handling of the above exception, another exception occurred:
2024-03-04T23:03:45,937 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - 
2024-03-04T23:03:45,937 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2024-03-04T23:03:45,937 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_service_worker.py", line 263, in <module>
2024-03-04T23:03:45,937 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     worker.run_server()
2024-03-04T23:03:45,937 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_service_worker.py", line 231, in run_server
2024-03-04T23:03:45,937 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     self.handle_connection(cl_socket)
2024-03-04T23:03:45,937 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_service_worker.py", line 194, in handle_connection
2024-03-04T23:03:45,937 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died., responseTimeout:60sec
java.lang.InterruptedException: null
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1679) ~[?:?]
        at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:435) ~[?:?]
        at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:229) [model-server.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
        at java.lang.Thread.run(Thread.java:840) [?:?]
2024-03-04T23:03:45,937 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     service, result, code = self.load_model(msg)
2024-03-04T23:03:45,938 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_service_worker.py", line 131, in load_model
2024-03-04T23:03:45,938 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     service = model_loader.load(
2024-03-04T23:03:45,938 [WARN ] W-9001-bert_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: bert, error: Worker died.
2024-03-04T23:03:45,938 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_loader.py", line 110, in load
2024-03-04T23:03:45,938 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     module = self._load_default_handler(handler)
2024-03-04T23:03:45,938 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-bert_1.0 State change WORKER_STARTED -> WORKER_STOPPED
2024-03-04T23:03:45,938 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_loader.py", line 159, in _load_default_handler
2024-03-04T23:03:45,938 [WARN ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Auto recovery failed again
2024-03-04T23:03:45,938 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     module = importlib.import_module(module_name, "ts.torch_handler")
2024-03-04T23:03:45,938 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/usr/lib/python3.9/importlib/__init__.py", line 127, in import_module
2024-03-04T23:03:45,938 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     return _bootstrap._gcd_import(name[level:], package, level)
2024-03-04T23:03:45,938 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
2024-03-04T23:03:45,938 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
2024-03-04T23:03:45,938 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 972, in _find_and_load_unlocked
2024-03-04T23:03:45,938 [WARN ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9001-bert_1.0-stderr
2024-03-04T23:03:45,938 [WARN ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9001-bert_1.0-stdout
2024-03-04T23:03:45,938 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
2024-03-04T23:03:45,938 [INFO ] W-9001-bert_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9001-bert_1.0-stdout
2024-03-04T23:03:45,938 [INFO ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9001 in 8 seconds.
2024-03-04T23:03:46,297 [INFO ] W-9001-bert_1.0-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9001-bert_1.0-stderr
2024-03-04T23:03:53,939 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - launcherAvailable cmdline: [python, -m, torch.backends.xeon.run_cpu, --use_logical_core, --no_python, hostname]
2024-03-04T23:03:56,989 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, -m, torch.backends.xeon.run_cpu, --use_logical_core, /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9001, --metrics-config, /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2024-03-04T23:03:59,346 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:03:59,345 - __main__ - WARNING - Core binding with numactl is not available. Disabling numactl and using taskset instead.                     This may affect performance in multi-socket system; please use numactl if memory binding is needed.
2024-03-04T23:03:59,348 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:03:59,347 - __main__ - WARNING - Neither TCMalloc nor JeMalloc is found in $CONDA_PREFIX/lib or $VIRTUAL_ENV/lib
2024-03-04T23:03:59,348 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG -                             or /.local/lib/ or /usr/local/lib/ or /usr/local/lib64/ or /usr/lib or /usr/lib64 or
2024-03-04T23:03:59,348 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG -                            /home/model-server/.local/lib/ so the LD_PRELOAD environment variable will not be set.
2024-03-04T23:03:59,348 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG -                            This may drop the performance
2024-03-04T23:03:59,348 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:03:59,347 - __main__ - INFO - OMP_NUM_THREADS=192
2024-03-04T23:03:59,348 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:03:59,348 - __main__ - WARNING - Unable to find the iomp library file libiomp5.so in $CONDA_PREFIX/lib or $VIRTUAL_ENV/lib or /.local/lib/ or /usr/local/lib/ or /usr/local/lib64/ or /usr/lib or /usr/lib64 or /home/model-server/.local/lib/ so the LD_PRELOAD environment variable will not be set. you can use "conda install mkl" to install iomp
2024-03-04T23:03:59,348 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:03:59,348 - __main__ - INFO - taskset -c 0-191 /home/venv/bin/python -u /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py --sock-type unix --sock-name /home/model-server/tmp/.ts.sock.9001 --metrics-config /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml
2024-03-04T23:04:02,318 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - s_name_part0=/home/model-server/tmp/.ts.sock, s_name_part1=9001, pid=1704
2024-03-04T23:04:02,320 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9001
2024-03-04T23:04:02,326 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Successfully loaded /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml.
2024-03-04T23:04:02,326 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - [PID]1704
2024-03-04T23:04:02,326 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Torch worker started.
2024-03-04T23:04:02,326 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Python runtime: 3.9.18
2024-03-04T23:04:02,327 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-bert_1.0 State change WORKER_STOPPED -> WORKER_STARTED
2024-03-04T23:04:02,327 [INFO ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9001
2024-03-04T23:04:02,328 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD repeats 1 to backend at: 1709593442328
2024-03-04T23:04:02,329 [INFO ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Looping backend response at: 1709593442329
2024-03-04T23:04:02,329 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9001.
2024-03-04T23:04:02,331 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - model_name: bert, batchSize: 2
2024-03-04T23:04:02,336 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Backend worker process died.
2024-03-04T23:04:02,337 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2024-03-04T23:04:02,337 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_loader.py", line 108, in load
2024-03-04T23:04:02,337 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     module, function_name = self._load_handler_file(handler)
2024-03-04T23:04:02,337 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_loader.py", line 153, in _load_handler_file
2024-03-04T23:04:02,337 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     module = importlib.import_module(module_name)
2024-03-04T23:04:02,337 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/usr/lib/python3.9/importlib/__init__.py", line 127, in import_module
2024-03-04T23:04:02,337 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     return _bootstrap._gcd_import(name[level:], package, level)
2024-03-04T23:04:02,337 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
2024-03-04T23:04:02,337 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
2024-03-04T23:04:02,337 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
2024-03-04T23:04:02,337 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
2024-03-04T23:04:02,337 [INFO ] epollEventLoopGroup-5-2 org.pytorch.serve.wlm.WorkerThread - 9001 Worker disconnected. WORKER_STARTED
2024-03-04T23:04:02,337 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap_external>", line 850, in exec_module
2024-03-04T23:04:02,337 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
2024-03-04T23:04:02,337 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/model-server/tmp/models/1c029c68cc9d45bd9ed5cce3528928b4/Transformer_kserve_handler.py", line 3, in <module>
2024-03-04T23:04:02,337 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     from Transformer_handler_generalized import (
2024-03-04T23:04:02,337 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/model-server/tmp/models/1c029c68cc9d45bd9ed5cce3528928b4/Transformer_handler_generalized.py", line 8, in <module>
2024-03-04T23:04:02,337 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     import transformers
2024-03-04T23:04:02,337 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - ModuleNotFoundError: No module named 'transformers'
2024-03-04T23:04:02,337 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - 
2024-03-04T23:04:02,338 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - During handling of the above exception, another exception occurred:
2024-03-04T23:04:02,337 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - System state is : WORKER_STARTED
2024-03-04T23:04:02,338 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - 
2024-03-04T23:04:02,338 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died., responseTimeout:60sec
java.lang.InterruptedException: null
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1679) ~[?:?]
        at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:435) ~[?:?]
        at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:229) [model-server.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
        at java.lang.Thread.run(Thread.java:840) [?:?]
2024-03-04T23:04:02,338 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2024-03-04T23:04:02,338 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_service_worker.py", line 263, in <module>
2024-03-04T23:04:02,339 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     worker.run_server()
2024-03-04T23:04:02,339 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_service_worker.py", line 231, in run_server
2024-03-04T23:04:02,339 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     self.handle_connection(cl_socket)
2024-03-04T23:04:02,339 [WARN ] W-9001-bert_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: bert, error: Worker died.
2024-03-04T23:04:02,339 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_service_worker.py", line 194, in handle_connection
2024-03-04T23:04:02,339 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     service, result, code = self.load_model(msg)
2024-03-04T23:04:02,339 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_service_worker.py", line 131, in load_model
2024-03-04T23:04:02,339 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-bert_1.0 State change WORKER_STARTED -> WORKER_STOPPED
2024-03-04T23:04:02,339 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     service = model_loader.load(
2024-03-04T23:04:02,339 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_loader.py", line 110, in load
2024-03-04T23:04:02,339 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     module = self._load_default_handler(handler)
2024-03-04T23:04:02,339 [WARN ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Auto recovery failed again
2024-03-04T23:04:02,339 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_loader.py", line 159, in _load_default_handler
2024-03-04T23:04:02,339 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     module = importlib.import_module(module_name, "ts.torch_handler")
2024-03-04T23:04:02,339 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/usr/lib/python3.9/importlib/__init__.py", line 127, in import_module
2024-03-04T23:04:02,340 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     return _bootstrap._gcd_import(name[level:], package, level)
2024-03-04T23:04:02,340 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
2024-03-04T23:04:02,340 [WARN ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9001-bert_1.0-stderr
2024-03-04T23:04:02,340 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
2024-03-04T23:04:02,340 [WARN ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9001-bert_1.0-stdout
2024-03-04T23:04:02,340 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 972, in _find_and_load_unlocked
2024-03-04T23:04:02,340 [INFO ] W-9001-bert_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9001-bert_1.0-stdout
2024-03-04T23:04:02,340 [INFO ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9001 in 13 seconds.
2024-03-04T23:04:03,110 [INFO ] W-9001-bert_1.0-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9001-bert_1.0-stderr
2024-03-04T23:04:15,341 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - launcherAvailable cmdline: [python, -m, torch.backends.xeon.run_cpu, --use_logical_core, --no_python, hostname]
2024-03-04T23:04:18,743 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, -m, torch.backends.xeon.run_cpu, --use_logical_core, /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9001, --metrics-config, /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2024-03-04T23:04:21,645 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:04:21,644 - __main__ - WARNING - Core binding with numactl is not available. Disabling numactl and using taskset instead.                     This may affect performance in multi-socket system; please use numactl if memory binding is needed.
2024-03-04T23:04:21,647 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:04:21,645 - __main__ - WARNING - Neither TCMalloc nor JeMalloc is found in $CONDA_PREFIX/lib or $VIRTUAL_ENV/lib
2024-03-04T23:04:21,647 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG -                             or /.local/lib/ or /usr/local/lib/ or /usr/local/lib64/ or /usr/lib or /usr/lib64 or
2024-03-04T23:04:21,647 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG -                            /home/model-server/.local/lib/ so the LD_PRELOAD environment variable will not be set.
2024-03-04T23:04:21,647 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG -                            This may drop the performance
2024-03-04T23:04:21,647 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:04:21,645 - __main__ - INFO - OMP_NUM_THREADS=192
2024-03-04T23:04:21,647 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:04:21,645 - __main__ - WARNING - Unable to find the iomp library file libiomp5.so in $CONDA_PREFIX/lib or $VIRTUAL_ENV/lib or /.local/lib/ or /usr/local/lib/ or /usr/local/lib64/ or /usr/lib or /usr/lib64 or /home/model-server/.local/lib/ so the LD_PRELOAD environment variable will not be set. you can use "conda install mkl" to install iomp
2024-03-04T23:04:21,648 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:04:21,645 - __main__ - INFO - taskset -c 0-191 /home/venv/bin/python -u /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py --sock-type unix --sock-name /home/model-server/tmp/.ts.sock.9001 --metrics-config /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml
2024-03-04T23:04:23,609 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - s_name_part0=/home/model-server/tmp/.ts.sock, s_name_part1=9001, pid=1908
2024-03-04T23:04:23,611 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9001
2024-03-04T23:04:23,617 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Successfully loaded /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml.
2024-03-04T23:04:23,617 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - [PID]1908
2024-03-04T23:04:23,617 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Torch worker started.
2024-03-04T23:04:23,617 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Python runtime: 3.9.18
2024-03-04T23:04:23,618 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-bert_1.0 State change WORKER_STOPPED -> WORKER_STARTED
2024-03-04T23:04:23,618 [INFO ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9001
2024-03-04T23:04:23,620 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD repeats 1 to backend at: 1709593463620
2024-03-04T23:04:23,620 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9001.
2024-03-04T23:04:23,621 [INFO ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Looping backend response at: 1709593463621
2024-03-04T23:04:23,622 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - model_name: bert, batchSize: 2
2024-03-04T23:04:23,625 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Backend worker process died.
2024-03-04T23:04:23,625 [INFO ] epollEventLoopGroup-5-1 org.pytorch.serve.wlm.WorkerThread - 9001 Worker disconnected. WORKER_STARTED
2024-03-04T23:04:23,625 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2024-03-04T23:04:23,625 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_loader.py", line 108, in load
2024-03-04T23:04:23,626 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     module, function_name = self._load_handler_file(handler)
2024-03-04T23:04:23,626 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_loader.py", line 153, in _load_handler_file
2024-03-04T23:04:23,626 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     module = importlib.import_module(module_name)
2024-03-04T23:04:23,626 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - System state is : WORKER_STARTED
2024-03-04T23:04:23,626 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/usr/lib/python3.9/importlib/__init__.py", line 127, in import_module
2024-03-04T23:04:23,626 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     return _bootstrap._gcd_import(name[level:], package, level)
2024-03-04T23:04:23,626 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
2024-03-04T23:04:23,626 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
2024-03-04T23:04:23,626 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died., responseTimeout:60sec
java.lang.InterruptedException: null
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1679) ~[?:?]
        at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:435) ~[?:?]
        at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:229) [model-server.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
        at java.lang.Thread.run(Thread.java:840) [?:?]
2024-03-04T23:04:23,626 [WARN ] W-9001-bert_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: bert, error: Worker died.
2024-03-04T23:04:23,626 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
2024-03-04T23:04:23,626 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-bert_1.0 State change WORKER_STARTED -> WORKER_STOPPED
2024-03-04T23:04:23,627 [WARN ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Auto recovery failed again
2024-03-04T23:04:23,627 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
2024-03-04T23:04:23,627 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap_external>", line 850, in exec_module
2024-03-04T23:04:23,627 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
2024-03-04T23:04:23,627 [WARN ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9001-bert_1.0-stderr
2024-03-04T23:04:23,627 [WARN ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9001-bert_1.0-stdout
2024-03-04T23:04:23,627 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/model-server/tmp/models/1c029c68cc9d45bd9ed5cce3528928b4/Transformer_kserve_handler.py", line 3, in <module>
2024-03-04T23:04:23,627 [INFO ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9001 in 21 seconds.
2024-03-04T23:04:23,627 [INFO ] W-9001-bert_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9001-bert_1.0-stdout
2024-03-04T23:04:24,024 [INFO ] W-9001-bert_1.0-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9001-bert_1.0-stderr
2024-03-04T23:04:44,628 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - launcherAvailable cmdline: [python, -m, torch.backends.xeon.run_cpu, --use_logical_core, --no_python, hostname]
2024-03-04T23:04:48,025 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, -m, torch.backends.xeon.run_cpu, --use_logical_core, /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9001, --metrics-config, /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2024-03-04T23:04:52,494 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:04:52,493 - __main__ - WARNING - Core binding with numactl is not available. Disabling numactl and using taskset instead.                     This may affect performance in multi-socket system; please use numactl if memory binding is needed.
2024-03-04T23:04:52,495 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:04:52,493 - __main__ - WARNING - Neither TCMalloc nor JeMalloc is found in $CONDA_PREFIX/lib or $VIRTUAL_ENV/lib
2024-03-04T23:04:52,495 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG -                             or /.local/lib/ or /usr/local/lib/ or /usr/local/lib64/ or /usr/lib or /usr/lib64 or
2024-03-04T23:04:52,496 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG -                            /home/model-server/.local/lib/ so the LD_PRELOAD environment variable will not be set.
2024-03-04T23:04:52,496 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG -                            This may drop the performance
2024-03-04T23:04:52,496 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:04:52,493 - __main__ - INFO - OMP_NUM_THREADS=192
2024-03-04T23:04:52,496 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:04:52,494 - __main__ - WARNING - Unable to find the iomp library file libiomp5.so in $CONDA_PREFIX/lib or $VIRTUAL_ENV/lib or /.local/lib/ or /usr/local/lib/ or /usr/local/lib64/ or /usr/lib or /usr/lib64 or /home/model-server/.local/lib/ so the LD_PRELOAD environment variable will not be set. you can use "conda install mkl" to install iomp
2024-03-04T23:04:52,496 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:04:52,494 - __main__ - INFO - taskset -c 0-191 /home/venv/bin/python -u /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py --sock-type unix --sock-name /home/model-server/tmp/.ts.sock.9001 --metrics-config /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml
2024-03-04T23:04:54,447 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - s_name_part0=/home/model-server/tmp/.ts.sock, s_name_part1=9001, pid=2112
2024-03-04T23:04:54,449 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9001
2024-03-04T23:04:54,455 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Successfully loaded /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml.
2024-03-04T23:04:54,455 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - [PID]2112
2024-03-04T23:04:54,455 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Torch worker started.
2024-03-04T23:04:54,456 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Python runtime: 3.9.18
2024-03-04T23:04:54,456 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-bert_1.0 State change WORKER_STOPPED -> WORKER_STARTED
2024-03-04T23:04:54,456 [INFO ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9001
2024-03-04T23:04:54,458 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD repeats 1 to backend at: 1709593494458
2024-03-04T23:04:54,458 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9001.
2024-03-04T23:04:54,459 [INFO ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Looping backend response at: 1709593494459
2024-03-04T23:04:54,460 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - model_name: bert, batchSize: 2
2024-03-04T23:04:54,463 [INFO ] epollEventLoopGroup-5-2 org.pytorch.serve.wlm.WorkerThread - 9001 Worker disconnected. WORKER_STARTED
2024-03-04T23:04:54,473 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - System state is : WORKER_STARTED
2024-03-04T23:04:54,473 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died., responseTimeout:60sec
java.lang.InterruptedException: null
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1679) ~[?:?]
        at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:435) ~[?:?]
        at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:229) [model-server.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
        at java.lang.Thread.run(Thread.java:840) [?:?]
2024-03-04T23:04:54,474 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Backend worker process died.
2024-03-04T23:04:54,474 [WARN ] W-9001-bert_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: bert, error: Worker died.
2024-03-04T23:04:54,474 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2024-03-04T23:04:54,474 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-bert_1.0 State change WORKER_STARTED -> WORKER_STOPPED
2024-03-04T23:04:54,474 [WARN ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Auto recovery failed again
2024-03-04T23:04:54,474 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_loader.py", line 108, in load
2024-03-04T23:04:54,474 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     module, function_name = self._load_handler_file(handler)
2024-03-04T23:04:54,474 [WARN ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9001-bert_1.0-stderr
2024-03-04T23:04:54,474 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_loader.py", line 153, in _load_handler_file
2024-03-04T23:04:54,474 [WARN ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9001-bert_1.0-stdout
2024-03-04T23:04:54,474 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     module = importlib.import_module(module_name)
2024-03-04T23:04:54,475 [INFO ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9001 in 34 seconds.
2024-03-04T23:04:54,475 [INFO ] W-9001-bert_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9001-bert_1.0-stdout
2024-03-04T23:04:54,890 [INFO ] W-9001-bert_1.0-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9001-bert_1.0-stderr
2024-03-04T23:05:28,476 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - launcherAvailable cmdline: [python, -m, torch.backends.xeon.run_cpu, --use_logical_core, --no_python, hostname]
2024-03-04T23:05:33,019 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, -m, torch.backends.xeon.run_cpu, --use_logical_core, /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9001, --metrics-config, /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2024-03-04T23:05:34,932 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:05:34,930 - __main__ - WARNING - Core binding with numactl is not available. Disabling numactl and using taskset instead.                     This may affect performance in multi-socket system; please use numactl if memory binding is needed.
2024-03-04T23:05:34,933 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:05:34,932 - __main__ - WARNING - Neither TCMalloc nor JeMalloc is found in $CONDA_PREFIX/lib or $VIRTUAL_ENV/lib
2024-03-04T23:05:34,933 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG -                             or /.local/lib/ or /usr/local/lib/ or /usr/local/lib64/ or /usr/lib or /usr/lib64 or
2024-03-04T23:05:34,933 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG -                            /home/model-server/.local/lib/ so the LD_PRELOAD environment variable will not be set.
2024-03-04T23:05:34,933 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG -                            This may drop the performance
2024-03-04T23:05:34,933 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:05:34,933 - __main__ - INFO - OMP_NUM_THREADS=192
2024-03-04T23:05:34,933 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:05:34,933 - __main__ - WARNING - Unable to find the iomp library file libiomp5.so in $CONDA_PREFIX/lib or $VIRTUAL_ENV/lib or /.local/lib/ or /usr/local/lib/ or /usr/local/lib64/ or /usr/lib or /usr/lib64 or /home/model-server/.local/lib/ so the LD_PRELOAD environment variable will not be set. you can use "conda install mkl" to install iomp
2024-03-04T23:05:34,933 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:05:34,933 - __main__ - INFO - taskset -c 0-191 /home/venv/bin/python -u /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py --sock-type unix --sock-name /home/model-server/tmp/.ts.sock.9001 --metrics-config /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml
2024-03-04T23:05:37,995 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - s_name_part0=/home/model-server/tmp/.ts.sock, s_name_part1=9001, pid=2316
2024-03-04T23:05:37,996 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9001
2024-03-04T23:05:38,002 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Successfully loaded /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml.
2024-03-04T23:05:38,002 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - [PID]2316
2024-03-04T23:05:38,003 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Torch worker started.
2024-03-04T23:05:38,003 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Python runtime: 3.9.18
2024-03-04T23:05:38,003 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-bert_1.0 State change WORKER_STOPPED -> WORKER_STARTED
2024-03-04T23:05:38,003 [INFO ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9001
2024-03-04T23:05:38,005 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD repeats 1 to backend at: 1709593538005
2024-03-04T23:05:38,006 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9001.
2024-03-04T23:05:38,006 [INFO ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Looping backend response at: 1709593538006
2024-03-04T23:05:38,007 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - model_name: bert, batchSize: 2
2024-03-04T23:05:38,013 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Backend worker process died.
2024-03-04T23:05:38,013 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2024-03-04T23:05:38,013 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_loader.py", line 108, in load
2024-03-04T23:05:38,013 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     module, function_name = self._load_handler_file(handler)
2024-03-04T23:05:38,013 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_loader.py", line 153, in _load_handler_file
2024-03-04T23:05:38,013 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     module = importlib.import_module(module_name)
2024-03-04T23:05:38,013 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/usr/lib/python3.9/importlib/__init__.py", line 127, in import_module
2024-03-04T23:05:38,013 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     return _bootstrap._gcd_import(name[level:], package, level)
2024-03-04T23:05:38,013 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
2024-03-04T23:05:38,013 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
2024-03-04T23:05:38,013 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
2024-03-04T23:05:38,013 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
2024-03-04T23:05:38,013 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap_external>", line 850, in exec_module
2024-03-04T23:05:38,014 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
2024-03-04T23:05:38,014 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/model-server/tmp/models/1c029c68cc9d45bd9ed5cce3528928b4/Transformer_kserve_handler.py", line 3, in <module>
2024-03-04T23:05:38,014 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     from Transformer_handler_generalized import (
2024-03-04T23:05:38,014 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/model-server/tmp/models/1c029c68cc9d45bd9ed5cce3528928b4/Transformer_handler_generalized.py", line 8, in <module>
2024-03-04T23:05:38,014 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     import transformers
2024-03-04T23:05:38,014 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - ModuleNotFoundError: No module named 'transformers'
2024-03-04T23:05:38,014 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - 
2024-03-04T23:05:38,014 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - During handling of the above exception, another exception occurred:
2024-03-04T23:05:38,014 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - 
2024-03-04T23:05:38,014 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2024-03-04T23:05:38,014 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_service_worker.py", line 263, in <module>
2024-03-04T23:05:38,014 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     worker.run_server()
2024-03-04T23:05:38,014 [INFO ] epollEventLoopGroup-5-1 org.pytorch.serve.wlm.WorkerThread - 9001 Worker disconnected. WORKER_STARTED
2024-03-04T23:05:38,014 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_service_worker.py", line 231, in run_server
2024-03-04T23:05:38,014 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     self.handle_connection(cl_socket)
2024-03-04T23:05:38,014 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_service_worker.py", line 194, in handle_connection
2024-03-04T23:05:38,014 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     service, result, code = self.load_model(msg)
2024-03-04T23:05:38,014 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_service_worker.py", line 131, in load_model
2024-03-04T23:05:38,014 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     service = model_loader.load(
2024-03-04T23:05:38,014 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_loader.py", line 110, in load
2024-03-04T23:05:38,014 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     module = self._load_default_handler(handler)
2024-03-04T23:05:38,014 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_loader.py", line 159, in _load_default_handler
2024-03-04T23:05:38,014 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     module = importlib.import_module(module_name, "ts.torch_handler")
2024-03-04T23:05:38,014 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/usr/lib/python3.9/importlib/__init__.py", line 127, in import_module
2024-03-04T23:05:38,014 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     return _bootstrap._gcd_import(name[level:], package, level)
2024-03-04T23:05:38,014 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
2024-03-04T23:05:38,014 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
2024-03-04T23:05:38,014 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 972, in _find_and_load_unlocked
2024-03-04T23:05:38,014 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
2024-03-04T23:05:38,014 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - System state is : WORKER_STARTED
2024-03-04T23:05:38,015 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
2024-03-04T23:05:38,015 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
2024-03-04T23:05:38,015 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 984, in _find_and_load_unlocked
2024-03-04T23:05:38,015 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - ModuleNotFoundError: No module named 'ts.torch_handler.Transformer_kserve_handler'
2024-03-04T23:05:38,015 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died., responseTimeout:60sec
java.lang.InterruptedException: null
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1679) ~[?:?]
        at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:435) ~[?:?]
        at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:229) [model-server.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
        at java.lang.Thread.run(Thread.java:840) [?:?]
2024-03-04T23:05:38,015 [WARN ] W-9001-bert_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: bert, error: Worker died.
2024-03-04T23:05:38,015 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-bert_1.0 State change WORKER_STARTED -> WORKER_STOPPED
2024-03-04T23:05:38,015 [WARN ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Auto recovery failed again
2024-03-04T23:05:38,015 [WARN ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9001-bert_1.0-stderr
2024-03-04T23:05:38,015 [WARN ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9001-bert_1.0-stdout
2024-03-04T23:05:38,015 [INFO ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9001 in 55 seconds.
2024-03-04T23:05:38,804 [INFO ] W-9001-bert_1.0-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9001-bert_1.0-stderr
2024-03-04T23:05:38,804 [INFO ] W-9001-bert_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9001-bert_1.0-stdout
2024-03-04T23:06:33,017 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - launcherAvailable cmdline: [python, -m, torch.backends.xeon.run_cpu, --use_logical_core, --no_python, hostname]
2024-03-04T23:06:37,321 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, -m, torch.backends.xeon.run_cpu, --use_logical_core, /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9001, --metrics-config, /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2024-03-04T23:06:39,693 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:06:39,692 - __main__ - WARNING - Core binding with numactl is not available. Disabling numactl and using taskset instead.                     This may affect performance in multi-socket system; please use numactl if memory binding is needed.
2024-03-04T23:06:39,693 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:06:39,692 - __main__ - WARNING - Neither TCMalloc nor JeMalloc is found in $CONDA_PREFIX/lib or $VIRTUAL_ENV/lib
2024-03-04T23:06:39,693 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG -                             or /.local/lib/ or /usr/local/lib/ or /usr/local/lib64/ or /usr/lib or /usr/lib64 or
2024-03-04T23:06:39,693 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG -                            /home/model-server/.local/lib/ so the LD_PRELOAD environment variable will not be set.
2024-03-04T23:06:39,693 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG -                            This may drop the performance
2024-03-04T23:06:39,693 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:06:39,693 - __main__ - INFO - OMP_NUM_THREADS=192
2024-03-04T23:06:39,694 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:06:39,693 - __main__ - WARNING - Unable to find the iomp library file libiomp5.so in $CONDA_PREFIX/lib or $VIRTUAL_ENV/lib or /.local/lib/ or /usr/local/lib/ or /usr/local/lib64/ or /usr/lib or /usr/lib64 or /home/model-server/.local/lib/ so the LD_PRELOAD environment variable will not be set. you can use "conda install mkl" to install iomp
2024-03-04T23:06:39,694 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:06:39,693 - __main__ - INFO - taskset -c 0-191 /home/venv/bin/python -u /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py --sock-type unix --sock-name /home/model-server/tmp/.ts.sock.9001 --metrics-config /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml
2024-03-04T23:06:41,544 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - s_name_part0=/home/model-server/tmp/.ts.sock, s_name_part1=9001, pid=2520
2024-03-04T23:06:41,546 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9001
2024-03-04T23:06:41,552 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Successfully loaded /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml.
2024-03-04T23:06:41,552 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - [PID]2520
2024-03-04T23:06:41,552 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Torch worker started.
2024-03-04T23:06:41,552 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Python runtime: 3.9.18
2024-03-04T23:06:41,553 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-bert_1.0 State change WORKER_STOPPED -> WORKER_STARTED
2024-03-04T23:06:41,553 [INFO ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9001
2024-03-04T23:06:41,555 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9001.
2024-03-04T23:06:41,555 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD repeats 1 to backend at: 1709593601555
2024-03-04T23:06:41,555 [INFO ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Looping backend response at: 1709593601555
2024-03-04T23:06:41,557 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - model_name: bert, batchSize: 2
2024-03-04T23:06:41,560 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Backend worker process died.
2024-03-04T23:06:41,560 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2024-03-04T23:06:41,560 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_loader.py", line 108, in load
2024-03-04T23:06:41,560 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     module, function_name = self._load_handler_file(handler)
2024-03-04T23:06:41,560 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_loader.py", line 153, in _load_handler_file
2024-03-04T23:06:41,560 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     module = importlib.import_module(module_name)
2024-03-04T23:06:41,560 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/usr/lib/python3.9/importlib/__init__.py", line 127, in import_module
2024-03-04T23:06:41,560 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     return _bootstrap._gcd_import(name[level:], package, level)
2024-03-04T23:06:41,560 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
2024-03-04T23:06:41,560 [INFO ] epollEventLoopGroup-5-2 org.pytorch.serve.wlm.WorkerThread - 9001 Worker disconnected. WORKER_STARTED
2024-03-04T23:06:41,560 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
2024-03-04T23:06:41,560 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
2024-03-04T23:06:41,560 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
2024-03-04T23:06:41,560 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap_external>", line 850, in exec_module
2024-03-04T23:06:41,560 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
2024-03-04T23:06:41,560 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/model-server/tmp/models/1c029c68cc9d45bd9ed5cce3528928b4/Transformer_kserve_handler.py", line 3, in <module>
2024-03-04T23:06:41,560 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     from Transformer_handler_generalized import (
2024-03-04T23:06:41,560 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/model-server/tmp/models/1c029c68cc9d45bd9ed5cce3528928b4/Transformer_handler_generalized.py", line 8, in <module>
2024-03-04T23:06:41,560 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     import transformers
2024-03-04T23:06:41,560 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - ModuleNotFoundError: No module named 'transformers'
2024-03-04T23:06:41,561 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - 
2024-03-04T23:06:41,561 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - During handling of the above exception, another exception occurred:
2024-03-04T23:06:41,561 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - 
2024-03-04T23:06:41,561 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2024-03-04T23:06:41,561 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_service_worker.py", line 263, in <module>
2024-03-04T23:06:41,561 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - System state is : WORKER_STARTED
2024-03-04T23:06:41,561 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     worker.run_server()
2024-03-04T23:06:41,561 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_service_worker.py", line 231, in run_server
2024-03-04T23:06:41,561 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     self.handle_connection(cl_socket)
2024-03-04T23:06:41,561 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_service_worker.py", line 194, in handle_connection
2024-03-04T23:06:41,561 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     service, result, code = self.load_model(msg)
2024-03-04T23:06:41,561 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_service_worker.py", line 131, in load_model
2024-03-04T23:06:41,561 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     service = model_loader.load(
2024-03-04T23:06:41,561 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_loader.py", line 110, in load
2024-03-04T23:06:41,561 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     module = self._load_default_handler(handler)
2024-03-04T23:06:41,561 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_loader.py", line 159, in _load_default_handler
2024-03-04T23:06:41,561 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     module = importlib.import_module(module_name, "ts.torch_handler")
2024-03-04T23:06:41,561 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/usr/lib/python3.9/importlib/__init__.py", line 127, in import_module
2024-03-04T23:06:41,561 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     return _bootstrap._gcd_import(name[level:], package, level)
2024-03-04T23:06:41,561 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
2024-03-04T23:06:41,561 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
2024-03-04T23:06:41,561 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 972, in _find_and_load_unlocked
2024-03-04T23:06:41,561 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
2024-03-04T23:06:41,561 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
2024-03-04T23:06:41,561 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
2024-03-04T23:06:41,561 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 984, in _find_and_load_unlocked
2024-03-04T23:06:41,561 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - ModuleNotFoundError: No module named 'ts.torch_handler.Transformer_kserve_handler'
2024-03-04T23:06:41,573 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died., responseTimeout:60sec
java.lang.InterruptedException: null
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1679) ~[?:?]
        at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:435) ~[?:?]
        at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:229) [model-server.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
        at java.lang.Thread.run(Thread.java:840) [?:?]
2024-03-04T23:06:41,574 [WARN ] W-9001-bert_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: bert, error: Worker died.
2024-03-04T23:06:41,574 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-bert_1.0 State change WORKER_STARTED -> WORKER_STOPPED
2024-03-04T23:06:41,574 [WARN ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Auto recovery failed again
2024-03-04T23:06:41,574 [WARN ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9001-bert_1.0-stderr
2024-03-04T23:06:41,574 [WARN ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9001-bert_1.0-stdout
2024-03-04T23:06:41,574 [INFO ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9001 in 89 seconds.
2024-03-04T23:06:41,933 [INFO ] W-9001-bert_1.0-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9001-bert_1.0-stderr
2024-03-04T23:06:41,933 [INFO ] W-9001-bert_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9001-bert_1.0-stdout
2024-03-04T23:08:10,576 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - launcherAvailable cmdline: [python, -m, torch.backends.xeon.run_cpu, --use_logical_core, --no_python, hostname]
2024-03-04T23:08:15,996 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, -m, torch.backends.xeon.run_cpu, --use_logical_core, /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9001, --metrics-config, /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2024-03-04T23:08:20,634 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:08:20,633 - __main__ - WARNING - Core binding with numactl is not available. Disabling numactl and using taskset instead.                     This may affect performance in multi-socket system; please use numactl if memory binding is needed.
2024-03-04T23:08:20,636 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:08:20,634 - __main__ - WARNING - Neither TCMalloc nor JeMalloc is found in $CONDA_PREFIX/lib or $VIRTUAL_ENV/lib
2024-03-04T23:08:20,636 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG -                             or /.local/lib/ or /usr/local/lib/ or /usr/local/lib64/ or /usr/lib or /usr/lib64 or
2024-03-04T23:08:20,636 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG -                            /home/model-server/.local/lib/ so the LD_PRELOAD environment variable will not be set.
2024-03-04T23:08:20,636 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG -                            This may drop the performance
2024-03-04T23:08:20,636 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:08:20,635 - __main__ - INFO - OMP_NUM_THREADS=192
2024-03-04T23:08:20,636 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:08:20,635 - __main__ - WARNING - Unable to find the iomp library file libiomp5.so in $CONDA_PREFIX/lib or $VIRTUAL_ENV/lib or /.local/lib/ or /usr/local/lib/ or /usr/local/lib64/ or /usr/lib or /usr/lib64 or /home/model-server/.local/lib/ so the LD_PRELOAD environment variable will not be set. you can use "conda install mkl" to install iomp
2024-03-04T23:08:20,637 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:08:20,635 - __main__ - INFO - taskset -c 0-191 /home/venv/bin/python -u /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py --sock-type unix --sock-name /home/model-server/tmp/.ts.sock.9001 --metrics-config /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml
2024-03-04T23:08:23,604 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - s_name_part0=/home/model-server/tmp/.ts.sock, s_name_part1=9001, pid=2726
2024-03-04T23:08:23,606 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9001
2024-03-04T23:08:23,612 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Successfully loaded /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml.
2024-03-04T23:08:23,612 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - [PID]2726
2024-03-04T23:08:23,612 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Torch worker started.
2024-03-04T23:08:23,612 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Python runtime: 3.9.18
2024-03-04T23:08:23,613 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-bert_1.0 State change WORKER_STOPPED -> WORKER_STARTED
2024-03-04T23:08:23,613 [INFO ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9001
2024-03-04T23:08:23,615 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD repeats 1 to backend at: 1709593703615
2024-03-04T23:08:23,616 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9001.
2024-03-04T23:08:23,616 [INFO ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Looping backend response at: 1709593703616
2024-03-04T23:08:23,619 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - model_name: bert, batchSize: 2
2024-03-04T23:08:23,624 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Backend worker process died.
2024-03-04T23:08:23,624 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2024-03-04T23:08:23,624 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_loader.py", line 108, in load
2024-03-04T23:08:23,624 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     module, function_name = self._load_handler_file(handler)
2024-03-04T23:08:23,624 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_loader.py", line 153, in _load_handler_file
2024-03-04T23:08:23,624 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     module = importlib.import_module(module_name)
2024-03-04T23:08:23,624 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/usr/lib/python3.9/importlib/__init__.py", line 127, in import_module
2024-03-04T23:08:23,624 [INFO ] epollEventLoopGroup-5-1 org.pytorch.serve.wlm.WorkerThread - 9001 Worker disconnected. WORKER_STARTED
2024-03-04T23:08:23,625 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     return _bootstrap._gcd_import(name[level:], package, level)
2024-03-04T23:08:23,625 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
2024-03-04T23:08:23,625 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
2024-03-04T23:08:23,625 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - System state is : WORKER_STARTED
2024-03-04T23:08:23,625 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
2024-03-04T23:08:23,625 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
2024-03-04T23:08:23,625 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap_external>", line 850, in exec_module
2024-03-04T23:08:23,625 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
2024-03-04T23:08:23,625 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/model-server/tmp/models/1c029c68cc9d45bd9ed5cce3528928b4/Transformer_kserve_handler.py", line 3, in <module>
2024-03-04T23:08:23,625 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     from Transformer_handler_generalized import (
2024-03-04T23:08:23,625 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died., responseTimeout:60sec
java.lang.InterruptedException: null
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1679) ~[?:?]
        at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:435) ~[?:?]
        at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:229) [model-server.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
        at java.lang.Thread.run(Thread.java:840) [?:?]
2024-03-04T23:08:23,625 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/model-server/tmp/models/1c029c68cc9d45bd9ed5cce3528928b4/Transformer_handler_generalized.py", line 8, in <module>
2024-03-04T23:08:23,626 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     import transformers
2024-03-04T23:08:23,626 [WARN ] W-9001-bert_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: bert, error: Worker died.
2024-03-04T23:08:23,626 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - ModuleNotFoundError: No module named 'transformers'
2024-03-04T23:08:23,626 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-bert_1.0 State change WORKER_STARTED -> WORKER_STOPPED
2024-03-04T23:08:23,626 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - 
2024-03-04T23:08:23,626 [WARN ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Auto recovery failed again
2024-03-04T23:08:23,626 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - During handling of the above exception, another exception occurred:
2024-03-04T23:08:23,626 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - 
2024-03-04T23:08:23,626 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2024-03-04T23:08:23,626 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_service_worker.py", line 263, in <module>
2024-03-04T23:08:23,626 [WARN ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9001-bert_1.0-stderr
2024-03-04T23:08:23,626 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -     worker.run_server()
2024-03-04T23:08:23,626 [WARN ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9001-bert_1.0-stdout
2024-03-04T23:08:23,626 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.9/site-packages/ts/model_service_worker.py", line 231, in run_server
2024-03-04T23:08:23,626 [INFO ] W-9001-bert_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9001-bert_1.0-stdout
2024-03-04T23:08:24,403 [INFO ] W-9001-bert_1.0-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9001-bert_1.0-stderr
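For reference, the crash loop above comes down to two import failures: Transformer_kserve_handler.py imports Transformer_handler_generalized, which fails on import transformers because the package is not installed in the worker's Python environment, and the later ModuleNotFoundError: No module named 'ts.torch_handler.Transformer_kserve_handler' is just the fallback attempt to resolve the handler as a built-in one after the custom handler import fails. A minimal sketch of one fix, assuming the file layout of the Huggingface_Transformers example (the requirements.txt and the model/extra-file paths are assumptions; adjust them to the actual archive contents), is to rebuild bert.mar with a requirements file so that install_py_dep_per_model=true installs transformers when the model is loaded:

# hypothetical requirements file; pin the transformers version actually tested
printf 'transformers\n' > requirements.txt

# repackage the bert archive with the requirements file included
torch-model-archiver \
  --model-name bert \
  --version 1.0 \
  --serialized-file Transformer_model/pytorch_model.bin \
  --handler ./Transformer_kserve_handler.py \
  --extra-files "Transformer_model/config.json,./setup_config.json,./Transformer_handler_generalized.py" \
  --requirements-file requirements.txt \
  --export-path model-store

Alternatively, installing transformers directly into the serving image (e.g. a derived Dockerfile that runs pip install transformers) avoids the per-model install on every worker restart; either way the second ModuleNotFoundError should disappear once the first import succeeds.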
tylertitsworth commented 8 months ago

@agunapal Can you kindly point me to the nightly KServe image? I'm having trouble finding it.

agunapal commented 8 months ago

@tylertitsworth Sorry, here it is: https://hub.docker.com/r/pytorch/torchserve-kfs-nightly/tags
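For example, the CPU variant referenced below can be pulled with:

docker pull pytorch/torchserve-kfs-nightly:latest-cpu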

tylertitsworth commented 8 months ago

Using pytorch/torchserve-kfs-nightly:latest-cpu (full log attached: ts_log.log):

2024-03-04T23:34:57,134 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2024-03-04T23:34:57,242 [INFO ] main org.pytorch.serve.metrics.configuration.MetricConfiguration - Successfully loaded metrics configuration from /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml
2024-03-04T23:34:57,427 [INFO ] main org.pytorch.serve.ModelServer - 
Torchserve version: 0.9.0
TS Home: /home/venv/lib/python3.9/site-packages
Current directory: /home/model-server
Temp directory: /home/model-server/tmp
Metrics config path: /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml
Number of GPUs: 0
Number of CPUs: 1
Max heap size: 494 M
Python executable: /home/venv/bin/python
Config file: /mnt/models/config/config.properties
Inference address: http://0.0.0.0:8085
Management address: http://0.0.0.0:8085
Metrics address: http://0.0.0.0:8082
Model Store: /mnt/pvc/model-store
Initial Models: N/A
Log dir: /home/model-server/logs
Metrics dir: /home/model-server/logs
Netty threads: 4
Netty client threads: 0
Default workers per model: 1
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [https://s3.amazonaws.com/.*, https://torchserve.pytorch.org/.*]
Custom python dependency for model allowed: true
Enable metrics API: true
Metrics mode: PROMETHEUS
Disable system metrics: false
Workflow Store: /mnt/pvc/model-store
CPP log config: N/A
Model config: N/A
2024-03-04T23:34:57,433 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager -  Loading snapshot serializer plugin...
2024-03-04T23:34:57,448 [INFO ] main org.pytorch.serve.snapshot.SnapshotManager - Started restoring models from snapshot {"name":"startup.cfg","modelCount":2,"models":{"mnist":{"1.0":{"defaultVersion":true,"marName":"mnist.mar","minWorkers":1,"maxWorkers":5,"batchSize":1,"maxBatchDelay":5000,"responseTimeout":120}},"bert":{"1.0":{"defaultVersion":true,"marName":"bert.mar","minWorkers":1,"maxWorkers":5,"batchSize":2,"maxBatchDelay":500,"responseTimeout":60}}}}
2024-03-04T23:34:57,453 [INFO ] main org.pytorch.serve.snapshot.SnapshotManager - Validating snapshot startup.cfg
2024-03-04T23:34:57,457 [INFO ] main org.pytorch.serve.snapshot.SnapshotManager - Snapshot startup.cfg validated successfully
2024-03-04T23:34:57,828 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model mnist
2024-03-04T23:34:57,828 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model mnist
2024-03-04T23:34:57,828 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model mnist
2024-03-04T23:34:57,828 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model mnist loaded.
2024-03-04T23:34:57,829 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: mnist, count: 1
2024-03-04T23:34:57,836 [DEBUG] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - launcherAvailable cmdline: [python, -m, torch.backends.xeon.run_cpu, --use_logical_core, --no_python, hostname]
2024-03-04T23:35:01,166 [DEBUG] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, -m, torch.backends.xeon.run_cpu, --use_logical_core, /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9000, --metrics-config, /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2024-03-04T23:35:03,135 [WARN ] W-9000-mnist_1.0-stderr MODEL_LOG - 2024-03-04 23:35:03,133 - __main__ - WARNING - Core binding with numactl is not available. Disabling numactl and using taskset instead.                     This may affect performance in multi-socket system; please use numactl if memory binding is needed.
2024-03-04T23:35:03,137 [WARN ] W-9000-mnist_1.0-stderr MODEL_LOG - 2024-03-04 23:35:03,134 - __main__ - WARNING - Neither TCMalloc nor JeMalloc is found in $CONDA_PREFIX/lib or $VIRTUAL_ENV/lib
2024-03-04T23:35:03,137 [WARN ] W-9000-mnist_1.0-stderr MODEL_LOG -                             or /.local/lib/ or /usr/local/lib/ or /usr/local/lib64/ or /usr/lib or /usr/lib64 or
2024-03-04T23:35:03,137 [WARN ] W-9000-mnist_1.0-stderr MODEL_LOG -                            /home/model-server/.local/lib/ so the LD_PRELOAD environment variable will not be set.
2024-03-04T23:35:03,137 [WARN ] W-9000-mnist_1.0-stderr MODEL_LOG -                            This may drop the performance
2024-03-04T23:35:03,137 [WARN ] W-9000-mnist_1.0-stderr MODEL_LOG - 2024-03-04 23:35:03,134 - __main__ - INFO - OMP_NUM_THREADS=112
2024-03-04T23:35:03,138 [WARN ] W-9000-mnist_1.0-stderr MODEL_LOG - 2024-03-04 23:35:03,134 - __main__ - WARNING - Unable to find the iomp library file libiomp5.so in $CONDA_PREFIX/lib or $VIRTUAL_ENV/lib or /.local/lib/ or /usr/local/lib/ or /usr/local/lib64/ or /usr/lib or /usr/lib64 or /home/model-server/.local/lib/ so the LD_PRELOAD environment variable will not be set. you can use "conda install mkl" to install iomp
2024-03-04T23:35:03,138 [WARN ] W-9000-mnist_1.0-stderr MODEL_LOG - 2024-03-04 23:35:03,134 - __main__ - INFO - taskset -c 0-111 /home/venv/bin/python -u /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py --sock-type unix --sock-name /home/model-server/tmp/.ts.sock.9000 --metrics-config /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml
2024-03-04T23:35:05,043 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - s_name_part0=/home/model-server/tmp/.ts.sock, s_name_part1=9000, pid=248
2024-03-04T23:35:05,044 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9000
2024-03-04T23:35:05,050 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - Successfully loaded /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml.
2024-03-04T23:35:05,051 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - [PID]248
2024-03-04T23:35:05,051 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - Torch worker started.
2024-03-04T23:35:05,051 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - Python runtime: 3.9.18
2024-03-04T23:35:05,052 [DEBUG] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-mnist_1.0 State change null -> WORKER_STARTED
2024-03-04T23:35:05,068 [INFO ] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9000
2024-03-04T23:35:05,154 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9000.
2024-03-04T23:35:05,156 [DEBUG] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD repeats 1 to backend at: 1709595305156
2024-03-04T23:35:05,158 [INFO ] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Looping backend response at: 1709595305158
2024-03-04T23:35:05,250 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - model_name: mnist, batchSize: 1
2024-03-04T23:35:07,139 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - generated new fontManager
2024-03-04T23:35:07,538 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - IPEX is enabled but intel-extension-for-pytorch is not installed. Proceeding without IPEX.
2024-03-04T23:35:07,538 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - proceeding without onnxruntime
2024-03-04T23:35:07,538 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - Torch TensorRT not enabled
2024-03-04T23:35:07,826 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - '/home/model-server/tmp/models/ab8cebf1b583467f8d7a94c7fc723ef2/index_to_name.json' is missing. Inference output will not include class name.
2024-03-04T23:35:07,931 [INFO ] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 2773
2024-03-04T23:35:07,931 [DEBUG] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-mnist_1.0 State change WORKER_STARTED -> WORKER_MODEL_LOADED
2024-03-04T23:35:10,045 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model bert
2024-03-04T23:35:10,046 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model bert
2024-03-04T23:35:10,046 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model bert
2024-03-04T23:35:10,046 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model bert loaded.
2024-03-04T23:35:10,046 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: bert, count: 1
2024-03-04T23:35:10,047 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - launcherAvailable cmdline: [python, -m, torch.backends.xeon.run_cpu, --use_logical_core, --no_python, hostname]
2024-03-04T23:35:10,048 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2024-03-04T23:35:10,057 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://0.0.0.0:8085
2024-03-04T23:35:10,057 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2024-03-04T23:35:10,058 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://0.0.0.0:8082
2024-03-04T23:35:10,465 [WARN ] pool-3-thread-1 org.pytorch.serve.metrics.MetricCollector - worker pid is not available yet.
2024-03-04T23:35:11,978 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, -m, torch.backends.xeon.run_cpu, --use_logical_core, /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9001, --metrics-config, /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2024-03-04T23:35:11,978 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, -m, torch.backends.xeon.run_cpu, --use_logical_core, /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9001, --metrics-config, /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2024-03-04T23:35:13,140 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:35:13,139 - __main__ - WARNING - Core binding with numactl is not available. Disabling numactl and using taskset instead.                     This may affect performance in multi-socket system; please use numactl if memory binding is needed.
2024-03-04T23:35:13,141 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:35:13,140 - __main__ - WARNING - Neither TCMalloc nor JeMalloc is found in $CONDA_PREFIX/lib or $VIRTUAL_ENV/lib
2024-03-04T23:35:13,141 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG -                             or /.local/lib/ or /usr/local/lib/ or /usr/local/lib64/ or /usr/lib or /usr/lib64 or
2024-03-04T23:35:13,142 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG -                            /home/model-server/.local/lib/ so the LD_PRELOAD environment variable will not be set.
2024-03-04T23:35:13,142 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG -                            This may drop the performance
2024-03-04T23:35:13,142 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:35:13,140 - __main__ - INFO - OMP_NUM_THREADS=112
2024-03-04T23:35:13,142 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:35:13,140 - __main__ - WARNING - Unable to find the iomp library file libiomp5.so in $CONDA_PREFIX/lib or $VIRTUAL_ENV/lib or /.local/lib/ or /usr/local/lib/ or /usr/local/lib64/ or /usr/lib or /usr/lib64 or /home/model-server/.local/lib/ so the LD_PRELOAD environment variable will not be set. you can use "conda install mkl" to install iomp
2024-03-04T23:35:13,142 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:35:13,140 - __main__ - INFO - taskset -c 0-111 /home/venv/bin/python -u /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py --sock-type unix --sock-name /home/model-server/tmp/.ts.sock.9001 --metrics-config /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml
2024-03-04T23:35:14,413 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - s_name_part0=/home/model-server/tmp/.ts.sock, s_name_part1=9001, pid=521
2024-03-04T23:35:14,414 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9001
2024-03-04T23:35:14,420 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Successfully loaded /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml.
2024-03-04T23:35:14,420 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - [PID]521
2024-03-04T23:35:14,420 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Torch worker started.
2024-03-04T23:35:14,420 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Python runtime: 3.9.18
2024-03-04T23:35:14,421 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-bert_1.0 State change null -> WORKER_STARTED
2024-03-04T23:35:14,421 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-bert_1.0 State change null -> WORKER_STARTED
2024-03-04T23:35:14,421 [INFO ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9001
2024-03-04T23:35:14,421 [INFO ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9001
2024-03-04T23:35:14,427 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD repeats 1 to backend at: 1709595314427
2024-03-04T23:35:14,427 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD repeats 1 to backend at: 1709595314427
2024-03-04T23:35:14,427 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9001.
2024-03-04T23:35:14,427 [INFO ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Looping backend response at: 1709595314427
2024-03-04T23:35:14,427 [INFO ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Looping backend response at: 1709595314427
2024-03-04T23:35:14,454 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - model_name: bert, batchSize: 2
2024-03-04T23:35:15,086 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - IPEX is enabled but intel-extension-for-pytorch is not installed. Proceeding without IPEX.
2024-03-04T23:35:15,086 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - proceeding without onnxruntime
2024-03-04T23:35:15,086 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Torch TensorRT not enabled
2024-03-04T23:35:15,086 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Transformers version 4.38.2
2024-03-04T23:35:22,458 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Transformer model from path /home/model-server/tmp/models/333a80a36c124e17980757215d76533d loaded successfully
2024-03-04T23:35:22,459 [INFO ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 8032
2024-03-04T23:35:22,459 [INFO ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 8032
2024-03-04T23:35:22,459 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-bert_1.0 State change WORKER_STARTED -> WORKER_MODEL_LOADED
2024-03-04T23:35:22,459 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-bert_1.0 State change WORKER_STARTED -> WORKER_MODEL_LOADED

Server Log:

2024-03-04T23:34:57,433 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager -  Loading snapshot serializer plugin...
2024-03-04T23:34:57,448 [INFO ] main org.pytorch.serve.snapshot.SnapshotManager - Started restoring models from snapshot {"name":"startup.cfg","modelCount":2,"models":{"mnist":{"1.0":{"defaultVersion":true,"marName":"mnist.mar","minWorkers":1,"maxWorkers":5,"batchSize":1,"maxBatchDelay":5000,"responseTimeout":120}},"bert":{"1.0":{"defaultVersion":true,"marName":"bert.mar","minWorkers":1,"maxWorkers":5,"batchSize":2,"maxBatchDelay":500,"responseTimeout":60}}}}
2024-03-04T23:34:57,453 [INFO ] main org.pytorch.serve.snapshot.SnapshotManager - Validating snapshot startup.cfg
2024-03-04T23:34:57,457 [INFO ] main org.pytorch.serve.snapshot.SnapshotManager - Snapshot startup.cfg validated successfully
2024-03-04T23:34:57,828 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model mnist
2024-03-04T23:34:57,828 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model mnist
2024-03-04T23:34:57,828 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model mnist
2024-03-04T23:34:57,828 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model mnist loaded.
2024-03-04T23:34:57,829 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: mnist, count: 1
2024-03-04T23:34:57,836 [DEBUG] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - launcherAvailable cmdline: [python, -m, torch.backends.xeon.run_cpu, --use_logical_core, --no_python, hostname]
INFO:root:Wrapper: loading configuration from /mnt/models/config/config.properties
INFO:root:Wrapper : Model names ['mnist', 'bert'], inference address http://0.0.0.0:8085, management address http://0.0.0.0:8085, grpc_inference_address, 0.0.0.0:7070, model store /mnt/models/model-store
INFO:root:Predict URL set to 0.0.0.0:8085
INFO:root:Explain URL set to 0.0.0.0:8085
INFO:root:Protocol version is v2
INFO:root:Copying contents of /mnt/models/model-store to local
INFO:root:Loading mnist .. 1 of 10 tries..
INFO:root:Expecting value: line 1 column 1 (char 0)
INFO:root:Failed loading model mnist
INFO:root:Predict URL set to 0.0.0.0:8085
INFO:root:Explain URL set to 0.0.0.0:8085
INFO:root:Protocol version is v2
INFO:root:Copying contents of /mnt/models/model-store to local
INFO:root:Loading bert .. 1 of 10 tries..
INFO:root:Expecting value: line 1 column 1 (char 0)
INFO:root:Failed loading model bert
INFO:root:TSModelRepo is initialized
INFO:kserve:Registering model: mnist
INFO:kserve:Registering model: bert
INFO:kserve:Setting max asyncio worker threads as 32
INFO:kserve:Starting uvicorn with 1 workers
2024-03-04 23:34:59.838 uvicorn.error INFO:     Started server process [10]
2024-03-04 23:34:59.839 uvicorn.error INFO:     Waiting for application startup.
2024-03-04 23:34:59.841 10 kserve INFO [start():62] Starting gRPC server on [::]:8081
2024-03-04 23:34:59.841 uvicorn.error INFO:     Application startup complete.
2024-03-04 23:34:59.842 uvicorn.error INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
2024-03-04T23:35:01,166 [DEBUG] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, -m, torch.backends.xeon.run_cpu, --use_logical_core, /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9000, --metrics-config, /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2024-03-04T23:35:03,135 [WARN ] W-9000-mnist_1.0-stderr MODEL_LOG - 2024-03-04 23:35:03,133 - __main__ - WARNING - Core binding with numactl is not available. Disabling numactl and using taskset instead.                     This may affect performance in multi-socket system; please use numactl if memory binding is needed.
2024-03-04T23:35:03,137 [WARN ] W-9000-mnist_1.0-stderr MODEL_LOG - 2024-03-04 23:35:03,134 - __main__ - WARNING - Neither TCMalloc nor JeMalloc is found in $CONDA_PREFIX/lib or $VIRTUAL_ENV/lib
2024-03-04T23:35:03,137 [WARN ] W-9000-mnist_1.0-stderr MODEL_LOG -                             or /.local/lib/ or /usr/local/lib/ or /usr/local/lib64/ or /usr/lib or /usr/lib64 or
2024-03-04T23:35:03,137 [WARN ] W-9000-mnist_1.0-stderr MODEL_LOG -                            /home/model-server/.local/lib/ so the LD_PRELOAD environment variable will not be set.
2024-03-04T23:35:03,137 [WARN ] W-9000-mnist_1.0-stderr MODEL_LOG -                            This may drop the performance
2024-03-04T23:35:03,137 [WARN ] W-9000-mnist_1.0-stderr MODEL_LOG - 2024-03-04 23:35:03,134 - __main__ - INFO - OMP_NUM_THREADS=112
2024-03-04T23:35:03,138 [WARN ] W-9000-mnist_1.0-stderr MODEL_LOG - 2024-03-04 23:35:03,134 - __main__ - WARNING - Unable to find the iomp library file libiomp5.so in $CONDA_PREFIX/lib or $VIRTUAL_ENV/lib or /.local/lib/ or /usr/local/lib/ or /usr/local/lib64/ or /usr/lib or /usr/lib64 or /home/model-server/.local/lib/ so the LD_PRELOAD environment variable will not be set. you can use "conda install mkl" to install iomp
2024-03-04T23:35:03,138 [WARN ] W-9000-mnist_1.0-stderr MODEL_LOG - 2024-03-04 23:35:03,134 - __main__ - INFO - taskset -c 0-111 /home/venv/bin/python -u /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py --sock-type unix --sock-name /home/model-server/tmp/.ts.sock.9000 --metrics-config /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml
2024-03-04T23:35:05,043 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - s_name_part0=/home/model-server/tmp/.ts.sock, s_name_part1=9000, pid=248
2024-03-04T23:35:05,044 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9000
2024-03-04T23:35:05,050 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - Successfully loaded /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml.
2024-03-04T23:35:05,051 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - [PID]248
2024-03-04T23:35:05,051 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - Torch worker started.
2024-03-04T23:35:05,051 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - Python runtime: 3.9.18
2024-03-04T23:35:05,052 [DEBUG] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-mnist_1.0 State change null -> WORKER_STARTED
2024-03-04T23:35:05,068 [INFO ] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9000
2024-03-04T23:35:05,154 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9000.
2024-03-04T23:35:05,156 [DEBUG] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD repeats 1 to backend at: 1709595305156
2024-03-04T23:35:05,158 [INFO ] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Looping backend response at: 1709595305158
2024-03-04T23:35:05,250 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - model_name: mnist, batchSize: 1
2024-03-04T23:35:07,139 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - generated new fontManager
2024-03-04T23:35:07,538 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - IPEX is enabled but intel-extension-for-pytorch is not installed. Proceeding without IPEX.
2024-03-04T23:35:07,538 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - proceeding without onnxruntime
2024-03-04T23:35:07,538 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - Torch TensorRT not enabled
2024-03-04T23:35:07,826 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - '/home/model-server/tmp/models/ab8cebf1b583467f8d7a94c7fc723ef2/index_to_name.json' is missing. Inference output will not include class name.
2024-03-04T23:35:07,931 [INFO ] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 2773
2024-03-04T23:35:07,931 [DEBUG] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-mnist_1.0 State change WORKER_STARTED -> WORKER_MODEL_LOADED
2024-03-04T23:35:10,045 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model bert
2024-03-04T23:35:10,046 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model bert
2024-03-04T23:35:10,046 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model bert
2024-03-04T23:35:10,046 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model bert loaded.
2024-03-04T23:35:10,046 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: bert, count: 1
2024-03-04T23:35:10,047 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - launcherAvailable cmdline: [python, -m, torch.backends.xeon.run_cpu, --use_logical_core, --no_python, hostname]
2024-03-04T23:35:10,048 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2024-03-04T23:35:10,057 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://0.0.0.0:8085
2024-03-04T23:35:10,057 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2024-03-04T23:35:10,058 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://0.0.0.0:8082
Model server started.
2024-03-04T23:35:10,465 [WARN ] pool-3-thread-1 org.pytorch.serve.metrics.MetricCollector - worker pid is not available yet.
2024-03-04T23:35:11,978 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, -m, torch.backends.xeon.run_cpu, --use_logical_core, /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9001, --metrics-config, /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2024-03-04T23:35:13,140 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:35:13,139 - __main__ - WARNING - Core binding with numactl is not available. Disabling numactl and using taskset instead.                     This may affect performance in multi-socket system; please use numactl if memory binding is needed.
2024-03-04T23:35:13,141 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:35:13,140 - __main__ - WARNING - Neither TCMalloc nor JeMalloc is found in $CONDA_PREFIX/lib or $VIRTUAL_ENV/lib
2024-03-04T23:35:13,141 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG -                             or /.local/lib/ or /usr/local/lib/ or /usr/local/lib64/ or /usr/lib or /usr/lib64 or
2024-03-04T23:35:13,142 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG -                            /home/model-server/.local/lib/ so the LD_PRELOAD environment variable will not be set.
2024-03-04T23:35:13,142 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG -                            This may drop the performance
2024-03-04T23:35:13,142 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:35:13,140 - __main__ - INFO - OMP_NUM_THREADS=112
2024-03-04T23:35:13,142 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:35:13,140 - __main__ - WARNING - Unable to find the iomp library file libiomp5.so in $CONDA_PREFIX/lib or $VIRTUAL_ENV/lib or /.local/lib/ or /usr/local/lib/ or /usr/local/lib64/ or /usr/lib or /usr/lib64 or /home/model-server/.local/lib/ so the LD_PRELOAD environment variable will not be set. you can use "conda install mkl" to install iomp
2024-03-04T23:35:13,142 [WARN ] W-9001-bert_1.0-stderr MODEL_LOG - 2024-03-04 23:35:13,140 - __main__ - INFO - taskset -c 0-111 /home/venv/bin/python -u /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py --sock-type unix --sock-name /home/model-server/tmp/.ts.sock.9001 --metrics-config /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml
2024-03-04T23:35:14,413 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - s_name_part0=/home/model-server/tmp/.ts.sock, s_name_part1=9001, pid=521
2024-03-04T23:35:14,414 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9001
2024-03-04T23:35:14,420 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Successfully loaded /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml.
2024-03-04T23:35:14,420 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - [PID]521
2024-03-04T23:35:14,420 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Torch worker started.
2024-03-04T23:35:14,420 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Python runtime: 3.9.18
2024-03-04T23:35:14,421 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-bert_1.0 State change null -> WORKER_STARTED
2024-03-04T23:35:14,421 [INFO ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9001
2024-03-04T23:35:14,427 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD repeats 1 to backend at: 1709595314427
2024-03-04T23:35:14,427 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9001.
2024-03-04T23:35:14,427 [INFO ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Looping backend response at: 1709595314427
2024-03-04T23:35:14,454 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - model_name: bert, batchSize: 2
2024-03-04T23:35:15,086 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - IPEX is enabled but intel-extension-for-pytorch is not installed. Proceeding without IPEX.
2024-03-04T23:35:15,086 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - proceeding without onnxruntime
2024-03-04T23:35:15,086 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Torch TensorRT not enabled
2024-03-04T23:35:15,086 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Transformers version 4.38.2
2024-03-04T23:35:22,458 [INFO ] W-9001-bert_1.0-stdout MODEL_LOG - Transformer model from path /home/model-server/tmp/models/333a80a36c124e17980757215d76533d loaded successfully
2024-03-04T23:35:22,459 [INFO ] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 8032
2024-03-04T23:35:22,459 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-bert_1.0 State change WORKER_STARTED -> WORKER_MODEL_LOADED
2024-03-04 23:35:24.404 uvicorn.access INFO:     100.80.175.253:0 10 - "GET /metrics HTTP/1.1" 200 OK
2024-03-04 23:35:24.404 kserve.trace kserve.io.kserve.protocol.rest.server.metrics_handler: 0.0008211135864257812
2024-03-04 23:35:24.404 kserve.trace kserve.io.kserve.protocol.rest.server.metrics_handler: 0.0008170000000000677
2024-03-04 23:35:41.267 10 kserve ERROR [model_not_ready_handler():118] Exception:
Traceback (most recent call last):
  File "/home/venv/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/home/venv/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/home/venv/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/home/venv/lib/python3.9/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/home/venv/lib/python3.9/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/home/venv/lib/python3.9/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/home/venv/lib/python3.9/site-packages/fastapi/routing.py", line 237, in app
    raw_response = await run_endpoint_function(
  File "/home/venv/lib/python3.9/site-packages/fastapi/routing.py", line 163, in run_endpoint_function
    return await dependant.call(**values)
  File "/home/venv/lib/python3.9/site-packages/kserve/protocol/rest/v2_endpoints.py", line 139, in infer
2024-03-04 23:35:41.268 uvicorn.access INFO:     100.80.96.9:0 10 - "POST /v2/models/mnist/infer HTTP/1.1" 503 Service Unavailable
    raise ModelNotReady(model_name)
kserve.errors.ModelNotReady: Model with name mnist is not ready.
2024-03-04 23:35:41.268 kserve.trace kserve.io.kserve.protocol.rest.v2_endpoints.infer: 0.001990795135498047
2024-03-04 23:35:41.268 kserve.trace kserve.io.kserve.protocol.rest.v2_endpoints.infer: 0.0019879999999998788
2024-03-04 23:35:41.289 uvicorn.access INFO:     100.80.96.9:0 10 - "POST /v2/models/mnist/infer HTTP/1.1" 503 Service Unavailable
2024-03-04 23:35:41.289 10 kserve ERROR [model_not_ready_handler():118] Exception:
Traceback (most recent call last):
  File "/home/venv/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/home/venv/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/home/venv/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/home/venv/lib/python3.9/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/home/venv/lib/python3.9/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/home/venv/lib/python3.9/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/home/venv/lib/python3.9/site-packages/fastapi/routing.py", line 237, in app
    raw_response = await run_endpoint_function(
  File "/home/venv/lib/python3.9/site-packages/fastapi/routing.py", line 163, in run_endpoint_function
    return await dependant.call(**values)
  File "/home/venv/lib/python3.9/site-packages/kserve/protocol/rest/v2_endpoints.py", line 139, in infer
    raise ModelNotReady(model_name)
kserve.errors.ModelNotReady: Model with name mnist is not ready.
2024-03-04 23:35:41.289 kserve.trace kserve.io.kserve.protocol.rest.v2_endpoints.infer: 0.0008304119110107422
2024-03-04 23:35:41.289 kserve.trace kserve.io.kserve.protocol.rest.v2_endpoints.infer: 0.0008300000000001084
2024-03-04 23:35:41.311 uvicorn.access INFO:     100.80.96.9:0 10 - "POST /v2/models/mnist/infer HTTP/1.1" 503 Service Unavailable
2024-03-04 23:35:41.311 10 kserve ERROR [model_not_ready_handler():118] Exception:
Traceback (most recent call last):
  File "/home/venv/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/home/venv/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/home/venv/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/home/venv/lib/python3.9/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/home/venv/lib/python3.9/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/home/venv/lib/python3.9/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/home/venv/lib/python3.9/site-packages/fastapi/routing.py", line 237, in app
    raw_response = await run_endpoint_function(
  File "/home/venv/lib/python3.9/site-packages/fastapi/routing.py", line 163, in run_endpoint_function
    return await dependant.call(**values)
  File "/home/venv/lib/python3.9/site-packages/kserve/protocol/rest/v2_endpoints.py", line 139, in infer
    raise ModelNotReady(model_name)
kserve.errors.ModelNotReady: Model with name mnist is not ready.
2024-03-04 23:35:41.312 kserve.trace kserve.io.kserve.protocol.rest.v2_endpoints.infer: 0.0007150173187255859
2024-03-04 23:35:41.312 kserve.trace kserve.io.kserve.protocol.rest.v2_endpoints.infer: 0.0007139999999998814
2024-03-04 23:35:57.170 uvicorn.access INFO:     100.80.175.253:47894 10 - "GET /metrics HTTP/1.1" 200 OK
agunapal commented 8 months ago

@tylertitsworth

You need to check for the ready signal before sending requests. The pod can show as Running while model loading is still in progress, and loading can take some time.

https://github.com/pytorch/serve/blob/master/kubernetes/kserve/kserve_wrapper/TorchserveModel.py#L213-L214

The client can make a preemptive GET /v1/models/<model_name> call and treat "ready": true in the response as the signal that the model is truly ready for inference requests.
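
For example, a client could gate its first inference on that readiness flag, roughly like this (a minimal sketch; the base URL, Host header, and model name are placeholders for your own deployment, and the ready flag may come back as a boolean or as a string):

import time

import requests

def wait_until_ready(base_url, model_name, headers=None, timeout_s=300, poll_s=10):
    """Poll the KServe v1 readiness endpoint until the model reports ready."""
    url = f"{base_url}/v1/models/{model_name}"
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            body = requests.get(url, headers=headers, timeout=10).json()
            # The wrapper may report the flag as True or as the string "True"
            if str(body.get("ready", "")).lower() == "true":
                return
        except (requests.RequestException, ValueError):
            pass  # endpoint not reachable yet or non-JSON reply; keep polling
        time.sleep(poll_s)
    raise TimeoutError(f"{model_name} was not ready after {timeout_s} seconds")

# Illustrative usage against a KServe InferenceService:
# wait_until_ready("http://<INGRESS_HOST>:<INGRESS_PORT>", "mnist",
#                  headers={"Host": "<SERVICE_HOSTNAME>"})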

tylertitsworth commented 8 months ago

I've been using the V2 API for this; are you expecting users to use v1 instead?

Also, how long should we expect to wait for the model to become ready? I waited a couple of minutes before testing, because I know from deploying without KServe that these models take 1-2 seconds to register.

Tyler

agunapal commented 8 months ago

It can be V1 or V2. This is provided by KServe.

This was added recently to handle the large-model use case, where loading can take a few minutes.

Currently, they are checking the model state every 30 seconds. https://github.com/pytorch/serve/blob/master/kubernetes/kserve/kserve_wrapper/TorchserveModel.py#L211
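
In rough terms, that load path boils down to a retry loop like the one below (a sketch based on my reading of the linked code; the constants and helper name are illustrative, not the wrapper's actual identifiers):

import time

import requests

MANAGEMENT_ADDRESS = "http://0.0.0.0:8085"  # management_address from config.properties
NUM_TRIES = 10                              # illustrative defaults; the wrapper's actual
POLL_INTERVAL_S = 30                        # values and env var names may differ

def poll_until_model_ready(model_name):
    """Ask the TorchServe management API whether a worker for the model is READY."""
    for attempt in range(1, NUM_TRIES + 1):
        print(f"Loading {model_name} .. {attempt} of {NUM_TRIES} tries..")
        try:
            reply = requests.get(
                f"{MANAGEMENT_ADDRESS}/models/{model_name}?customized=false",
                timeout=120,
            ).json()
            model = reply[0] if isinstance(reply, list) else reply
            if any(w.get("status") == "READY" for w in model.get("workers", [])):
                return True
        except (requests.RequestException, ValueError, IndexError):
            # A non-JSON reply (for example a proxy error page) makes .json()
            # raise ValueError rather than surfacing the HTTP failure.
            pass
        time.sleep(POLL_INTERVAL_S)
    return False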

agunapal commented 8 months ago

You can configure this using an environment variable, as shown in the code.

tylertitsworth commented 8 months ago

I am able to reproduce the error with the v1 calls /v1/models/bert and /v1/models/mnist. It seems the model never becomes ready, only loaded.

tylertitsworth commented 8 months ago

I've tried the same process with a custom model based on resnet50, and I am having the same issue. I reproduced this in a non-KServe TorchServe environment and found no perceivable difference in the logs with regard to how the model is registered. The last output for model registration is always <modelname> State change WORKER_STARTED -> WORKER_MODEL_LOADED; after that the model is ready to be used.

agunapal commented 8 months ago

Hi @tylertitsworth You should see something like this.

You can see here that the ready state was set about a minute after the model loaded.

2024-03-05T21:50:52,676 [INFO ] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Looping backend response at: 1709675452676
2024-03-05T21:50:52,751 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - model_name: mnist, batchSize: 2
2024-03-05T21:50:53,917 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - generated new fontManager
2024-03-05T21:50:54,150 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - proceeding without onnxruntime
2024-03-05T21:50:54,150 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - Torch TensorRT not enabled
2024-03-05T21:50:54,190 [INFO ] W-9000-mnist_1.0-stdout MODEL_LOG - '/home/model-server/tmp/models/7da277edaeac4c7cb9ffeaaac686b1fd/index_to_name.json' is missing. Inference output will not include class name.
2024-03-05T21:50:54,191 [INFO ] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 1515
2024-03-05T21:50:54,192 [DEBUG] W-9000-mnist_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-mnist_1.0 State change WORKER_STARTED -> WORKER_MODEL_LOADED
2024-03-05T21:50:54,192 [INFO ] W-9000-mnist_1.0 TS_METRICS - WorkerLoadTime.Milliseconds:3957.0|#WorkerName:W-9000-mnist_1.0,Level:Host|#hostname:torchserve-mnist-v2-predictor-00001-deployment-6dbdbc98dd-pq54n,timestamp:1709675454
2024-03-05T21:50:54,192 [INFO ] W-9000-mnist_1.0 TS_METRICS - WorkerThreadTime.Milliseconds:2.0|#Level:Host|#hostname:torchserve-mnist-v2-predictor-00001-deployment-6dbdbc98dd-pq54n,timestamp:1709675454
INFO:root:Loading mnist .. 2 of 10 tries..
2024-03-05T21:51:22,060 [INFO ] epollEventLoopGroup-3-2 ACCESS_LOG - /127.0.0.1:41044 "GET /models/mnist?customized=false HTTP/1.1" 200 1
2024-03-05T21:51:22,061 [INFO ] epollEventLoopGroup-3-2 TS_METRICS - Requests2XX.Count:1.0|#Level:Host|#hostname:torchserve-mnist-v2-predictor-00001-deployment-6dbdbc98dd-pq54n,timestamp:1709675482
INFO:root:Sleep 30 seconds for load mnist..
2024-03-05T21:51:51,188 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:torchserve-mnist-v2-predictor-00001-deployment-6dbdbc98dd-pq54n,timestamp:1709675511
2024-03-05T21:51:51,188 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:206.9672508239746|#Level:Host|#hostname:torchserve-mnist-v2-predictor-00001-deployment-6dbdbc98dd-pq54n,timestamp:1709675511
2024-03-05T21:51:51,189 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:83.58369445800781|#Level:Host|#hostname:torchserve-mnist-v2-predictor-00001-deployment-6dbdbc98dd-pq54n,timestamp:1709675511
2024-03-05T21:51:51,189 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:28.8|#Level:Host|#hostname:torchserve-mnist-v2-predictor-00001-deployment-6dbdbc98dd-pq54n,timestamp:1709675511
2024-03-05T21:51:51,189 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:27986.3203125|#Level:Host|#hostname:torchserve-mnist-v2-predictor-00001-deployment-6dbdbc98dd-pq54n,timestamp:1709675511
2024-03-05T21:51:51,190 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:3272.30078125|#Level:Host|#hostname:torchserve-mnist-v2-predictor-00001-deployment-6dbdbc98dd-pq54n,timestamp:1709675511
2024-03-05T21:51:51,190 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:11.8|#Level:Host|#hostname:torchserve-mnist-v2-predictor-00001-deployment-6dbdbc98dd-pq54n,timestamp:1709675511
INFO:root:The model mnist is ready
INFO:root:TSModelRepo is initialized
INFO:kserve:Registering model: mnist
INFO:kserve:Setting max asyncio worker threads as 5
INFO:kserve:Starting uvicorn with 1 workers
2024-03-05 21:51:52.142 uvicorn.error INFO:     Started server process [10]
2024-03-05 21:51:52.142 uvicorn.error INFO:     Waiting for application startup.
2024-03-05 21:51:52.144 10 kserve INFO [start():62] Starting gRPC server on [::]:8081
2024-03-05 21:51:52.144 uvicorn.error INFO:     Application startup complete.
2024-03-05 21:51:52.145 uvicorn.error INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
tylertitsworth commented 8 months ago

I exec'd into the container with my custom model and made a simple request to TorchServe's management API directly (bypassing KServe):

$ wget -O - http://localhost:8085/models/hazelnut
--2024-03-05 22:47:35--  http://localhost:8085/models/hazelnut
Resolving localhost (localhost)... ::1, 127.0.0.1
Connecting to localhost (localhost)|::1|:8085... connected.
HTTP request sent, awaiting response... 200 OK
Length: 587 [application/json]
Saving to: β€˜STDOUT’

-                                                          0%[                                                                                                                                 ]       0  --.-KB/s
  {
    "modelName": "hazelnut",
    "modelVersion": "1.0",
    "modelUrl": "resnet50_layer3_hazelnut.mar",
    "runtime": "python",
    "minWorkers": 1,
    "maxWorkers": 2,
    "batchSize": 1,
    "maxBatchDelay": 500,
    "loadedAtStartup": false,
    "workers": [
      {
        "id": "9000",
        "startTime": "2024-03-05T17:46:23.301Z",
        "status": "READY",
        "memoryUsage": 266993664,
        "pid": 269,
        "gpu": false,
        "gpuUsage": "N/A"
      }
    ],
    "jobQueueStatus": {
      "remainingCapacity": 10,
      "pendingRequests": 0
    }
  }

Based on https://github.com/pytorch/serve/blob/master/kubernetes/kserve/kserve_wrapper/TorchserveModel.py#L183-L185 my model has a worker with the READY status; however, I don't see the output you posted above, and it still reports 'model not ready' regardless of which model I register. If I unregister the model and re-register it, the wrapper doesn't seem to do anything.

Is there a known good version of this wrapper besides nightly?

tylertitsworth commented 8 months ago

I just noticed the following when the wrapper starts up:

INFO:root:Loading mnist .. 1 of 10 tries..
INFO:root:Expecting value: line 1 column 1 (char 0)
INFO:root:Failed loading model mnist
INFO:root:Predict URL set to 0.0.0.0:8085
INFO:root:Explain URL set to 0.0.0.0:8085
INFO:root:Protocol version is v2
INFO:root:Copying contents of /mnt/models/model-store to local
INFO:root:Loading bert .. 1 of 10 tries..
INFO:root:Expecting value: line 1 column 1 (char 0)
INFO:root:Failed loading model bert
INFO:root:TSModelRepo is initialized
INFO:kserve:Registering model: mnist
INFO:kserve:Registering model: bert

That would explain why it's not ready; something about the model config is wrong.

tylertitsworth commented 8 months ago

I tried re-training the mnist model, and it looks like I can use it with TorchServe just fine, but the moment I switch to KServe it raises an exception in that try/except block. It looks like a JSON decode error, which indicates that the GET request it makes is failing.

I stopped the KServe wrapper in the pod and restarted it once the models were ready, and it still throws this error, indicating that this section here fails:

response = requests.get(
    READINESS_URL_FORMAT.format(
        self.management_address, self.name, model_load_customized
    ),
    timeout=model_load_timeout,
).json()

I'm not seeing this request in my logfile, so it seems that it's failing to make a request to the management API.

When I try to replicate the request with the following: wget -O - http://0.0.0.0:8085/models/mnist?customized=false, the request gets routed through my company's proxy server and returns ERROR 503: Service Unavailable.

tylertitsworth commented 8 months ago

After adding the following to my no_proxy:

$ export no_proxy+=',0.0.0.0'

The request is successful. I will try to modify my proxy ConfigMap and follow up on this issue tomorrow. I think the biggest issue here is that the failed request wasn't well reported in the logs and could be better captured in Python; see the sketch below.
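
Surfacing the HTTP status before calling .json() would have made the proxy problem obvious. A minimal sketch of what that could look like (this is not the wrapper's actual code; the function name and defaults are illustrative):

import logging

import requests

def describe_model(management_address, model_name, timeout=120):
    """Query the TorchServe management API, reporting HTTP/proxy failures loudly
    instead of letting them surface as a JSON decode error."""
    url = f"{management_address}/models/{model_name}?customized=false"
    # requests honors no_proxy/NO_PROXY, so 0.0.0.0 must be listed there when a
    # corporate proxy is configured; otherwise this call is sent to the proxy.
    response = requests.get(url, timeout=timeout)
    if not response.ok:
        logging.error("Readiness probe for %s failed: HTTP %s from %s",
                      model_name, response.status_code, url)
        response.raise_for_status()
    return response.json()

# Illustrative usage with the management address from config.properties:
# describe_model("http://0.0.0.0:8085", "mnist")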

tylertitsworth commented 8 months ago

After adding the proxy change and recreating the endpoint, the models loaded successfully.

 2024-03-06T16:40:00,520 [DEBUG] W-9001-bert_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-bert_1.0 State change WORKER_STARTED -> WORKER_MODEL_LOADED 
 INFO:root:Loading mnist .. 2 of 10 tries.. 
 2024-03-06T16:40:01,146 [INFO ] epollEventLoopGroup-3-1 ACCESS_LOG - /127.0.0.1:40670 "GET /models/mnist?customized=false HTTP/1.1" 200 5 
 INFO:root:Sleep 30 seconds for load mnist.. 
 INFO:root:The model mnist is ready 
 INFO:root:Predict URL set to 0.0.0.0:8085 
 INFO:root:Explain URL set to 0.0.0.0:8085 
 INFO:root:Protocol version is v2 
 INFO:root:Copying contents of /mnt/models/model-store to local 
 INFO:root:Loading bert .. 1 of 10 tries.. 
 2024-03-06T16:40:31,191 [INFO ] epollEventLoopGroup-3-2 ACCESS_LOG - /127.0.0.1:44574 "GET /models/bert?customized=false HTTP/1.1" 200 1 
 INFO:root:Sleep 30 seconds for load bert.. 
 INFO:root:The model bert is ready 
 INFO:root:TSModelRepo is initialized 
 INFO:kserve:Registering model: mnist 
 INFO:kserve:Registering model: bert 
 INFO:kserve:Setting max asyncio worker threads as 32 
 INFO:kserve:Starting uvicorn with 1 workers 
 2024-03-06 16:41:01.363 uvicorn.error INFO:     Started server process [10] 
 2024-03-06 16:41:01.363 uvicorn.error INFO:     Waiting for application startup. 
 2024-03-06 16:41:01.365 10 kserve INFO [start():62] Starting gRPC server on [::]:8081 
 2024-03-06 16:41:01.366 uvicorn.error INFO:     Application startup complete. 
 2024-03-06 16:41:01.366 uvicorn.error INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit) 

And I was able to verify this with the v1 path: {"name":"mnist","ready":"True"}.

agunapal commented 8 months ago

Hi @tylertitsworth Thanks for debugging this. Was there anything in config.properties that caused this?

Wondering if this PR would address this https://github.com/pytorch/serve/pull/2995

tylertitsworth commented 8 months ago

It was purely a proxy issue, combined with a lack of error handling for the 503 responses the proxy returned.

It would help to have some docs about deploying behind a corporate proxy, explicitly mentioning that 0.0.0.0 needs to be in no_proxy for the KServe wrapper to function, since it has to communicate back to TorchServe.
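
For anyone else hitting this, a quick way to check from inside the pod whether calls to the management address will be sent to the proxy (a small sketch; adjust the host to your own config):

import urllib.request

# getproxies() reads http_proxy/https_proxy; proxy_bypass() checks no_proxy,
# which is the same mechanism the requests library relies on.
print("proxies:", urllib.request.getproxies())
print("0.0.0.0 bypasses proxy:", bool(urllib.request.proxy_bypass("0.0.0.0")))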

adogwangwang commented 8 months ago

πŸ› Describe the bug

When creating a new KServe (0.11.2) Endpoint utilizing the TorchServe predictor in this example. It registers the model(s) without issue, however when I attempt an inference request I receive the following: {"error":"Model with name bert is not ready."}. I have tried this process with both mnist and bert examples, and I first had to modify the provided json input to be valid:

{
    "id": "d3b15cad-50a2-4eaf-80ce-8b0a428bd298",
    "inputs": [
        {
            "name": "4b7c7d4a-51e4-43c8-af61-04639f6ef4bc",
            "shape": [-1],
            "datatype": "BYTES",
            "data": ["{\"text\":\"Bloomberg has decided to publish a new report on the global economy.\", \"target\":1}"]
        }
    ]
}

Error logs

2024-03-01 17:32:23.261 10 kserve ERROR [model_not_ready_handler():105] Exception: 
 Traceback (most recent call last): 
   File "/home/venv/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in __call__ 
     await self.app(scope, receive, sender) 
   File "/home/venv/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__ 
     raise e 
   File "/home/venv/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__ 
     await self.app(scope, receive, send) 
   File "/home/venv/lib/python3.10/site-packages/starlette/routing.py", line 718, in __call__ 
     await route.handle(scope, receive, send) 
   File "/home/venv/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle 
     await self.app(scope, receive, send) 
   File "/home/venv/lib/python3.10/site-packages/starlette/routing.py", line 66, in app 
     response = await func(request) 
   File "/home/venv/lib/python3.10/site-packages/fastapi/routing.py", line 237, in app 
     raw_response = await run_endpoint_function( 
   File "/home/venv/lib/python3.10/site-packages/fastapi/routing.py", line 163, in run_endpoint_function 
     return await dependant.call(**values) 
   File "/home/venv/lib/python3.10/site-packages/kserve/protocol/rest/v2_endpoints.py", line 138, in infer 
     raise ModelNotReady(model_name) 
 kserve.errors.ModelNotReady: Model with name bert is not ready.

Installation instructions

Install torchserve from source: No Using Docker: Yes, k8s

Model Packaing

using the provided example instructions and then using kubectl cp to a PVC in my cluster with the correct permissions

config.properties

inference_address=http://0.0.0.0:8085
management_address=http://0.0.0.0:8085
metrics_address=http://0.0.0.0:8082
grpc_inference_port=7070
grpc_management_port=7071
enable_envvars_config=true
install_py_dep_per_model=true
enable_metrics_api=true
metrics_mode=prometheus
NUM_WORKERS=1
number_of_netty_threads=4
job_queue_size=10
model_store=/mnt/models/model-store
model_snapshot={"name":"startup.cfg","modelCount":2,"models":{"mnist":{"1.0":{"defaultVersion":true,"marName":"mnist.mar","minWorkers":1,"maxWorkers":5,"batchSize":1,"maxBatchDelay":5000,"responseTimeout":120}},"bert":{"1.0":{"defaultVersion":true,"marName":"bert.mar","minWorkers":1,"maxWorkers":5,"batchSize":2,"maxBatchDelay":500,"responseTimeout":60}}}}
allowed_urls=https://s3.amazonaws.com/.*,https://torchserve.pytorch.org/.*
ipex_enable=true
cpu_launcher_enable=true
cpu_launcher_args=--use_logical_core

Versions

from intel/intel-extension-for-pytorch:2.2.0-serving-cpu-kserve:

------------------------------------------------------------------------------------------
Environment headers
------------------------------------------------------------------------------------------
Torchserve branch: 

torchserve==0.9.0
torch-model-archiver==0.9.0

Python version: 3.10 (64-bit runtime)
Python executable: /home/venv/bin/python

Versions of relevant python libraries:
captum==0.7.0
intel-extension-for-pytorch==2.2.0+cpu
numpy==1.26.4
pillow==10.2.0
psutil==5.9.8
requests==2.31.0
requests-oauthlib==1.3.1
torch==2.2.0+cpu
torch-model-archiver==0.9.0
torch-workflow-archiver==0.2.11
torchaudio==2.2.0+cpu
torchdata==0.7.1
torchserve==0.9.0
torchtext==0.17.0+cpu
torchvision==0.17.0+cpu
transformers==4.38.1
wheel==0.42.0
torch==2.2.0+cpu
torchtext==0.17.0+cpu
torchvision==0.17.0+cpu
torchaudio==2.2.0+cpu

Java Version:

OS: N/A
GCC version: N/A
Clang version: N/A
CMake version: N/A

Repro instructions

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: "my-endpoint"
spec:
  predictor:
    model:
      modelFormat:
        name: pytorch
      protocolVersion: v2
      storageUri: pvc://my-pvc
$ kubectl cp config.properties <pvcviewer-pod>:config/config.properties
$ kubectl cp model-store <pvcviewer-pod>:model-store
$ kubectl apply -f endpoint.yaml
$ curl -s -X POST --location "http://$INGRESS_HOST:$INGRESS_PORT/v2/models/bert/infer" \
--header "Host: $SERVICE_HOSTNAME" \
--header 'Content-Type: application/json' \
--header "Cookie: authservice_session=$AUTHSERVICE_TOKEN" -d @./bytes.json

Possible Solution

The issue seems to be in the kserve wrapper, because this issue does not occur with the standard torchserve container, and the kserve container uses that torchserve container as a base.

I am also seeing the same issue in #2968 with the bert model, however the result is the same for the mnist model.

If the worker died and was not reported by the logger, then I'm not clear on how to get the status of the predictor's workers via the predict v2 api.

Hello, I'm hitting the same issue. I am using KServe 0.11.0 to deploy gemma; all three pods run successfully, but when I send a request to the model I get kserve.errors.ModelNotReady: Model with name gemma-7b-it is not ready. I saw your reply: was adding 0.0.0.0 to no_proxy all it took to solve it? I followed the same steps as you, but the result is the same. Hoping for your help.