pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/
Apache License 2.0

java.lang.NullPointerException for a custom model being deployed #2369

Closed Jyothipyxis closed 1 year ago

Jyothipyxis commented 1 year ago

We have a sentence-transformer model packed into a MAR file and saved in our storage. We are using KServe on Kubeflow to deploy the models. We followed the steps in https://kserve.github.io/website/0.8/modelserving/v1beta1/torchserve/ and also referred to https://github.com/AshutoshDongare/HuggingFace-Model-Serving to generate MAR files for other models. After deploying everything with the YAML below, we get this error.

apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "all-minilm-l6-v2"
  labels:
    istio: kservegateway
  annotations:
    sidecar.istio.io/inject: "false"
spec:
  predictor:
    pytorch:
      storageUri: "s3://s3-kubeflow-develop/sentenceTransformer"
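
For context, the MAR referenced in storageUri was generated with torch-model-archiver, roughly like this (a sketch; the file names, handler, and extra files are approximations, not the exact command we ran):

torch-model-archiver --model-name sentenceTransformerMiniLM \
    --version 1.0 \
    --serialized-file pytorch_model.bin \
    --handler handler.py \
    --extra-files "config.json,tokenizer.json" \
    --export-path model-store

The resulting sentenceTransformerMiniLM.mar sits under the model-store/ prefix of the bucket, with a config.properties under config/, matching the layout in the KServe guide.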

error log

Defaulted container "kserve-container" out of: kserve-container, queue-proxy, storage-initializer (init)
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
/usr/local/lib/python3.8/dist-packages/ray/autoscaler/_private/cli_logger.py:57: FutureWarning: Not all Ray CLI dependencies were found. In Ray 1.4+, the Ray CLI, autoscaler, and dashboard will only be usable via `pip install 'ray[default]'`. Please update your install command.
  warnings.warn(
2023-05-29T14:25:46,893 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2023-05-29T14:25:47,189 [INFO ] main org.pytorch.serve.ModelServer - 
Torchserve version: 0.5.2
TS Home: /usr/local/lib/python3.8/dist-packages
Current directory: /home/model-server
Temp directory: /home/model-server/tmp
Number of GPUs: 0
Number of CPUs: 1
Max heap size: 494 M
Python executable: /usr/bin/python
Config file: /mnt/models/config/config.properties
Inference address: http://0.0.0.0:8085
Management address: http://0.0.0.0:8081
Metrics address: http://0.0.0.0:8082
Model Store: /mnt/models/model-store
Initial Models: N/A
Log dir: /home/model-server/logs
Metrics dir: /home/model-server/logs
Netty threads: 4
Netty client threads: 0
Default workers per model: 1
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: true
Metrics report format: prometheus
Enable metrics API: true
Workflow Store: /mnt/models/model-store
Model config: N/A
2023-05-29T14:25:47,202 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager -  Loading snapshot serializer plugin...
2023-05-29T14:25:47,318 [INFO ] main org.pytorch.serve.snapshot.SnapshotManager - Started restoring models from snapshot {"name":"startup.cfg","modelCount":1,"models":{"sentenceTransformerMiniLM":{"1.0":{"defaultVersion":true,"marName":"sentenceTransformerMiniLM.mar","minWorkers":1,"maxWorkers":5,"batchSize":1,"responseTimeout":120}}}}
2023-05-29T14:25:47,370 [INFO ] main org.pytorch.serve.snapshot.SnapshotManager - Validating snapshot startup.cfg
2023-05-29T14:25:47,371 [INFO ] main org.pytorch.serve.snapshot.SnapshotManager - Snapshot startup.cfg validated successfully
[I 230529 14:25:48 __main__:75] Wrapper : Model names ['sentenceTransformerMiniLM'], inference address http//0.0.0.0:8085, management address http://0.0.0.0:8081, model store /mnt/models/model-store
[I 230529 14:25:48 TorchserveModel:54] kfmodel Predict URL set to 0.0.0.0:8085
[I 230529 14:25:48 TorchserveModel:56] kfmodel Explain URL set to 0.0.0.0:8085
[I 230529 14:25:48 TSModelRepository:30] TSModelRepo is initialized
[I 230529 14:25:48 kfserver:150] Registering model: sentenceTransformerMiniLM
[I 230529 14:25:48 kfserver:120] Setting asyncio max_workers as 5
[I 230529 14:25:48 kfserver:127] Listening on port 8080
[I 230529 14:25:48 kfserver:129] Will fork 1 workers
2023-05-29T14:25:51,869 [INFO ] main org.pytorch.serve.ModelServer - Torchserve stopped.
java.lang.NullPointerException
    at org.pytorch.serve.wlm.Model.setModelState(Model.java:82)
    at org.pytorch.serve.wlm.ModelManager.createModel(ModelManager.java:289)
    at org.pytorch.serve.wlm.ModelManager.registerAndUpdateModel(ModelManager.java:91)
    at org.pytorch.serve.snapshot.SnapshotManager.initModels(SnapshotManager.java:136)
    at org.pytorch.serve.snapshot.SnapshotManager.restore(SnapshotManager.java:119)
    at org.pytorch.serve.ModelServer.initModelStore(ModelServer.java:152)
    at org.pytorch.serve.ModelServer.startRESTserver(ModelServer.java:356)
    at org.pytorch.serve.ModelServer.startAndWait(ModelServer.java:117)
    at org.pytorch.serve.ModelServer.main(ModelServer.java:98)

What could possibly have gone wrong here, and what might be left uninitialized? Thanks in advance!

Jyothipyxis commented 1 year ago

This issue is resolved: I had to add a property called MAX_BATCH_DELAY in the config.properties file.

luntropy commented 6 months ago

More specifically, at least in my case, I had to add it to the model_snapshot part of config.properties:

inference_address=http://0.0.0.0:8085
management_address=http://0.0.0.0:8085
metrics_address=http://0.0.0.0:8082
grpc_inference_port=7070
grpc_management_port=7071
enable_metrics_api=true
metrics_format=prometheus
number_of_netty_threads=4
job_queue_size=10
enable_envvars_config=false
install_py_dep_per_model=true
model_store=/mnt/models/model-store
model_snapshot={"name":"startup.cfg","modelCount":1,"models":{"my-model-name":{"1.0":{"defaultVersion":true,"marName":"my_model.mar","minWorkers":1,"maxWorkers":5,"batchSize":1,"maxBatchDelay":10,"responseTimeout":120}}}}
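
For comparison, the model_snapshot in the failing startup log above has no maxBatchDelay field at all:

"sentenceTransformerMiniLM":{"1.0":{"defaultVersion":true,"marName":"sentenceTransformerMiniLM.mar","minWorkers":1,"maxWorkers":5,"batchSize":1,"responseTimeout":120}}

Judging by the stack trace, Model.setModelState seems to read that field while restoring the snapshot, so a snapshot entry without it ends in the NullPointerException.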