Closed Jyothipyxis closed 1 year ago
This issue is resolved: I had to add a property called MAX_BATCH_DELAY
in the config.properties file.
More specifically, at least in my case, I had to add it to the `model_snapshot` entry in config.properties:

```
inference_address=http://0.0.0.0:8085
management_address=http://0.0.0.0:8085
metrics_address=http://0.0.0.0:8082
grpc_inference_port=7070
grpc_management_port=7071
enable_metrics_api=true
metrics_format=prometheus
number_of_netty_threads=4
job_queue_size=10
enable_envvars_config=false
install_py_dep_per_model=true
model_store=/mnt/models/model-store
model_snapshot={"name":"startup.cfg","modelCount":1,"models":{"my-model-name":{"1.0":{"defaultVersion":true,"marName":"my_model.mar","minWorkers":1,"maxWorkers":5,"batchSize":1,"maxBatchDelay":10,"responseTimeout":120}}}}
```
We have a sentence-transformer model packed into a MAR file and saved in our storage, and we are using KServe on Kubeflow to deploy the models. We followed the steps at https://kserve.github.io/website/0.8/modelserving/v1beta1/torchserve/ and also referred to https://github.com/AshutoshDongare/HuggingFace-Model-Serving to generate MARs for other models. After deploying everything with the YAML below, we get this error.
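For context, a minimal KServe `InferenceService` manifest of the kind the linked guide describes looks roughly like the sketch below. This is not our actual YAML; the name and `storageUri` are placeholders, and the exact schema may differ between KServe versions:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-model-name            # placeholder name
spec:
  predictor:
    model:
      modelFormat:
        name: pytorch            # tells KServe to use the TorchServe runtime
      # placeholder: points at the directory containing model-store/ and config/
      storageUri: gs://my-bucket/torchserve
```

With the TorchServe runtime, the storage location is expected to contain a `config/config.properties` alongside the `model-store/` directory holding the MAR file.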
error log
What could possibly have gone wrong here, and how might it be fixed? Thanks in advance!