openvinotoolkit / model_server

A scalable inference server for models optimized with OpenVINO™
https://docs.openvino.ai/2024/ovms_what_is_openvino_model_server.html
Apache License 2.0

Whisper model deployment with OVMS #2066

Open Aditya-Scalers opened 1 year ago

Aditya-Scalers commented 1 year ago

With reference to the Whisper implementation with OpenVINO for subtitle generation, I was able to create the whisper_encoder and whisper_decoder XML and BIN files. Using whisper_encoder and whisper_decoder as separate models with OVMS, I was able to start the Docker container.
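(For reference, the IR export can be done along these lines with optimum-intel; a sketch under assumptions, not necessarily the notebook's exact procedure, and the model ID and output directory are illustrative:

from optimum.intel import OVModelForSpeechSeq2Seq

# export=True converts the PyTorch checkpoint to OpenVINO IR; this
# writes separate encoder and decoder .xml/.bin pairs to the directory.
model = OVModelForSpeechSeq2Seq.from_pretrained("openai/whisper-base", export=True)
model.save_pretrained("whisper-ov")
)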

New status: ( "state": "AVAILABLE", "error_code": "OK" )
[2023-09-26 13:14:16.673][1][serving][info][model.cpp:88] Updating default version for model: whisper, from: 0
[2023-09-26 13:14:16.673][1][serving][info][model.cpp:98] Updated default version for model: whisper, to: 1
[2023-09-26 13:14:16.673][66][modelmanager][info][modelmanager.cpp:1069] Started model manager thread
[2023-09-26 13:14:16.673][1][serving][info][servablemanagermodule.cpp:45] ServableManagerModule started
[2023-09-26 13:14:16.673][67][modelmanager][info][modelmanager.cpp:1088] Started cleaner thread

I am not able to perform inference on these models. Any help would be appreciated.

Client code:

from ovmsclient import make_grpc_client

# Connect to the OVMS gRPC endpoint started above
client = make_grpc_client("localhost:9000")

# Read the raw audio bytes and send them as a single input
with open("output.bin", "rb") as binary_file:
    binary_data = binary_file.read()

data_dict = {
    "binary_data": binary_data
}
results = client.predict(inputs=data_dict, model_name="whisper")

When I request inference from the client with the binary contents of an audio file, I get this error.

Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ovmsclient/tfs_compat/grpc/serving_client.py", line 47, in predict
    raw_response = self.prediction_service_stub.Predict(request.raw_request, timeout)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/grpc/_channel.py", line 1161, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/grpc/_channel.py", line 1004, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.INVALID_ARGUMENT
        details = "Invalid number of inputs - Expected: 26; Actual: 1"
        debug_error_string = "UNKNOWN:Error received from peer ipv6:%5B::1%5D:9000 {grpc_message:"Invalid number of inputs - Expected: 26; Actual: 1", grpc_status:3, created_time:"2023-09-27T07:31:16.658787286+00:00"}"
>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/cloudvedge/client.py", line 12, in <module>
    results = client.predict(inputs=data_dict, model_name="whisper")
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ovmsclient/tfs_compat/grpc/serving_client.py", line 49, in predict
    raise_from_grpc(grpc_error)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ovmsclient/tfs_compat/base/errors.py", line 66, in raise_from_grpc
    raise (error_class(details))
ovmsclient.tfs_compat.base.errors.InvalidInputError: Error occurred during handling the request: Invalid number of inputs - Expected: 26; Actual: 1
atobiszei commented 1 year ago

Hi, you are getting this error because OVMS expects more inputs than you provide. I assume you are using some kind of wrapper around the OV model that hides the fact that OV uses many more inputs than one to perform inference.
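As a first step, it is worth listing what the served model actually expects. A minimal sketch using ovmsclient's model metadata call (the model name is taken from your client code):

from ovmsclient import make_grpc_client

client = make_grpc_client("localhost:9000")

# The metadata response lists every input and output of the served
# model, including names, shapes and dtypes - this shows where the
# expected 26 inputs come from.
metadata = client.get_model_metadata(model_name="whisper")
for name, spec in metadata["inputs"].items():
    print(name, spec)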

We have plans to add Python code execution support inside OVMS, which could ease the integration in cases where you have an existing Python wrapper.

One thing I noticed as well is that you tried to use a binary audio file - right now OVMS only supports images as binary inputs.
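Until binary audio is supported, the data has to be sent as named numpy tensors matching the model's inputs. A rough sketch, assuming the encoder takes a log-mel spectrogram input named "input_features" of shape [1, 80, 3000] (both the name and the shape are assumptions - check them against the model metadata first):

import numpy as np
from ovmsclient import make_grpc_client

client = make_grpc_client("localhost:9000")

# Placeholder features; in practice compute the log-mel spectrogram
# of the audio clip first (e.g. with WhisperProcessor from transformers).
mel = np.zeros((1, 80, 3000), dtype=np.float32)

# Dict keys must match the input names reported by the metadata call.
outputs = client.predict(inputs={"input_features": mel}, model_name="whisper")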

nhha1602 commented 11 months ago

Hi,

Referring to this link, I also exported the Whisper encoder and decoder models to IR format, and then tested them successfully with some classes from this link.

As a next step, how can I load these IR models into OVMS and then run inference on them from a client?

Please advise.
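For context, a typical OVMS setup for two IR models is one versioned directory per model plus a config file; a sketch with illustrative paths and model names:

# Lay out one versioned directory per model.
mkdir -p models/whisper_encoder/1 models/whisper_decoder/1
cp whisper_encoder.xml whisper_encoder.bin models/whisper_encoder/1/
cp whisper_decoder.xml whisper_decoder.bin models/whisper_decoder/1/

# Point OVMS at both models through a single config file.
cat > models/config.json <<'EOF'
{
  "model_config_list": [
    { "config": { "name": "whisper_encoder", "base_path": "/models/whisper_encoder" } },
    { "config": { "name": "whisper_decoder", "base_path": "/models/whisper_decoder" } }
  ]
}
EOF

# Start the server; --config_path and --port are standard OVMS options.
docker run -d --rm -p 9000:9000 -v "$(pwd)/models:/models" \
  openvino/model_server:latest --config_path /models/config.json --port 9000

The client then calls the encoder and decoder as separate servables and reimplements the generation loop (feature extraction, decoder iterations, tokenization) on its side, since OVMS only runs the individual models.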