pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/
Apache License 2.0

How to decode the gRPC PredictionResponse string efficiently #1684

Open IamMohitM opened 2 years ago

IamMohitM commented 2 years ago

📚 The doc issue

There is no documentation about efficiently decoding the bytes received from a PredictionResponse into a torch tensor. Currently, the only working solution uses ast.literal_eval, which is extremely slow.

from ast import literal_eval
import torch

response = inference_stub.Predictions(
    inference_pb2.PredictionsRequest(model_name=model_name, input=input_data))
predictions = torch.as_tensor(literal_eval(response.prediction.decode('utf-8')))
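
If the payload has to stay as text, a drop-in that is usually much faster than ast.literal_eval is json.loads. A minimal sketch, assuming the prediction bytes are valid JSON (e.g. a nested list of floats), which is not confirmed in this thread:

import json
import torch

# json.loads accepts bytes directly and skips full Python-syntax parsing.
predictions = torch.as_tensor(json.loads(response.prediction))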

Methods like numpy.fromstring and numpy.frombuffer raise the following error:

> np.fromstring(response.prediction.decode("utf-8"))
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ValueError: string size must be a multiple of element size

The following returns incorrect tensor values; the number of elements does not match the expected count.

torch.frombuffer(response.prediction, dtype = torch.float32)
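
For context (an assumption based on the default text output, not confirmed in this thread): response.prediction appears to hold the UTF-8 bytes of the output's string representation, not the tensor's raw memory. np.fromstring with the default float64 therefore complains when the text length is not a multiple of 8, and torch.frombuffer reinterprets every 4 text characters as one float32, which explains the wrong element count. A minimal illustration with a hypothetical payload:

import torch

# Hypothetical text payload, shaped like the default handler's output:
payload = "[1.5, 2.25, 3.0]".encode("utf-8")  # 16 bytes of text
# 16 bytes / 4 bytes per float32 = 4 "elements", none of which are the
# original numbers -- the character codes are reinterpreted as floats:
garbage = torch.frombuffer(bytearray(payload), dtype=torch.float32)
print(garbage.numel())  # 4, not 3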

Suggest a potential alternative/fix

No response

IamMohitM commented 2 years ago

I have posted a solution on Stack Overflow that removes the bottleneck: https://stackoverflow.com/a/73721450/8727339

However, the solution uses TensorFlow. I'd still love to hear from the TorchServe devs whether something similar can be achieved with Torch itself.
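
A Torch-only sketch of the same idea, assuming you can modify the custom handler so the server sends serialized tensor bytes instead of a text repr (the postprocess signature follows the usual TorchServe handler convention; this is untested here):

import io
import torch

# Server side, inside a custom handler: binary-serialize the output tensor.
def postprocess(self, data):
    buffer = io.BytesIO()
    torch.save(data.cpu(), buffer)  # keeps dtype and shape in the payload
    return [buffer.getvalue()]      # TorchServe expects one item per request

# Client side: rebuild the tensor with no text parsing at all.
predictions = torch.load(io.BytesIO(response.prediction))

If the pickle overhead of torch.save matters, the handler could instead return data.cpu().numpy().tobytes() and the client could call torch.frombuffer with the agreed dtype, reshaping to a shape communicated out of band.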