pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/
Apache License 2.0

How to decode the gRPC PredictionResponse string efficiently #1684

Open IamMohitM opened 2 years ago

IamMohitM commented 2 years ago

📚 The doc issue

There is no documentation about efficiently decoding the bytes received from a PredictionResponse into a torch tensor. Currently, the only working solution uses ast.literal_eval, which is extremely slow.

from ast import literal_eval
import torch

response = inference_stub.Predictions(
    inference_pb2.PredictionsRequest(model_name=model_name, input=input_data))
predictions = torch.as_tensor(literal_eval(response.prediction.decode('utf-8')))
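
If the payload has to stay as text, a drop-in that is usually much faster than ast.literal_eval is json.loads. A minimal sketch, assuming the prediction bytes are valid JSON (e.g. a nested list of floats), which is not confirmed in this thread:

import json
import torch

# json.loads accepts bytes directly and skips full Python-syntax parsing.
predictions = torch.as_tensor(json.loads(response.prediction))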

Methods like numpy.fromstring and numpy.frombuffer raise the following error:

> np.fromstring(response.prediction.decode("utf-8"))
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ValueError: string size must be a multiple of element size

The following returns incorrect tensor values; the number of elements does not match the expected count.

torch.frombuffer(response.prediction, dtype = torch.float32)
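
For context (an assumption based on the default text output, not confirmed in this thread): response.prediction appears to hold the UTF-8 bytes of the output's string representation, not the tensor's raw memory. np.fromstring with the default float64 therefore complains when the text length is not a multiple of 8, and torch.frombuffer reinterprets every 4 text characters as one float32, which explains the wrong element count. A minimal illustration with a hypothetical payload:

import torch

# Hypothetical text payload, shaped like the default handler's output:
payload = "[1.5, 2.25, 3.0]".encode("utf-8")  # 16 bytes of text
# 16 bytes / 4 bytes per float32 = 4 "elements", none of which are the
# original numbers -- the character codes are reinterpreted as floats:
garbage = torch.frombuffer(bytearray(payload), dtype=torch.float32)
print(garbage.numel())  # 4, not 3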

Suggest a potential alternative/fix

No response

IamMohitM commented 2 years ago

I have posted a solution on Stack Overflow that removes the bottleneck: https://stackoverflow.com/a/73721450/8727339

However, the solution uses TensorFlow. I'd still love to hear from the TorchServe devs whether something similar can be achieved with Torch itself.
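
A Torch-only sketch of the same idea, assuming you can modify the custom handler so the server sends serialized tensor bytes instead of a text repr (the postprocess signature follows the usual TorchServe handler convention; this is untested here):

import io
import torch

# Server side, inside a custom handler: binary-serialize the output tensor.
def postprocess(self, data):
    buffer = io.BytesIO()
    torch.save(data.cpu(), buffer)  # keeps dtype and shape in the payload
    return [buffer.getvalue()]      # TorchServe expects one item per request

# Client side: rebuild the tensor with no text parsing at all.
predictions = torch.load(io.BytesIO(response.prediction))

If the pickle overhead of torch.save matters, the handler could instead return data.cpu().numpy().tobytes() and the client could call torch.frombuffer with the agreed dtype, reshaping to a shape communicated out of band.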