tensorflow / tensor2tensor

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Apache License 2.0

Serving using Cloud ML Engine #900

Open ndvbd opened 6 years ago

ndvbd commented 6 years ago

Serving locally using tensorflow_model_server works fine. I've put an exported model/version on Google Cloud ML Engine. The question is: how do I point query.py at a remote server instead of the local one? This is the function in query.py that defines the host and port (which can be remote):

# Imports assumed by this snippet (the old TF Serving beta gRPC API):
from grpc.beta import implementations
from tensorflow_serving.apis import prediction_service_pb2

def create_stub():
  host, port = FLAGS.server.split(":")
  channel = implementations.insecure_channel(host, int(port))
  return prediction_service_pb2.beta_create_PredictionService_stub(channel)

I believe it uses gRPC. Can Cloud ML Engine accept gRPC? If not, and we must use JSON, where in the t2t code can I set it to send data to Cloud ML in JSON format and parse the JSON response?
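
For reference, this is roughly how such a stub gets used against a local TensorFlow Serving instance over gRPC. This is a minimal sketch, not code from query.py; the servable name and input key below are illustrative assumptions:

# Minimal sketch of a gRPC Predict call to a local TF Serving instance.
# "my_servable" and the "inputs" key are illustrative, not taken from query.py.
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2

def query_locally(stub, input_ids, timeout_secs=10):
  request = predict_pb2.PredictRequest()
  request.model_spec.name = "my_servable"  # name used when exporting the model
  request.inputs["inputs"].CopyFrom(
      tf.make_tensor_proto(input_ids, shape=[1, len(input_ids)]))
  return stub.Predict(request, timeout_secs)  # blocking gRPC call over the channel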

ericp96 commented 6 years ago

TensorFlow Serving itself uses gRPC; however, t2t can use the Google SDK to communicate with ML Engine (at least the code is there; I can't vouch that it works yet):

https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/serving/serving_utils.py#L88

I am grateful that they are trying to make my life easier but I understand your confusion.

Looking at the code, it appears you need to set the --cloud_mlengine_model_name flag when querying for data.

https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/serving/query.py#L43
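
For context, the ML Engine path goes over HTTPS/JSON rather than gRPC. Below is a minimal sketch of that style of request using the Google API client; the project, model name, and instance payload are illustrative assumptions, and the actual request body is whatever serving_utils.py builds for your problem:

# Sketch of a JSON predict request to Cloud ML Engine via the Google API client.
# "my-project", "my_model", and the instance payload are placeholders.
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials

credentials = GoogleCredentials.get_application_default()
api = discovery.build("ml", "v1", credentials=credentials)
name = "projects/{}/models/{}".format("my-project", "my_model")
response = api.projects().predict(
    name=name, body={"instances": [{"input": {"b64": "..."}}]}).execute()
print(response["predictions"])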

ndvbd commented 6 years ago

Well, it works, but now I'm getting: "Prediction server is out of memory, possibly because model size is too big." Although my SavedModel is only 237 MB. Is there a way to take an already-trained model and either quantize the weights or change int64 to int32?

ericp96 commented 6 years ago

You can use the t2t-exporter executable to export your snapshot in preparation for serving. It decreased my model size by about 90%.
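
For example, an invocation along these lines (the flags mirror t2t-trainer and the exact set may differ between versions, so treat this as a sketch):

t2t-exporter \
  --model=transformer \
  --hparams_set=transformer_base \
  --problem=translate_ende_wmt32k \
  --data_dir=$DATA_DIR \
  --output_dir=$TRAIN_DIR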

ndvbd commented 6 years ago

I already used tensor2tensor.serving.export. It took the 684 MB model and shrank it to 237 MB (roughly a 65% decrease), but I need more. Is there a way to configure it to do quantization or to convert int64 to int32?

SunshineBot commented 4 years ago

@ndvbd Hi, have you found a way to do quantization for tensor2tensor models?

ndvbd commented 4 years ago

Haven't had the time to deal with it, unfortunately.
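
The quantization question above is left unresolved in the thread. One direction that existed at the time is TF 1.x's Graph Transform Tool, which can rewrite a frozen GraphDef with a quantize_weights transform; it does not address the int64-to-int32 question and works on a frozen graph rather than a SavedModel, so the sketch below is only a starting point, with illustrative file and node names:

# Sketch: weight quantization with TF 1.x's Graph Transform Tool.
# "frozen_model.pb", "inputs", and "outputs" are illustrative; the real node names
# depend on how the model was exported.
import tensorflow as tf
from tensorflow.tools.graph_transforms import TransformGraph

graph_def = tf.GraphDef()
with tf.gfile.GFile("frozen_model.pb", "rb") as f:
  graph_def.ParseFromString(f.read())

# quantize_weights stores weights in 8 bits and dequantizes them at load time,
# which shrinks the file without changing the op set.
transformed = TransformGraph(graph_def, ["inputs"], ["outputs"], ["quantize_weights"])

with tf.gfile.GFile("frozen_model_quant.pb", "wb") as f:
  f.write(transformed.SerializeToString())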