Closed gr8Adakron closed 6 years ago
--enable_batching turns on server-side batching, which means that the server will group together multiple inference requests it receives from clients and process them in a batch. It should be completely transparent to the client (other than performance). If by "script" you mean something that's sending requests as a client, then the same script should work.
(If you want to do client-side batching, that is also possible -- just have your client batch together multiple requests into a single RPC to the server.)
Thank you for the reply! But if I use the same script after starting the server with the --enable_batching flag, it returns this error:
Traceback (most recent call last):
  File "inception_client.py", line 59, in <module>
    tf.app.run()
  File "/home/afzal/.virtualenvs/tensorflow_python36/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "inception_client.py", line 54, in main
    result = stub.Predict(request, 10.0) # 10 secs timeout
  File "/home/afzal/.virtualenvs/tensorflow_python36/lib/python3.6/site-packages/grpc/beta/_client_adaptations.py", line 309, in __call__
    self._request_serializer, self._response_deserializer)
  File "/home/afzal/.virtualenvs/tensorflow_python36/lib/python3.6/site-packages/grpc/beta/_client_adaptations.py", line 195, in _blocking_unary_unary
    raise _abortion_error(rpc_error_call)
grpc.framework.interfaces.face.face.AbortionError: AbortionError(code=StatusCode.FAILED_PRECONDITION, details="Batched output tensor's 0th dimension does not equal the sum of the 0th dimension sizes of the input tensors")
Ah, it looks like your tensorflow graph is not structured in a way to be compatible with the tf-serving batcher. The error you are getting is that the batcher is complaining that it cannot match up the outputs to the batched inputs it sent into the tf::Session::Run() call. The batcher assumes that the 0th dimension is the batch dimension, and it concatenates input tensors along that dimension (e.g. if three input tensors have batch sizes 2, 5 and 7 and the batcher decides to group them into a batch, it will concatenate them into a single tensor of batch size 2+5+7=14). Then it expects the result of running the graph on the batched tensor to have the same batch dim size (e.g. the output should have batch size 14), after which it will split the output into individual (unbatched) tensors attributed to the original unbatched inputs (e.g. 2, 5 and 7). OTOH if it runs the graph with an input with batch dim size 14 and gets back an output with batch dim size 27, it has no idea how to split the 27 up to correspond to the inputs.
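To make the concatenate-then-split behaviour concrete, here is a minimal NumPy sketch. It is not the tf-serving batcher itself: `run_batched`, `good_model` and `bad_model` are hypothetical names used only for illustration, and the error text is copied from the message above.

```python
import numpy as np

def run_batched(model_fn, inputs):
    """Mimic the batcher: concatenate on axis 0, run once, split per request."""
    sizes = [x.shape[0] for x in inputs]
    batched = np.concatenate(inputs, axis=0)
    output = model_fn(batched)
    if output.shape[0] != sum(sizes):
        # The output cannot be attributed back to the individual requests.
        raise ValueError(
            "Batched output tensor's 0th dimension does not equal the sum "
            "of the 0th dimension sizes of the input tensors")
    # Split points are the cumulative sizes, excluding the final total.
    return np.split(output, np.cumsum(sizes)[:-1], axis=0)

# Three requests with batch sizes 2, 5 and 7, as in the explanation above.
requests = [np.ones((n, 4), dtype=np.float32) for n in (2, 5, 7)]

# Batch-friendly stand-in for a graph: one output row per input row.
good_model = lambda x: x.sum(axis=1, keepdims=True)

# Stand-in for a graph that collapses the batch dimension.
bad_model = lambda x: x.reshape(1, -1)

parts = run_batched(good_model, requests)
print([p.shape[0] for p in parts])  # [2, 5, 7]
```

Calling `run_batched(bad_model, requests)` raises the `ValueError`, because the output has a 0th dimension of 1 rather than 14.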
Does this mean that I can use batching simply by setting enable_batching=true, without any changes to other code?
--enable_batching=true will enable batching, yes.
If all you do is set that flag it will use ~sensible default values of tuning parameters, including the max batch size (which affects throughput) and the batch timeout (which affects tail latency). See https://github.com/tensorflow/serving/blob/master/tensorflow_serving/batching/README.md for ideas on how to set those parameters if you want to do more tuning. You use --batching_parameters_file to pass them to the model server binary.
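As a rough sketch of what such a parameters file can look like, the snippet below writes one in the text-protobuf form described in the linked README. The numeric values are placeholders I picked for illustration, not recommendations, and the file name and `render_batching_config` helper are hypothetical.

```python
import tempfile
from pathlib import Path

# Illustrative values only (assumptions); tune them for your model and hardware.
BATCHING_PARAMS = {
    "max_batch_size": 32,          # affects throughput
    "batch_timeout_micros": 5000,  # affects tail latency
    "num_batch_threads": 4,
    "max_enqueued_batches": 100,
}

def render_batching_config(params):
    """Render each parameter as a `name { value: N }` line."""
    return "".join(
        f"{name} {{ value: {value} }}\n" for name, value in params.items()
    )

config_text = render_batching_config(BATCHING_PARAMS)

# Hypothetical location; any path readable by the model server works.
config_path = Path(tempfile.gettempdir()) / "batching_parameters.txt"
config_path.write_text(config_text)
print(config_text)
```

The resulting path would then be passed with --batching_parameters_file alongside --enable_batching=true when starting the model server.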
@gr8Adakron Just wanted to follow up. Were you able to resolve this issue?
This is a hybrid script, but it should answer your question with a little effort on your side.
from grpc.beta import implementations
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2
from keras.preprocessing import image as keras_image
from keras.applications.xception import preprocess_input as xpreprocess_input
import cv2
import re
import os
import time
import numpy as np
import tensorflow as tf
import pandas as pd
import mxnet as mx
tf.app.flags.DEFINE_string('host', 'localhost', 'Service host')
tf.app.flags.DEFINE_string('port', '8006', 'Service port')
tf.app.flags.DEFINE_string('picture', '', 'path to picture')
FLAGS = tf.app.flags.FLAGS
output_list = []
def predictResponse_into_nparray(response, output_tensor_name):
    # Rebuild the ndarray from the flat float_val list using the shape in the response.
    dims = response.outputs[output_tensor_name].tensor_shape.dim
    shape = tuple(d.size for d in dims)
    return np.reshape(response.outputs[output_tensor_name].float_val, shape)
def pretty(d, indent=0):
    for key, value in d.items():
        print('\t' * indent + str(key))
        if isinstance(value, dict):
            pretty(value, indent+1)
        else:
            print('\t' * (indent+1) + str(value))
def return_prediction(file_name,offset_output,labels_file):
    with open(labels_file, 'r') as f:
        labels = [l.rstrip() for l in f]
    #..> NP-squeeze-argmax
    sq_output = np.squeeze(offset_output)
    argmax_index = np.argmax(np.squeeze(offset_output), axis=0)
    probindex = np.argsort(sq_output)[::-1]
    #..> Stackoverflow
    # scores = offset_output[0]
    # rank = np.array(offset_output[0]).argsort()[-5:][::-1]
    prediction = labels[argmax_index]
    probability = format(sq_output[argmax_index],'.3f')
    #print(f"Filename {file_name}, Prediction {prediction}, probability {probability}.")
    return prediction,probability
def resize_image(absolute_image_path):
    # Assumes a three-character extension such as ".jpg" or ".png".
    base_path = absolute_image_path[:-4]
    resized_path = f"{base_path}_resized.jpg"
    original_image = cv2.imread(absolute_image_path)
    original_rgb = cv2.cvtColor(original_image, cv2.COLOR_BGR2RGB)
    dim = (200,200)
    resized = cv2.resize(original_rgb, dim, interpolation = cv2.INTER_AREA)
    # Convert back to BGR, the channel order cv2.imwrite expects.
    resized_color = cv2.cvtColor(resized, cv2.COLOR_RGB2BGR)
    cv2.imwrite(resized_path,resized_color)
    return resized_path
def preprocessing_mxnet(filename):
    img = cv2.imread(filename)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (224,224,))
    # HWC -> CHW so the per-channel means can be subtracted.
    img = np.swapaxes(img, 0, 2)
    img = np.swapaxes(img, 1, 2)
    img = img.astype('float')
    img[0,:,:] -= 123.68
    img[1,:,:] -= 116.779
    img[2,:,:] -= 103.939
    #img = img[np.newaxis, :]
    # CHW -> HWC again, the layout that is sent to the server.
    swap_img_1 = np.swapaxes(img, 0, 1)
    #print(swap_img_1.shape)
    swap_img_2 = np.swapaxes(swap_img_1, 1, 2)
    #print(swap_img_2.shape)
    return swap_img_2
def decode_image(jpeg_file):
    with tf.device('/gpu:1'):
        decoder_graph = tf.Graph()
        with decoder_graph.as_default():
            decoded_image = tf.image.decode_png(jpeg_file, channels=3)
            decoded_image = tf.image.central_crop(decoded_image, 1)
            decoded_image = tf.image.resize_images(decoded_image, (224,224))
            normalized_image = tf.divide(decoded_image, 255)
            # reshaped_image = tf.reshape(normalized_image, [-1, 331, 331, 3])
        with tf.Session(graph = decoder_graph) as image_session:
            # image_session = tf.Session(graph = decoder_graph)
            input_0 = image_session.run(normalized_image)
    return input_0
def preprocess_image_tensorflow(image_path):
    with tf.gfile.FastGFile(image_path, 'rb') as jpeg_file_raw:
        jpeg_file = jpeg_file_raw.read()
    input_0 = decode_image(jpeg_file)
    return input_0
def serving_model_prediction(stub,batch_of_img,model_name,input_tensor,output_tensor,signature):
    # One PredictRequest carries the whole client-side batch (0th dimension = batch).
    request = predict_pb2.PredictRequest()
    request.model_spec.name = model_name
    request.model_spec.signature_name = signature #tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY
    request.inputs[input_tensor].CopyFrom(
        tf.contrib.util.make_tensor_proto(batch_of_img, dtype='float32'))
    result = stub.Predict(request, 30)
    #print(result)
    output = predictResponse_into_nparray(result,output_tensor)
    return output
def batch_preprocessing_mxnet(image_list,image_dir):
    preprocessed_input = []
    all_img_names = []
    for single_image in image_list:
        image_path = f"{image_dir}{single_image}"
        try:
            # Preprocess first so a failed image never leaves a name without a matching input.
            preprocessed = preprocessing_mxnet(image_path)
        except Exception:
            continue
        all_img_names.append(image_path)
        preprocessed_input.append(preprocessed)
    return preprocessed_input,all_img_names
def batch_prediction_mxnet(stub,
                           labels_file,
                           batch_input,
                           test_image_dir):
    batch_output = []
    batch_preprocessed,batch_img_names = batch_preprocessing_mxnet(batch_input,test_image_dir)
    if(len(batch_preprocessed)==0): return []
    batch_np = np.array(batch_preprocessed)
    batch_return = serving_model_prediction(stub=stub,
                                            batch_of_img = batch_np,
                                            model_name = "mxnet_converted_pt_152",
                                            input_tensor = "input",
                                            output_tensor= "output",
                                            signature = tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY)
    for index,single_prediction in enumerate(batch_return):
        prediction_array = batch_return[index]
        image_name = batch_img_names[index]
        batch_output.append({image_name:return_prediction(image_name,prediction_array,labels_file)})
    return batch_output
def batch_preprocessing_tensorflow(image_list,image_dir):
    preprocessed_input = []
    all_img_names = []
    for single_image in image_list:
        image_path = f"{image_dir}{single_image}"
        try:
            # Preprocess first so a failed image never leaves a name without a matching input.
            preprocessed = preprocess_image_tensorflow(image_path)
        except Exception:
            continue
        all_img_names.append(image_path)
        preprocessed_input.append(preprocessed)
    return preprocessed_input,all_img_names
def batch_prediction_tensorflow(stub,
                                labels_file,
                                batch_input,
                                test_image_dir):
    batch_output = []
    batch_preprocessed,batch_img_names = batch_preprocessing_tensorflow(batch_input,test_image_dir)
    if(len(batch_preprocessed)==0): return []
    batch_np = np.array(batch_preprocessed)
    batch_return = serving_model_prediction(stub=stub,
                                            batch_of_img = batch_np,
                                            model_name = "product_type_tf_pnasnet",
                                            input_tensor = "in",
                                            output_tensor= "out",
                                            signature = tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY)
    for index,single_prediction in enumerate(batch_return):
        prediction_array = batch_return[index]
        image_name = batch_img_names[index]
        batch_output.append({image_name:return_prediction(image_name,prediction_array,labels_file)})
    return batch_output
def keras_preprocessing_func(image_path):
    with tf.device('/cpu:0'):
        image = keras_image.load_img(image_path,target_size=(224,224))
        image = keras_image.img_to_array(image) #.astype('float32')/255
        # image = np.expand_dims(image, axis=0)
        image = xpreprocess_input(image)
    return image
def batch_preprocessing_keras(image_list,image_dir):
    preprocessed_input = []
    all_img_names = []
    for single_image in image_list:
        image_path = f"{image_dir}{single_image}"
        try:
            # Preprocess first so a failed image never leaves a name without a matching input.
            preprocessed = keras_preprocessing_func(image_path)
        except Exception:
            continue
        all_img_names.append(image_path)
        preprocessed_input.append(preprocessed)
    return preprocessed_input,all_img_names
def batch_prediction_keras(stub,
                           labels_file,
                           batch_input,
                           test_image_dir):
    batch_output = []
    batch_preprocessed,batch_img_names = batch_preprocessing_keras(batch_input,test_image_dir)
    if(len(batch_preprocessed)==0): return []
    batch_np = np.array(batch_preprocessed)
    batch_return = serving_model_prediction(stub=stub,
                                            batch_of_img = batch_np,
                                            model_name = "product_type_keras",
                                            input_tensor = "input",
                                            output_tensor= "output",
                                            signature = "datax_pt")
    for index,single_prediction in enumerate(batch_return):
        prediction_array = batch_return[index]
        image_name = batch_img_names[index]
        batch_output.append({image_name:return_prediction(image_name,prediction_array,labels_file)})
    return batch_output
def batch_father(stub,
                 test_image_dir,
                 labels_file,
                 batch_size):
    # The image directory and labels file are taken from the caller (see main).
    image_list = os.listdir(test_image_dir)
    total_image = len(image_list)
    all_output = []
    # Walk the file list in slices of batch_size; the last slice may be shorter.
    for batch_begin in range(0, total_image, batch_size):
        batch_end = min(batch_begin + batch_size, total_image)
        batch_input = image_list[batch_begin:batch_end]
        print(f"> Batch start: {batch_begin}, Batch ends: {batch_end}")
        #..> batch_prediction_mxnet | batch_prediction_keras | batch_prediction_tensorflow
        batch_output = batch_prediction_tensorflow(stub=stub,
                                                   labels_file = labels_file,
                                                   batch_input = batch_input,
                                                   test_image_dir = test_image_dir)
        all_output = all_output+batch_output
    return all_output
def main(_):
    host = FLAGS.host
    port = FLAGS.port
    channel = implementations.insecure_channel(host, int(port))
    stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)
    start_time = time.time()
    # Send request
    test_image_dir = "/home/adakron/tf-serving-arch-beta/test_data/100-images-test/"
    labels_file = "/home/adakron/tf-serving-arch-beta/labels_files/pt-converted.txt"
    batch_size = 16
    batch_output = batch_father(stub,
                                test_image_dir=test_image_dir,
                                labels_file=labels_file,
                                batch_size=batch_size)
    print(batch_output)
    timetaken = format(time.time() - start_time, '.3f')
    # Report elapsed time against the number of predictions returned.
    print(f"\n{timetaken} secs > {len(batch_output)}")
if __name__ == '__main__':
    tf.app.run()
I had several kinds of models: Keras, MXNet, and TensorFlow. I guess you only need TensorFlow, so see line number 255; I have written a separate function for each of them. Hope this helps!
Sorry for the late response. If you have any doubts, feel free to ask.
In Keras, we used to give batch_size upon prediction. How is that different from these batching settings in Tensorflow Serving?
I have a model trained in Keras as a .hdf5 file.
In Python, loading the .hdf5 file and running prediction takes 300 ms.
INPUT:
[[ 0.0, 1.0, 2.0, 3.0, 4.0],
[ 6.0, 7.0, 8.0, 9.0, 1876.0]]
After converting the .hdf5 file to SavedModel format (.pb), I tried to load the .pb in Java and ran the prediction, but it took 600 ms (twice the time).
INPUT:
{
"inputs": {
"source_ids": [[ 0.0, 1.0, 2.0, 3.0, 4.0],
[ 6.0, 7.0, 8.0, 9.0, 1876.0]]
}
}
Usually, TFServing should be faster!
@chrisolston If I'm doing client-side batching, for example with raw data in my curl request like this:
instances = [{"b64_image":"_9j_4AA..."},{"b64_image":"_9j_4AA..."},...]
{"signature_name": "serving_default", "instances": instances}
is it mandatory to set --enable_batching on the server side, or will batching be done automatically by the TF server?
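For reference, a request body of that shape could be assembled in Python roughly as follows. This is only a sketch under assumptions: it assumes the model's signature accepts a web-safe base64 string under the key `b64_image` (as the example above suggests), and `build_predict_body` and the placeholder bytes are hypothetical.

```python
import base64
import json

def build_predict_body(image_blobs, signature_name="serving_default"):
    """Pack several raw images into one JSON body (a client-side batch)."""
    instances = [
        # Web-safe base64, matching the "_9j_4AA..." prefix shown above.
        {"b64_image": base64.urlsafe_b64encode(blob).decode("ascii")}
        for blob in image_blobs
    ]
    return json.dumps({"signature_name": signature_name, "instances": instances})

# Placeholder bytes standing in for real JPEG files (JPEG header only).
fake_jpeg = b"\xff\xd8\xff\xe0\x00\x10"
body = build_predict_body([fake_jpeg, fake_jpeg, fake_jpeg])

payload = json.loads(body)
print(len(payload["instances"]))  # 3
```

The resulting string is what would go in the body of the curl request, so all three images travel to the server in a single call.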
Hi @aginpatrick and @sathyarr - wondering if you got answers to your questions, I'm curious about the same things for my use-case.
@Rytyka It's been years! I don't remember exactly, but ultimately tf-serving and Java were relatively faster, I guess, at least without batching.
Thanks for getting back to me :)
I have successfully installed tensorflow-serving and I am able to get predictions for the pre-built Inception model with it.
Server-running Command:
I get the prediction using this script, called inception_client.py:
I got the prediction fine. A single image takes around 2.5 seconds. Not bad.
Now I want to make predictions for 1000 images, all together, using the --enable_batching flag.
I started the server with this flag added; the full server-starting command is:
But I don't know what the client script should look like or how to send it a batch for prediction. I tried searching but could not find anything.
The same client script doesn't work. Any help is appreciated; I want predictions for a batch of images.