tensorflow / model-optimization

A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.
https://www.tensorflow.org/model_optimization
Apache License 2.0

Post quantization does not utilize GPU #454

Open ek9852 opened 4 years ago

ek9852 commented 4 years ago


Motivation: During post-training quantization, the GPU is idle (confirmed via nvidia-smi), i.e. quantization does not use the GPU to speed things up, and it is very slow. It takes > 60 min to run on a server-grade Xeon (for a test set of 2336 images on our model):

import os
import numpy as np
import tensorflow as tf
from PIL import Image

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
def representative_dataset_gen():
  with tf.io.gfile.GFile(test_set, 'r') as f:
    test_list = f.readlines()
  for i in test_list:
    # Yield each sample input as a float32 numpy array scaled to [0, 1].
    # Note: split() belongs on the line `i` (to strip the trailing
    # newline), not on the joined path.
    with Image.open(os.path.join(datasetdir, i.split()[0])) as img:
        yield [np.array(img).reshape(1, 120, 160, 1).astype(np.float32) / 255.0]
converter.representative_dataset = representative_dataset_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8  # or tf.int8
converter.inference_output_type = tf.uint8  # or tf.int8
tflite_quant_model = converter.convert()
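Since calibration runs every representative sample through the float model on CPU, total conversion time scales roughly linearly with the dataset size, and the TensorFlow docs suggest a few hundred samples is typically sufficient. As a workaround (not a GPU fix), one could cap the generator with a small wrapper; `make_representative_subset` below is a hypothetical helper, sketched under that assumption:

```python
import itertools

def make_representative_subset(gen_fn, num_samples=200):
    """Wrap a representative-dataset generator so it yields only the
    first `num_samples` calibration examples."""
    def subset_gen():
        yield from itertools.islice(gen_fn(), num_samples)
    return subset_gen

# Hypothetical usage with the generator defined above:
# converter.representative_dataset = make_representative_subset(
#     representative_dataset_gen, num_samples=200)
```

This trades calibration coverage for speed, so the quantized model's accuracy should be re-checked against the full test set.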

Describe the feature: Post-training quantization should utilize the GPU to speed things up.

miaout17 commented 4 years ago

A few initial questions:

I don't think the post-training quantization tool supports GPU, but I'm not the expert. I'll let @suharshs follow up from here.

ek9852 commented 4 years ago

For "test set of 2336 on our model", does that mean 2336 images are used as the representative dataset? Yes.

Do you know how much time it takes to invoke the model? 0.3 sec on a Google Coral Edge TPU. It should be much faster on my NVIDIA TITAN X GPU, but post-training quantization does not currently use the GPU.
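To ground per-invoke numbers like the 0.3 sec quoted above, invocation can be timed directly. Below is a minimal sketch; the helper name `average_invoke_time` and the model path are assumptions, not part of the original report:

```python
import time

def average_invoke_time(invoke_fn, n_runs=10):
    """Average wall-clock seconds per call of `invoke_fn` over `n_runs`."""
    invoke_fn()  # warm up once so one-time allocations don't skew the average
    start = time.perf_counter()
    for _ in range(n_runs):
        invoke_fn()
    return (time.perf_counter() - start) / n_runs

# Hypothetical usage with a TFLite interpreter:
# interpreter = tf.lite.Interpreter(model_path="model.tflite")
# interpreter.allocate_tensors()
# print(f"{average_invoke_time(interpreter.invoke):.4f} s per invoke")
```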

suharshs commented 4 years ago

TensorFlow Lite doesn't currently support non-mobile GPU kernels, and the post-training quantization tool is specific to TensorFlow Lite at the moment. As we work to unify TensorFlow and TensorFlow Lite, we will keep this in mind. I will keep this issue open to give you updates as they come.

Thanks!