tensorflow / models

Models and examples built with TensorFlow

TFLite takes too long when using an unquantized model. #4211

Closed z-huabao closed 6 years ago

z-huabao commented 6 years ago

System information

Describe the problem

I am testing TFLite on Android (Raspberry Pi) using this code, and it runs well. But when I switch from the quantized model to the unquantized model, inference takes too long (300x300 input, about 9 s), although the classification results are correct. I tested on a Pi 2 and Pi 3 with Android Things, Android 5.1, and Android 7.1, and inference is slow in every environment. I also used adb shell top to monitor the CPU (Raspberry Pi 3): even with setNumThreads(4) in TFLite, only about 25% of the CPU is used during inference. What's wrong? I have no idea, please help me! Thanks!
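(Editor's note, not from the original thread: when comparing quantized vs. unquantized latency, it helps to time each call to run() explicitly rather than eyeballing it. A minimal self-contained sketch of such a timer is below; `timeMillis` and the busy-loop stand-in for the interpreter call are hypothetical names, not part of the TFLite API.)

```java
public class LatencyDemo {
    /**
     * Times an arbitrary piece of work in milliseconds.
     * In the app above, the Runnable would wrap
     * mTensorFlowLite.run(imgData, confidencePerLabel).
     */
    static long timeMillis(Runnable work) {
        long t0 = System.nanoTime();
        work.run();
        return (System.nanoTime() - t0) / 1_000_000;
    }

    public static void main(String[] args) {
        // Stand-in workload instead of a real interpreter call.
        long ms = timeMillis(() -> {
            double acc = 0;
            for (int i = 0; i < 1_000_000; i++) acc += Math.sqrt(i);
        });
        System.out.println("inference took ~" + ms + " ms");
    }
}
```

Averaging over several runs (and discarding the first, which includes warm-up) gives a more trustworthy number than a single measurement.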

Source code / logs

public class ImageClassifierActivity extends Activity {
    ...
    private void doRecognize(Bitmap image) {
        // Allocate space for the inference results
        float[][] confidencePerLabel = new float[1][mLabels.size()];
        // Allocate buffer for image pixels.
        int[] intValues = new int[300 * 300];
        // float32 input: 4 bytes per channel value
        ByteBuffer imgData = ByteBuffer.allocateDirect(1 * 300 * 300 * 3 * 4);
        imgData.order(ByteOrder.nativeOrder());

        // Read image data into buffer formatted for the TensorFlow model
        TensorFlowHelper.convertBitmapToByteBuffer_float32(image, intValues, imgData);

        // Run inference on the network with the image bytes in imgData as input,
        // storing results on the confidencePerLabel array.
        mTensorFlowLite.run(imgData, confidencePerLabel);

        // Get the results with the highest confidence and map them to their labels
        Collection<Recognition> results =
                TensorFlowHelper.getBestResults(confidencePerLabel, mLabels);
        // Report the results with the highest confidence
        onPhotoRecognitionReady(results);
    }

    ...
}

and convertBitmapToByteBuffer_float32:

    /** Writes image data into a {@code ByteBuffer}. */
    public static void convertBitmapToByteBuffer_float32(Bitmap bitmap, int[] intValues, ByteBuffer imgData) {
        if (imgData == null) {
            return;
        }
        imgData.rewind();
        bitmap.getPixels(intValues, 0, bitmap.getWidth(), 0, 0,
                bitmap.getWidth(), bitmap.getHeight());
        // Encode the image pixels into a byte buffer representation matching the expected
        // input of the TensorFlow model: (value - IMAGE_MEAN) / IMAGE_STD maps [0, 255] to [-1, 1).
        final float IMAGE_STD = 128.0f;
        final int IMAGE_MEAN = 128;
        int pixel = 0;
        // getPixels fills intValues row by row, so iterate rows (height) outer, columns (width) inner.
        for (int row = 0; row < bitmap.getHeight(); ++row) {
            for (int col = 0; col < bitmap.getWidth(); ++col) {
                final int val = intValues[pixel++];
                imgData.putFloat((((val >> 16) & 0xFF) - IMAGE_MEAN) / IMAGE_STD);
                imgData.putFloat((((val >> 8) & 0xFF) - IMAGE_MEAN) / IMAGE_STD);
                imgData.putFloat(((val & 0xFF) - IMAGE_MEAN) / IMAGE_STD);
            }
        }
    }
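(Editor's note, not from the original thread: the buffer sizing and normalization above can be checked in isolation. The sketch below is self-contained and does not use Android or TFLite; `DIM`, `floatInput`, and `normalize` are hypothetical names mirroring the snippet's constants. It shows why the float32 buffer is 4x the size of a quantized uint8 buffer, and what the (v - 128) / 128 mapping produces for each ARGB channel.)

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class BufferSizeDemo {
    static final int DIM = 300, CHANNELS = 3, BYTES_PER_FLOAT = 4;

    /** float32 input buffer: 1 x 300 x 300 x 3 values, 4 bytes each = 1,080,000 bytes. */
    static ByteBuffer floatInput() {
        ByteBuffer b = ByteBuffer.allocateDirect(1 * DIM * DIM * CHANNELS * BYTES_PER_FLOAT);
        b.order(ByteOrder.nativeOrder());
        return b;
    }

    /** (v - 128) / 128 maps a channel value in [0, 255] to roughly [-1, 1). */
    static float normalize(int channel) {
        return (channel - 128) / 128.0f;
    }

    public static void main(String[] args) {
        System.out.println(floatInput().capacity());        // 1080000
        int argb = 0xFF80FF00;                              // A=255, R=128, G=255, B=0
        System.out.println(normalize((argb >> 16) & 0xFF)); // R -> 0.0
        System.out.println(normalize((argb >> 8) & 0xFF));  // G -> 0.9921875
        System.out.println(normalize(argb & 0xFF));         // B -> -1.0
    }
}
```

The same image fed to a quantized uint8 model would need only 1 x 300 x 300 x 3 = 270,000 bytes, one byte per channel with no normalization.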
qlzh727 commented 6 years ago

Assigning to @sguada, who is the owner of the slim models.

myth01 commented 6 years ago

My observation has been that the TensorFlow -> TFLite conversion tools from April 27 give much better performance. @zhonghuabao1, can you check this?

achowdhery commented 6 years ago

Please try this updated set of instructions: https://medium.com/tensorflow/training-and-serving-a-realtime-mobile-object-detector-in-30-minutes-with-cloud-tpus-b78971cf1193

kiad4631 commented 5 years ago

System information

  • What is the top-level directory of the model you are using: classifier(mobilenet-v1-1.0)
  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): ubuntu16.04
  • TensorFlow installed from (source or binary): source
  • TensorFlow version (use command below): 1.8.0
  • Bazel version (if compiling from source): 0.12.0
  • CUDA/cuDNN version: 9.1/7.1
  • GPU model and memory: GTX1080/8G
  • Exact command to reproduce:


Hi. When I put my ssdlite_mobilenetv2 float model in the assets folder instead of detect.tflite, my program crashes. Do you know why?