tensorflow / model-optimization

A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.
https://www.tensorflow.org/model_optimization
Apache License 2.0

Post-training integer quantization for SSDMobileNet - poor detection accuracy #1002

Open Garstig opened 2 years ago

Garstig commented 2 years ago

Hi there

Describe the bug I started from the "Convert TF Object Detection API model to TFLite" guide and tried to modify it so the model is fully int8 quantized. Here is my version. It should run completely on its own in Colab.

In the notebook I use two different functions to create the representative datasets: the first uses 200 images from the COCO dataset, the second only 2. However, the resulting models perform similarly poorly, so I assume I did something wrong but can't find the mistake.

You can see the results of the models at the very end of the Colab, but I will also add some screenshots.

System information

I didn't change the standard installation on Google Colab.

Python version:

Describe the expected behavior

The first generated TFLite model should perform much better than the second and detect all the easy cases.

Describe the current behavior They both perform badly.

Code to reproduce the issue Here is my version. It should run completely on its own in Colab.

Screenshots

Model 2 outperforms Model 1

Result Model 1

(screenshot)

Result Model 2

(screenshot)

Both perform okay

Result Model 1

(screenshot)

Result Model 2

(screenshot)

Both perform poorly

Result Model 1 & 2

(screenshot)

Additional context If you have any questions about my code don't hesitate to ask! Thank you very much.

dansuh17 commented 2 years ago

Hi @Garstig, thanks for the detailed description.

One quick question about the model in the examples: is Model 2 the model quantized with 200 representative samples?

Selecting the representative dataset is a subtle art. Could you try the following and see if the performance improves?

Garstig commented 2 years ago

Hi @dansuh17, thanks for your answer!

Model 2 was the model with only 2 representative samples. So yes, there were cases where the model with fewer representative samples performed better than the one with more.

I just tried everything you suggested, but unfortunately the results are still quite bad. I even used the same images for testing as for the representative dataset, since I thought the model should work best on those.

The updated Colab notebook can be found here. The TFLite models can be found here, together with the images I used (a random sample of the COCO train dataset; my notebook downloads them from Google Drive, so you don't need to download 18 GB of files).

Do you know whether my function for creating the representative dataset is right?

def _representative_dataset_gen_full():
    root = 'images/repr/'
    pattern = "*.jpg"
    imagePaths = []
    for path, subdirs, files in os.walk(root):
        for name in files:
            if fnmatch(name, pattern):
                imagePaths.append(root + name)
    for index, p in enumerate(imagePaths):
        if index % 50 == 0:
            print(index)
        image = cv2.imread(p)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        image = cv2.resize(image, (640, 640))
        image = image.astype("float32")
        # add batch dimension: (640, 640, 3) -> (1, 640, 640, 3)
        image = image.reshape(1, 640, 640, 3)
        # subtract mean and divide by std
        image = (image - 127.5) / 127.5
        yield [image.astype("float32")]
Garstig commented 2 years ago

Hi @dansuh17, do you have a chance to look into it again? Or do you know someone who could help? I would really appreciate it!

rino20 commented 2 years ago

Your representative dataset generation function is not aligned with your preprocess_image_cv() function.

In particular, I think image = image * 255 is missing. What about just calling preprocess_image_cv() in _representative_dataset_gen_full?
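
The mismatch is easy to see numerically. A minimal NumPy sketch (toy pixel values, not the actual COCO images) comparing the two normalizations as they appear in the thread:

```python
import numpy as np

# Toy pixel values spanning the uint8 range, as cv2.imread would return.
image = np.array([0.0, 127.5, 255.0], dtype=np.float32)

# Normalization in _representative_dataset_gen_full: maps [0, 255] to [-1, 1].
rep = (image - 127.5) / 127.5

# Normalization in preprocess_image_cv: the extra `* 255` stretches the
# range to [-255, 255] before the uint8 cast, so the two pipelines feed
# the model values on completely different scales.
inf = ((image - 127.5) / 127.5) * 255

print(rep.min(), rep.max())  # -1.0 1.0
print(inf.min(), inf.max())  # -255.0 255.0
```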

Garstig commented 2 years ago

Hi @rino20 and thanks for helping!

I guess I didn't use the preprocess_image_cv() method because I wanted to try different combinations of preprocessing, as I was afraid TFLite might do some things internally.

However, I wrote a version of _representative_dataset_gen_full that uses preprocess_image_cv. Unfortunately I did not see any change in the behavior :/

If someone wants to reproduce my results, here is what I changed in the code.

def _representative_dataset_gen_full():
    root = 'images/repr/'
    pattern = "*.jpg"
    imagePaths = []
    for path, subdirs, files in os.walk(root):
        for name in files:
            if fnmatch(name, pattern):
                imagePaths.append(root + name)
    for index, p in enumerate(imagePaths):
        if index % 50 == 0:
            print(index)
        image = preprocess_image_cv(p)[0]
        # add batch dimension: (640, 640, 3) -> (1, 640, 640, 3)
        image = image.reshape(1, 640, 640, 3)
        yield [image.astype("float32")]

I also changed the preprocess function to make it work with just one argument:

def preprocess_image_cv(image_path, input_size=(640, 640)):
    """Preprocess the input image to feed to the TFLite model"""
    image = cv2.imread(image_path)
    original_image = image.copy()
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = cv2.resize(image, input_size)
    image = (image - 127.5) / 127.5
    image = image * 255
    preprocessed_image = image.astype(np.uint8)
    return preprocessed_image, original_image
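
As a sanity check (a suggestion, not something from the code above): a fully int8-quantized model records a (scale, zero_point) pair for its input tensor, and you can quantize a preprocessed image by hand to see whether it lands sensibly in the int8 range. The scale and zero_point below are assumed example values for illustration; the real ones come from tf.lite.Interpreter(model_path=...).get_input_details()[0]["quantization"].

```python
import numpy as np

# Assumed example parameters for illustration only; read the real ones
# from the interpreter's get_input_details() as noted above.
scale, zero_point = 1.0 / 127.5, 0

def quantize(x, scale, zero_point):
    """Map float inputs to int8 the way a fully quantized model does."""
    q = np.round(np.asarray(x) / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

# Inputs in [-1, 1] fill the int8 range: -128, 0, 127.
print(quantize([-1.0, 0.0, 1.0], scale, zero_point))

# Inputs in [-255, 255] (the `* 255` pipeline) saturate at -128 and 127,
# destroying almost all of the image's dynamic range.
print(quantize([-255.0, 255.0], scale, zero_point))
```

If nearly every pixel quantizes to -128 or 127, the preprocessing and the model's quantization parameters are not aligned.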
Garstig commented 2 years ago

Whoops, I closed it by accident. I just wanted to comment.

Garstig commented 1 year ago

@dansuh17 and @rino20 :

Does either of you have any more ideas, or know someone who could help?