google / automl

Google Brain AutoML
Apache License 2.0

Inference Differences #933

Open ghost opened 3 years ago

ghost commented 3 years ago

Hello,

I trained a model on my own dataset, and it has worked pretty well so far.

After training, my goal is to run inference on the test set. According to the README there are two ways:

1) Export the model (frozen graph / SavedModel) and run "model_inspect.py --runmode=saved_model_infer"
2) Run inference directly from the checkpoint with "model_inspect.py --runmode=infer"

Both methods give me reasonable detections, BUT: 1) gives me fewer detections, only those with a higher score (~>0.5), while 2) gives me many more detections with scores above 0.1, which was my specified threshold. So method 2) leads to my expected result.

Why is method 1) different? I need the frozen graph as a .pb file for further inference, but with a model that gives me all detections...

Thanks a lot in advance

Steve-2040 commented 3 years ago

@cgeller if you have exported the model as a SavedModel, then you can also use something like the following code for inference (test images are named img0.jpg to img99.jpg in the directory test_inf and the SavedModel is in savedmodeldir):

import tensorflow as tf
from PIL import Image, ImageOps
import numpy as np
import os

# os.environ['CUDA_VISIBLE_DEVICES'] = "-1" # uncomment to run on CPU

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    tf.config.experimental.set_memory_growth(gpus[0], True)
print(len(gpus), "Physical GPUs")

imported = tf.saved_model.load('savedmodeldir/')
f = imported.signatures["serving_default"]

def get_image_array(im_id):
    im = Image.open(f"test_inf/img{im_id}.jpg")
    im_arr = np.frombuffer(im.tobytes(), dtype=np.uint8)
    im_arr = im_arr.reshape((1, im.size[1], im.size[0], 3))
    return im_arr

for i in range(0, 100):
    im_arr = get_image_array(i)
    detections = f(tf.constant(im_arr))['detections:0'].numpy()
    detections = detections[0]  # take the single batch element; up to 100 detections (added in a later update, see the fix further down)
    for x in range(len(detections)):
        y1 = detections[x][1]
        x1 = detections[x][2]
        y2 = detections[x][3]
        x2 = detections[x][4]
        pr = int(detections[x][5]*100.0) # probability
        cl = int(detections[x][6])       # class_id
        if pr > 50:
            print(f"img{i}.jpg class_id:{cl} probability%:{pr} y1:{y1} x1:{x1} y2:{y2} x2:{x2}")

To get the model inputs and outputs you can use the command below (from what I have seen, the SavedModel always outputs 100 detections, with the highest probability first):

saved_model_cli show --dir savedmodeldir --all

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['image_arrays:0'] tensor_info:
        dtype: DT_UINT8
        shape: (-1, -1, -1, -1)
        name: image_arrays:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['detections:0'] tensor_info:
        dtype: DT_FLOAT
        shape: (1, 100, 7)
        name: detections:0
  Method name is: tensorflow/serving/predict

I have also used TensorFlow Serving for inference with an exported EfficientDet SavedModel. https://www.tensorflow.org/tfx/guide/serving
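
For anyone trying the same route, a minimal sketch of calling such a served model over the REST API could look like the code below; the host, port and model name are assumptions, not something from this thread:

import json

import numpy as np
import requests
from PIL import Image

# Assumes TensorFlow Serving is already running the exported SavedModel,
# e.g. started with --rest_api_port=8501 and --model_name=efficientdet.
# Host, port and model name below are assumptions.
URL = "http://localhost:8501/v1/models/efficientdet:predict"

# One uint8 image per request instance, shape (H, W, 3).
im = np.array(Image.open("test_inf/img0.jpg").convert("RGB"), dtype=np.uint8)
payload = {"instances": [im.tolist()]}

resp = requests.post(URL, data=json.dumps(payload))
resp.raise_for_status()

# The response mirrors the SavedModel output: up to 100 detections of
# [image_id, ymin, xmin, ymax, xmax, score, class_id] per image.
detections = np.array(resp.json()["predictions"])[0]
for y1, x1, y2, x2, score, cls in detections[:, 1:7]:
    if score > 0.5:
        print(f"class_id:{int(cls)} score:{score:.2f} box:({y1}, {x1}, {y2}, {x2})")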

Hope this helps.

ghost commented 3 years ago

Thanks for your quick response and the script. The script gives me exactly the same detections as the saved_model_infer method on a SavedModel. My main problem is still that the detections from a saved_model and the direct detection with the python model_inspect.py --runmode=infer method are different. In both cases I set min_score_thresh=0.1.

These are the detection results with the direct infer method. It looks like all detections over 0.1 are displayed. [Screenshot 2021-01-30 13:46:05]

These are the detections on a SavedModel, both with your provided script and with the provided method. Some detections are missing. The threshold is also 0.1, so in my opinion they should also be shown. In fact, there are also images where even the high-score detections vary between the two methods. [Screenshot 2021-01-30 13:46:25]

Is it a bug? Or why is the result different? I need ALL detections from the saved model as well. Thanks a lot in advance.

aakash665 commented 3 years ago

Nice


Steve-2040 commented 3 years ago

I have seen accuracy improve when using moving_average_decay: "With testing it looks like the saved_model without 'moving_average_decay: 0' (158 MB) has better precision and the inference speed of the two models are the same." https://github.com/google/automl/issues/799#issue-707310584 I have seen variances between the methods, as you described, but haven't investigated it further.

ghost commented 3 years ago

Hey @Steve-2040,

moving_average_decay did not change anything in my case.

I just noticed the differences between the two methods. Can anybody explain the differences or help me track them down?

Thanks a lot

fitoule commented 3 years ago

Are you certain you are using --min_score_thresh=0.1? Try with --min_score_thresh=0.8 to see whether the parameter has any effect. For my data it works, but I don't go under 0.4, which is already very low.

kevinkvothe commented 3 years ago

@cgeller The same thing happened to me. The problem is that when you export a model, the min threshold is hardcoded into it. Changing the following default values from 0.4 to 0.0 solved the issue for me:

efficientdet/keras/infer.py line 51
efficientdet/keras/inspector.py lines 116 and 172
efficientdet/keras/train_lib.py line 384

Unfortunately I haven't been able to identify exactly which line hardcodes the config into the model.
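
For illustration only, the kind of hardcoded default being described would look roughly like the hypothetical sketch below; the exact definitions in those files may differ, so treat the flag name and value as assumptions rather than verified repo contents:

from absl import flags

# Hypothetical sketch of the sort of default that ends up baked into the
# exported model; the fix described above is to lower it to 0.0 before export.
flags.DEFINE_float(
    'min_score_thresh', 0.4,
    'Score threshold applied during post-processing; detections below it '
    'are dropped inside the exported graph.')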

achukhrov-ffr-team commented 3 years ago

(quoting the inference script from Steve-2040's comment above)

Could you please explain why you are not normalizing the inputs, yet inference still works well?

Steve-2040 commented 3 years ago

@achukhrov-ffr-team Inference works well for me on images from the camera, unchanged. I have tried CLAHE (adaptive histogram equalization) but it did not help much in my case. Can you suggest another method?
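
For reference, CLAHE is usually applied to the lightness channel before converting back; a minimal OpenCV sketch (the clip limit and tile size are illustrative values, not the settings used here):

import cv2
import numpy as np

def apply_clahe(img_bgr: np.ndarray) -> np.ndarray:
    # Convert to LAB and equalize only the lightness channel.
    lab = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return cv2.cvtColor(cv2.merge((clahe.apply(l), a, b)), cv2.COLOR_LAB2BGR)

img = cv2.imread("test_inf/img0.jpg")
cv2.imwrite("test_inf/img0_clahe.jpg", apply_clahe(img))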

mingxingtan commented 3 years ago

Hi @cgeller , could you paste your complete command line for these two approaches?

One potential issue is that when you export the saved model, it already bakes in the max_instances and min_score, so even if you set a lower min_score at inference time, it won't help. From your pictures, it is likely that you used a larger min_score_thresh when exporting the saved graph.

achukhrov-ffr-team commented 3 years ago

@achukhrov-ffr-team Inference works well for me on the images from the camera unchanged. I have tried CLAHE (Adaptive histogram equalization) but it did not help much in my case. Can you suggest another method?

Hi @Steve-2040, I think there is some misunderstanding. My question was: as far as I can see, you do no preprocessing of the images in your code (you don't divide by 255, subtract the mean, or divide by the std), yet it works well. Why? Or is this preprocessing hidden somewhere, and if so, where?

Steve-2040 commented 3 years ago

Hi @achukhrov-ffr-team, for the saved model the input is: inputs['image_arrays:0'] tensor_info: dtype: DT_UINT8. DT_UINT8 is an unsigned 8-bit integer, i.e. values 0 to 255.

achukhrov-ffr-team commented 3 years ago

@Steve-2040 Yes, but in the training stage the images (which were represented as the same kind of arrays with values from 0 to 255) were preprocessed: divided by 255 so every value lies between 0 and 1, and then normalized (mean subtraction and division by the std), and you haven't done that. My question is: is it true that you haven't done it, or am I just not seeing it? And if you haven't done it, why does it still work?

Steve-2040 commented 3 years ago

Hi @achukhrov-ffr-team, my knowledge of the model itself is limited. I think if it had this pre-processing for training, it may have it in the inference graph as well? Out of interest, I tried to open the saved_model graph using import_pb_to_tensorboard (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/import_pb_to_tensorboard.py) but this did not work for me (TensorBoard hung on "namespace hierarchy: finding similar subgraphs"). Looking at the graph in TensorBoard may show the pre-processing; you may have better luck getting import_pb_to_tensorboard to work. It would be best to ask a new question in the automl issues with a more descriptive title to get better answers.
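
As a lighter-weight alternative to TensorBoard, one way to look for preprocessing ops is to list the operations in the serving graph directly; a small sketch assuming the TF2 SavedModel from earlier in this thread:

import tensorflow as tf

imported = tf.saved_model.load('savedmodeldir/')
f = imported.signatures["serving_default"]

# Preprocessing (casts, scaling, mean/std normalization) usually shows up
# as ops near the input, e.g. Cast, RealDiv, Sub, Mul.
for op in f.graph.get_operations()[:50]:
    print(op.type, op.name)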

td43 commented 2 years ago

(quoting an earlier version of the inference script above, which still lacked the detections = detections[0] line)

Hi there, I have been trying to use your code, but I got this error:


detections:  [[[0.0000000e+00 1.5822080e+03 9.3178455e+02 ... 1.2326713e+03
   8.7507027e-01 2.5000000e+01]
  [0.0000000e+00 1.5473186e+03 6.5615771e+02 ... 9.2950134e+02
   8.3741075e-01 1.2200000e+02]
  [0.0000000e+00 4.9753842e+02 7.1269232e+02 ... 8.2799677e+02
   8.1300539e-01 9.1000000e+01]
  ...
  [0.0000000e+00 7.5303186e+02 8.1928699e+02 ... 8.8794812e+02
   8.1605883e-03 3.3000000e+01]
  [0.0000000e+00 2.4212416e+02 9.1091125e+02 ... 1.0349702e+03
   8.1111044e-03 1.2400000e+02]
  [0.0000000e+00 0.0000000e+00 6.3821063e+02 ... 8.2343903e+02
   8.0806054e-03 6.8000000e+01]]]

Traceback (most recent call last):
  File "confusion_matrix_tf2.py", line 344, in <module>
    tf.compat.v1.app.run()
  File "/home/daniel_tobon/cm-env/lib/python3.8/site-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/home/daniel_tobon/cm-env/lib/python3.8/site-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/home/daniel_tobon/cm-env/lib/python3.8/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "confusion_matrix_tf2.py", line 334, in main
    confusion_matrix = process_detections(input_tfrecord_path, model, categories, draw_option, draw_save_path)
  File "confusion_matrix_tf2.py", line 150, in process_detections
    pr = int(detections[x][5] * 100.0)  # probability
TypeError: only size-1 arrays can be converted to Python scalars

Steve-2040 commented 2 years ago

Hi @danielTobon43, there was a line missing in the code; you can try the version below. The missing line is: detections = detections[0] # num detections: 100

import tensorflow as tf
from PIL import Image, ImageOps
import numpy as np
import os

# os.environ['CUDA_VISIBLE_DEVICES'] = "-1" # uncomment to run on CPU

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    tf.config.experimental.set_memory_growth(gpus[0], True)
print(len(gpus), "Physical GPUs")

imported = tf.saved_model.load('savedmodeldir/')
f = imported.signatures["serving_default"]

def get_image_array(im_id):
    im = Image.open(f"test_inf/img{im_id}.jpg")
    im_arr = np.frombuffer(im.tobytes(), dtype=np.uint8)
    im_arr = im_arr.reshape((1, im.size[1], im.size[0], 3))
    return im_arr

for i in range(0, 100):
    im_arr = get_image_array(i)
    detections = f(tf.constant(im_arr))['detections:0'].numpy()
    detections = detections[0] # num detections:100
    for x in range(len(detections)):
        y1 = detections[x][1]
        x1 = detections[x][2]
        y2 = detections[x][3]
        x2 = detections[x][4]
        pr = int(detections[x][5]*100.0) # probability
        cl = int(detections[x][6])       # class_id
        if pr > 50:
            print(f"img{i}.jpg class_id:{cl} probability%:{pr} y1:{y1} x1:{x1} y2:{y2} x2:{x2}")

Hope this helps. Also see this: https://github.com/google/automl/issues/753#issuecomment-689353891

td43 commented 2 years ago

Hi @Steve-2040, now it is working perfectly, thanks! [Screenshot from 2021-11-04 06-13-25]

I will be using this code to create a confusion matrix for object detection. I will ask you for help if I get stuck.

Thanks!!

td43 commented 2 years ago

@Steve-2040 One question: this code would be similar for frozen_model.pb, right? I mean the model-loading part.

Steve-2040 commented 2 years ago

@Steve-2040 One question: this code would be similar for frozen_model.pb, right? I mean the model-loading part.

@danielTobon43 There are some differences. I don't know the details, but you can have a look at this: https://stackoverflow.com/questions/51226011/tensorflow-frozen-graph-to-savedmodel
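
As a rough sketch (not verified against this particular export), loading a frozen .pb in TF2 usually goes through tf.compat.v1.wrap_function; the tensor names below are taken from the SavedModel signature shown earlier in the thread and may differ for your frozen graph:

import tensorflow as tf

def wrap_frozen_graph(pb_path, inputs, outputs):
    # Read the serialized GraphDef from the frozen .pb file.
    graph_def = tf.compat.v1.GraphDef()
    with tf.io.gfile.GFile(pb_path, 'rb') as f:
        graph_def.ParseFromString(f.read())

    def _import():
        tf.compat.v1.import_graph_def(graph_def, name="")

    wrapped = tf.compat.v1.wrap_function(_import, [])
    return wrapped.prune(
        tf.nest.map_structure(wrapped.graph.as_graph_element, inputs),
        tf.nest.map_structure(wrapped.graph.as_graph_element, outputs))

# Tensor names are assumptions based on the saved_model_cli output above.
infer = wrap_frozen_graph('frozen_model.pb',
                          inputs="image_arrays:0",
                          outputs="detections:0")
# detections = infer(tf.constant(im_arr))  # same (1, 100, 7) layout as before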