onnx / tensorflow-onnx

Convert TensorFlow, Keras, TensorFlow.js and TFLite models to ONNX
Apache License 2.0

Prediction from ONNX is the same for all images (when Darkflow is converted to ONNX) #881

Closed VinuthaRaghavendra closed 4 years ago

VinuthaRaghavendra commented 4 years ago

Describe the bug
Prediction from ONNX is the same for all images (when a Darkflow model is converted to ONNX).

Urgency
None specified.


To Reproduce

1. Darkflow (https://github.com/thtrieu/darkflow) is used for training.
2. The checkpoints are converted to ONNX using tf2onnx.
3. Predictions in C# are run using https://docs.microsoft.com/en-us/windows/ai/windows-ml/convert-model-winmltools


jignparm commented 4 years ago

Prediction from ONNX is the same for all images

Can you try OnnxRuntime to run inference on the ONNX model instead of WinMLTools? This will tell us whether the difference is caused by the inference engine.

Also, is there a link to the frozen TF model you are using, and the TF2ONNX command line you used to convert the model?

https://github.com/microsoft/onnxruntime

Example using OnnxRuntime Python API

# Compute the prediction with ONNX Runtime
import onnxruntime as rt
import numpy
sess = rt.InferenceSession("rf_iris.onnx")
input_name = sess.get_inputs()[0].name
label_name = sess.get_outputs()[0].name
# X_test is assumed to be your test feature matrix (a numpy array)
pred_onx = sess.run([label_name], {input_name: X_test.astype(numpy.float32)})[0]
VinuthaRaghavendra commented 4 years ago

I tried the OnnxRuntime Python API as well; the prediction is the same for all images.

Using this code:

import numpy as np
import onnxruntime as rt
from PIL import Image, ImageDraw

sess = rt.InferenceSession("tiny_yolov2/model.onnx")
input_name = sess.get_inputs()[0].name
img = Image.open('test.jpg')
img = img.resize((832, 832))  # for tiny_yolov2
X = np.asarray(img)
X = X.transpose(2, 0, 1)
X = X.reshape(1, 3, 832, 832)
out = sess.run(None, {input_name: X.astype(np.float32)})
out = out[0][0]

numClasses = 2
anchors = [1.08, 1.19, 3.42, 4.41, 6.63, 11.38, 9.42, 5.11, 16.62, 10.52]

def sigmoid(x, derivative=False):
    return x * (1 - x) if derivative else 1 / (1 + np.exp(-x))

def softmax(x):
    scoreMatExp = np.exp(np.asarray(x))
    return scoreMatExp / scoreMatExp.sum(0)

clut = [(0,0,0),(255,0,0),(255,0,255),(0,0,255),(0,255,0),(0,255,128),
        (128,255,0),(128,128,0),(0,128,255),(128,0,128),
        (255,0,128),(128,0,255),(255,128,128),(128,255,128),(255,255,0),
        (255,128,128),(128,128,255),(255,128,128),(128,255,128)]
label = ["aeroplane", "tvmonitor"]

draw = ImageDraw.Draw(img)
for cy in range(0, 13):
    for cx in range(0, 13):
        for b in range(0, 5):
            channel = b * (numClasses + 5)
            tx = out[channel][cy][cx]
            ty = out[channel + 1][cy][cx]
            tw = out[channel + 2][cy][cx]
            th = out[channel + 3][cy][cx]
            tc = out[channel + 4][cy][cx]
            x = (float(cx) + sigmoid(tx)) * 32
            y = (float(cy) + sigmoid(ty)) * 32
            w = np.exp(tw) * 32 * anchors[2 * b]
            h = np.exp(th) * 32 * anchors[2 * b + 1]
            confidence = sigmoid(tc)
            classes = np.zeros(numClasses)
            for c in range(0, numClasses):
                classes[c] = out[channel + 5 + c][cy][cx]
            classes = softmax(classes)
            detectedClass = classes.argmax()
            if 0.5 < classes[detectedClass] * confidence:
                color = clut[detectedClass]
                x = x - w / 2
                y = y - h / 2
                draw.line((x, y, x + w, y), fill=color)
                draw.line((x, y, x, y + h), fill=color)
                draw.line((x + w, y, x + w, y + h), fill=color)
                draw.line((x, y + h, x + w, y + h), fill=color)
img.save("result.png")

jignparm commented 4 years ago

Try this command line instead to generate the model. You should see different scores with different inputs (see below).

python -m tf2onnx.convert --input tiny-yolo-voc-3c.pb --inputs input:0 --outputs output:0 --opset 11 --output my.onnx

Example with 2 random samples

import sys
import onnxruntime as rt
import numpy as np
myfile = sys.argv[1]
sess = rt.InferenceSession(myfile)
input_name = sess.get_inputs()[0].name
output_name = sess.get_outputs()[0].name
shape = (1, 832, 832, 3)
sample = 255 * np.random.random(shape).astype(np.float32)
scores = sess.run([output_name], {input_name: sample})[0]
print('scores1=\n', scores[0][0][0][:5])
sample = 255 * np.random.random(shape).astype(np.float32)
scores = sess.run([output_name], {input_name: sample})[0]
print('scores2=\n', scores[0][0][0][:5])

output

scores1=
 [-0.04436302 -0.02913633  0.09463985  0.34088922 -3.1391654 ]
scores2=
 [-0.04425232 -0.02944296  0.09664     0.34297347 -3.1432114 ]
VinuthaRaghavendra commented 4 years ago

But now the predictions from the two graphs do not match. Can you please help?

And one more doubt: a frozen graph exported from the Darkflow tiny-yolo-voc.weights gives accurate results when converted to .onnx, but a frozen graph exported from checkpoints and converted to .onnx gives completely mismatched predictions (the confidence score is always 99).

jignparm commented 4 years ago

But frozen graph exported from checkpoints and converted to .onnx has complete mismatch

tf2onnx has a --checkpoint parameter to load models from checkpoint format (as opposed to --input, which loads a frozen model). Did you use that argument to convert from checkpoint format, or something else?
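
For reference, a checkpoint-based conversion looks roughly like this (the .meta path and tensor names below are placeholders; substitute your model's):

python -m tf2onnx.convert --checkpoint model.ckpt.meta --inputs input:0 --outputs output:0 --opset 11 --output my.onnx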

If tf2onnx is able to convert a model, the converted model is typically accurate; if it cannot handle the model, you'll see a glaring conversion error instead.

With the scores always being equal to 99, one suspicion is that the input data is wrong -- can you double-check the NCHW vs. NHWC memory layout of your input tensor? Another possibility is that the pixel values are out of the range of valid values, which can produce extreme results for some models.
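
A minimal sketch of both checks (the [0, 1] pixel scaling here is an assumption; verify the expected range against your training pipeline):

import numpy as np
from PIL import Image

img = Image.open("test.jpg").resize((832, 832))
nhwc = np.asarray(img, dtype=np.float32)[np.newaxis, ...]  # NHWC layout: (1, 832, 832, 3)
nchw = nhwc.transpose(0, 3, 1, 2)                          # NCHW layout: (1, 3, 832, 832)
nchw = nchw / 255.0  # assumption: model trained on pixels scaled to [0, 1]
print(nhwc.shape, nchw.shape)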

Another option: have a look at "Method 2" here: https://github.com/onnx/tensorflow-onnx/issues/729#issuecomment-552058866 . This will convert the model and also compare the outputs of the TF and ONNX models for accuracy in an easy-to-reproduce way.

VinuthaRaghavendra commented 4 years ago
  1. I have tried using the --checkpoint parameter to freeze the graph, but I get: ValueError: Input 0 of node 0-convolutional_/cond_1/AssingMovingAvg/Switch was passed float from 0-convolutional/moving_mean:0 incompatible with expected float_ref

  2. I have used the --inputs-as-nchw parameter to convert the input and also applied normalization (the NaN error was resolved).

  3. Can you please explain briefly how to convert NHWC to NCHW in Method 2?

jignparm commented 4 years ago

For 1, the "float_ref" error usually occurs when there's a Variable in the checkpoint model file. Because Variables are not supported in ONNX, this is a halting condition for tf2onnx; you generally cannot proceed with the conversion unless you modify your TF model so that no Variables remain in the frozen graph.
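
A minimal sketch of freezing Variables into constants before conversion (TF 1.x style, matching the versions in this thread; the checkpoint prefix and output node name are placeholders):

import tensorflow as tf

with tf.Session() as sess:
    saver = tf.train.import_meta_graph("model.ckpt.meta")  # placeholder checkpoint path
    saver.restore(sess, "model.ckpt")
    # Fold Variables into Const nodes so tf2onnx sees a fully frozen graph
    frozen_graph_def = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph_def, ["output"])  # replace "output" with your real output node names
with tf.gfile.GFile("frozen.pb", "wb") as f:
    f.write(frozen_graph_def.SerializeToString())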

For 3, there's no direct way to specify inputs_as_nchw in Method #2 (maybe that's useful to have in the future), but you can follow the code path and pass it as an argument into the function process_tf_graph().
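
A rough sketch of that code path (assuming the tf2onnx 1.6-era API; the file and tensor names are placeholders, so verify against your version):

import tensorflow as tf
from tf2onnx import tfonnx

# Load a frozen graph (TF 1.x style, matching the versions in this thread)
with tf.gfile.GFile("frozen_inference_graph.pb", "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as tf_graph:
    tf.import_graph_def(graph_def, name="")
    g = tfonnx.process_tf_graph(
        tf_graph,
        opset=11,
        input_names=["image_tensor:0"],
        output_names=["num_detections:0", "detection_boxes:0",
                      "detection_scores:0", "detection_classes:0"],
        inputs_as_nchw=["image_tensor:0"],  # the input(s) to transpose to NCHW
    )

model_proto = g.make_model("converted model")
with open("model.onnx", "wb") as f:
    f.write(model_proto.SerializeToString())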

jignparm commented 4 years ago

The drive.google.com link points to a top-level folder called Inception_V2, but the folder is empty. Maybe the model needs to be shared as well?

jignparm commented 4 years ago

The model converts successfully for me using the master branch, opset 11, and TF ver 1.14.

Can you compare your run command with the one below?

python -m tf2onnx.convert --graphdef frozen_inference_graph.pb --output model.onnx \
    --fold_const --opset 11 --verbose \
    --inputs image_tensor:0 \
    --outputs num_detections:0,detection_boxes:0,detection_scores:0,detection_classes:0

...
2020-05-04 11:22:44.945899: I tensorflow/tools/graph_transforms/transform_graph.cc:317] Applying fold_old_batch_norms
2020-05-04 11:22:46,490 - INFO - tf2onnx: inputs: ['image_tensor:0']
2020-05-04 11:22:46,491 - INFO - tf2onnx: outputs: ['num_detections:0', 'detection_boxes:0', 'detection_scores:0', 'detection_classes:0']
2020-05-04 11:22:46,906 - INFO - tf2onnx.tfonnx: Using tensorflow=1.14.0, onnx=1.6.0, tf2onnx=1.6.0/82f805
2020-05-04 11:22:46,906 - INFO - tf2onnx.tfonnx: Using opset <onnx, 11>
...
2020-05-04 11:23:12,389 - INFO - tf2onnx.optimizer: After optimization: Cast -76 (265->189), Const -561 (993->432), Div -4 (29->25), Flatten -1 (2->1), Gather +2 (41->43), Identity -75 (76->1), Less -5 (26->21), Mul -3 (59->56), ReduceMean -1 (2->1), Shape -6 (54->48), Slice -7 (102->95), Squeeze -21 (129->108), Transpose -168 (192->24), Unsqueeze -50 (125->75)
2020-05-04 11:23:12,666 - INFO - tf2onnx:
2020-05-04 11:23:12,667 - INFO - tf2onnx: Successfully converted TensorFlow model frozen_inference_graph.pb to ONNX
2020-05-04 11:23:12,815 - INFO - tf2onnx: ONNX model is saved at model.onnx
VinuthaRaghavendra commented 4 years ago

But using the same command I get a ValueError:

  File "/home/falconeye-ai/Documents/tensorflow-onnx-master/tf2onnx/onnx_opset/nn.py", line 451, in version_11
    pads = node.inputs[1].get_tensor_value()
  File "/home/falconeye-ai/Documents/tensorflow-onnx-master/tf2onnx/graph.py", line 260, in get_tensor_value
    raise ValueError("get tensor value: {} must be Const".format(self.name))
ValueError: get tensor value: SecondStagePostprocessor/BatchMultiClassNonMaxSuppression/map/while/PadOrClipBoxList/stack_2_Concat__2432 must be Const
Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/falconeye-ai/Documents/tensorflow-onnx-master/tf2onnx/convert.py", line 161, in <module>
    main()
  File "/home/falconeye-ai/Documents/tensorflow-onnx-master/tf2onnx/convert.py", line 145, in main
    inputs_as_nchw=args.inputs_as_nchw)
  File "/home/falconeye-ai/Documents/tensorflow-onnx-master/tf2onnx/tfonnx.py", line 577, in process_tf_graph
    raise exceptions[0]
  File "/home/falconeye-ai/Documents/tensorflow-onnx-master/tf2onnx/tfonnx.py", line 354, in tensorflow_onnx_mapping
    func(g, node, **kwargs)
  File "/home/falconeye-ai/Documents/tensorflow-onnx-master/tf2onnx/onnx_opset/nn.py", line 451, in version_11
    pads = node.inputs[1].get_tensor_value()
  File "/home/falconeye-ai/Documents/tensorflow-onnx-master/tf2onnx/graph.py", line 260, in get_tensor_value
    raise ValueError("get tensor value: {} must be Const".format(self.name))
ValueError: get tensor value: BatchMultiClassNonMaxSuppression/map/while/PadOrClipBoxList/stack_Concat__1756 must be Const

jignparm commented 4 years ago

Are you still using Python 3.5?

Python 3.5 is not supported by tensorflow-onnx. Can you re-try with Python 3.6 or 3.7?

jignparm commented 4 years ago

Closing on the assumption that this is no longer an issue, since the model above converted successfully in a test run. Reopen if needed.