onnx / tensorflow-onnx

Convert TensorFlow, Keras, Tensorflow.js and Tflite models to ONNX
Apache License 2.0

Turning off back-to-back optimizer does not disable fusing batch normalization layers into convolutional layers #1929

Open Mypathissional opened 2 years ago

Mypathissional commented 2 years ago

Describe the bug Hi, I was converting CenterNet (CenterNet HourGlass104 512x512) from the TensorFlow Object Detection API (https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md) with the back-to-back optimizer turned off, in order to disable batch-norm fusion into conv layers, following https://github.com/onnx/tensorflow-onnx/issues/1702. The problem is that even though the back-to-back optimizer is turned off, the convolutions and batch norms are still fused together. Where else could this optimization occur? Using tensorflow=2.8.0, onnx=1.11.0, tf2onnx=1.9.3/1190aa and opset 15.

image
hwangdeyu commented 2 years ago

The fusion logic for Conv and BatchNormalization lives in the back-to-back optimizer. Could you check whether your model's conversion passes through the code below? https://github.com/onnx/tensorflow-onnx/blob/c67bcfb580be741ece8d9978a9b57bd2ce7367ee/tf2onnx/optimizer/back_to_back_optimizer.py#L191
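For context, the transformation this optimizer performs can be sketched in plain numpy: at inference time a BatchNormalization node is an affine map, so its scale and shift can be folded into the preceding convolution's weights and bias. This is an illustrative sketch only (a 1x1 conv written as a matmul with random values), not tf2onnx's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)
cin, cout = 3, 4
x = rng.normal(size=(5, cin))          # 5 "pixels"; a 1x1 conv is a matmul
W = rng.normal(size=(cin, cout))       # 1x1 conv kernel
b = rng.normal(size=cout)              # conv bias

gamma = rng.normal(size=cout)          # BN parameters (constants at export)
beta = rng.normal(size=cout)
mean = rng.normal(size=cout)
var = rng.uniform(0.5, 1.5, size=cout)
eps = 1e-5

# Unfused: Conv followed by BatchNormalization
y_ref = gamma * ((x @ W + b) - mean) / np.sqrt(var + eps) + beta

# Fused: the per-channel scale is folded into the kernel,
# the shift into the bias, and the BN node disappears.
scale = gamma / np.sqrt(var + eps)
W_fused = W * scale                    # broadcasts over output channels
b_fused = (b - mean) * scale + beta
y_fused = x @ W_fused + b_fused

assert np.allclose(y_ref, y_fused)
```

Because the fused graph computes exactly the same values, this rewrite is only visible structurally, which is why the issue is about keeping the nodes separate rather than about numerical differences.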

Mypathissional commented 2 years ago

@hwangdeyu After I turned off the back-to-back optimizer at the top of the optimizer package's init file, I added a print statement both at the beginning of _optimize_conv_batchnorm_fusion(g, node, consumer_nodes) and in optimize_graph(graph, catch_errors=True, optimizers=None) in the optimizer init file. Execution enters optimize_graph but not _optimize_conv_batchnorm_fusion, yet some kind of fusion is still happening, because the node name is changed.

Mypathissional commented 2 years ago

I guess the problem might lie not in the fusing but in the type of batch-normalization layer used, which is SyncBatchNormalization. I have prepared a minimal example: for the code below, the exported model does not contain the batch norms.

import tensorflow as tf

# SyncBatchNormalization is meant for use under a distribution strategy
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    net = tf.keras.Sequential()
    net.add(tf.keras.layers.Conv2D(2, 3))
    net.add(tf.keras.layers.experimental.SyncBatchNormalization())
net.build((1, 30, 30, 2))
net.save("~/Desktop/conv_block")  # SavedModel to feed to tf2onnx

Can this be a problem?

image
hwangdeyu commented 2 years ago

Hi @Mypathissional, I think this is expected behavior for tensorflow-onnx.

When I run the conversion script, there is no BatchNormalization op even before the optimizers run.

optimizer before: Counter({'Identity': 7, 'Const': 2, 'Transpose': 2, 'Placeholder': 1, 'Conv': 1, 'Mul': 1})
optimizer after: Counter({'Transpose': 2, 'Placeholder': 1, 'Const': 1, 'Conv': 1})

However, if we change tf.keras.layers.experimental.SyncBatchNormalization() to tf.keras.layers.BatchNormalization(), the op does show up:

optimizer before: Counter({'Identity': 6, 'Const': 5, 'Transpose': 4, 'Placeholder': 1, 'Conv': 1, 'BatchNormalization': 1})
Mypathissional commented 2 years ago

@hwangdeyu Just for my understanding: what happens to an operation that is present in the saved model but has no counterpart among the ONNX operations? Is the operation just skipped?

hwangdeyu commented 2 years ago

> @hwangdeyu Just for my understanding: what happens to an operation that is present in the saved model but has no counterpart among the ONNX operations? Is the operation just skipped?

I don't know how experimental.SyncBatchNormalization() is implemented internally. From what I've seen so far, the op is not present in the saved model either. There is a FusedBatchNormV3 in the tf.keras.layers.BatchNormalization() saved-model ops:

 ['Placeholder', 'Const', 'Const', 'Const', 'Const', 'Const', 'Const', 'Identity', 'NoOp', 'NoOp', 'Conv2D', 'NoOp', 'Identity', 'FusedBatchNormV3', 'Identity', 'Identity', 'Identity']

But this op is missing from the tf.keras.layers.experimental.SyncBatchNormalization() saved-model ops:

['Placeholder', 'Const', 'Const', 'Const', 'Const', 'Const', 'Const', 'Const', 'Const', 'Identity', 'NoOp', 'NoOp', 'Conv2D', 'NoOp', 'Identity', 'Mul', 'Identity', 'Identity', 'Identity', 'Identity']
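The op lists above suggest SyncBatchNormalization is already lowered at save time: since gamma, beta, mean, and variance are constants at export, the normalization reduces to an elementwise multiply by a constant, and the additive part can merge into the conv bias, which would explain why only a Mul (and no FusedBatchNormV3) appears. A minimal numpy sketch of that rewrite, with made-up constants standing in for the exported values:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(2, 4))            # conv output, 4 channels
bias = rng.normal(size=4)              # original conv bias
gamma, beta = rng.normal(size=4), rng.normal(size=4)
mean, var, eps = rng.normal(size=4), rng.uniform(0.5, 1.5, size=4), 1e-3

# Batch norm at inference is an affine map with constant coefficients
scale = gamma / np.sqrt(var + eps)
shift = beta - mean * scale

# Because scale/shift are constants, BN(conv(x) + bias) can be rewritten
# as conv(x) * scale + new_bias: a single Mul remains in the graph, and
# the Add disappears into the (updated) conv bias.
new_bias = bias * scale + shift
lowered = x * scale + new_bias         # what a saved graph with only Mul computes
reference = ((x + bias) - mean) * gamma / np.sqrt(var + eps) + beta

assert np.allclose(lowered, reference)
```

If this is what happens, the batch-norm structure is gone before tf2onnx ever sees the graph, so no tf2onnx optimizer setting can bring it back.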