onnx / onnx-tensorflow

Tensorflow Backend for ONNX

Onnx -> Tensorflow -> Tflite results in a bugged conversion (and 5x slower) #921

Closed pablovela5620 closed 3 years ago

pablovela5620 commented 3 years ago

Describe the bug

I converted an ONNX model and got good numerical results, but inference is about 5x slower, and there are some issues visible in the Netron graph (attached below).

To Reproduce

I used the following function in a Jupyter notebook to convert from onnx -> tensorflow -> tflite:

import onnx
import tensorflow as tf
from onnx_tf.backend import prepare

def onnx2tflite(onnx_path, tflite_path):
    tflite_parent_dir = str(tflite_path.parent)

    onnx_model = onnx.load(onnx_path)  # load the ONNX model
    tf_rep = prepare(onnx_model, device='CPU')  # prepare the TF representation
    tf_rep.export_graph(tflite_parent_dir)  # export as a SavedModel

    # Convert the SavedModel to TFLite
    converter = tf.lite.TFLiteConverter.from_saved_model(tflite_parent_dir)
    tflite_model = converter.convert()

    # Save the model
    with open(str(tflite_path), 'wb') as f:
        f.write(tflite_model)

    return tflite_model
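The "good numerical results" mentioned above can be checked with a small comparison helper. The sketch below is hypothetical and pure Python; in practice the two outputs would come from running the same input through onnxruntime and a `tf.lite.Interpreter`:

```python
# Hypothetical helper for comparing model outputs element-wise.
# In practice `onnx_out` / `tflite_out` would be the outputs of the
# original ONNX model and the converted TFLite model on the same input.

def max_abs_diff(a, b):
    """Largest absolute element-wise difference between two
    equally-shaped nested lists (or scalars) of numbers."""
    if isinstance(a, (int, float)):
        return abs(a - b)
    return max(max_abs_diff(x, y) for x, y in zip(a, b))

onnx_out = [[0.12, -0.53], [1.04, 0.0]]      # stand-in for the ONNX output
tflite_out = [[0.12, -0.53], [1.04, 0.001]]  # stand-in for the TFLite output

# The outputs count as "numerically close" if this is tiny.
print(max_abs_diff(onnx_out, tflite_out))  # prints 0.001
```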

ONNX model file

drive link

Python, ONNX, ONNX-TF, Tensorflow version

Additional context

When converted, the TF and TFLite models both output values that are numerically close to the outputs of the original ONNX model. I visualized the graphs in Netron and got the following:

The original onnx (which is fine and works as expected) original_onnx_netron

The converted tflite model (as you can tell there's something super wrong here) tflite_export_netron

It seems like it's duplicating a Conv2D module a bunch of times when it definitely shouldn't be.

Any help would be appreciated

seanshpark commented 3 years ago

Maybe apply the patch from https://github.com/onnx/onnx-tensorflow/pull/905/files to your installation.

I don't know how to do this in an official way, but as a temporary hack to your local installation, please try this:

1) identify your installation

$ python3 -c 'import sysconfig; print(sysconfig.get_paths()["purelib"])'
/usr/lib/python3.6/site-packages

2) edit and apply the patch to onnx_tf/handlers/backend/conv_mixin.py source file

$ vi /usr/lib/python3.6/site-packages/onnx_tf/handlers/backend/conv_mixin.py

Jump to around line 101 and change the code as shown in the patch.

3) run conversion again

pablovela5620 commented 3 years ago

Looks like that's helped me make some progress! After some benchmarking, it looks like it's still about 2x slower than the original ONNX model; not sure if this is expected. There seem to be a bunch of Transposes that aren't in the original ONNX graph. new_tflite_netron
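For benchmarks like the 2x figure above, a minimal timing harness might look like the sketch below. It is pure Python with the model calls stubbed out; the real comparison would wrap the onnxruntime session and the `tf.lite.Interpreter` invocation instead:

```python
import time

def benchmark(fn, runs=100, warmup=10):
    """Median seconds per call of an inference callable,
    after `warmup` untimed calls."""
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    times.sort()
    return times[len(times) // 2]  # median is robust to outliers

# Stand-ins for the real model calls (hypothetical):
#   onnx_infer   = lambda: session.run(None, {"input": x})
#   tflite_infer = lambda: interpreter.invoke()
onnx_infer = lambda: sum(range(1000))
tflite_infer = lambda: sum(range(5000))

ratio = benchmark(tflite_infer) / benchmark(onnx_infer)
print(f"slowdown: {ratio:.1f}x")
```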

seanshpark commented 3 years ago

Looks like thats helped me make some progress!

Good to hear this :)

There seems to be a bunch of Transposes that aren't in the original onnx graph.

I guess that since the ONNX model's input is NCHW, while some kernels in TFLite, like Conv2D, only work in NHWC, those ops are surrounded with Transpose ops to match the layout, which makes the whole execution slower.

chinhuang007 commented 3 years ago

The patch is merged into the master branch, so you could run the following to install the latest:

pip install -e git+https://github.com/onnx/onnx-tensorflow.git#egg=onnx_tf

chinhuang007 commented 3 years ago

The Transposes are added to convert NCHW to NHWC. It should not be a surprise that the ONNX model runs faster in ONNX Runtime, or any other runtime that supports NCHW, because the data doesn't have to go through data-format conversion.
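The layout conversion being described is just an axis permutation; the pure-Python sketch below illustrates it on nested lists (NumPy's `np.transpose(x, (0, 2, 3, 1))` does the same thing on arrays):

```python
def nchw_to_nhwc(x):
    """Permute a nested-list tensor from NCHW to NHWC layout:
    x[n][c][h][w] ends up at result[n][h][w][c]."""
    n, c, h, w = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    return [[[[x[i][k][j][l] for k in range(c)]
              for l in range(w)]
             for j in range(h)]
            for i in range(n)]

# A tiny NCHW tensor: batch=1, channels=2, height=2, width=3
x = [[[[1, 2, 3],
       [4, 5, 6]],
      [[7, 8, 9],
       [10, 11, 12]]]]

y = nchw_to_nhwc(x)
# Each NHWC-only op in the converted graph pays for a permutation like
# this (and its inverse) at runtime; that is where the extra Transpose
# ops in the Netron graph come from.
print(y[0][0][0])  # channel values at (h=0, w=0) -> prints [1, 7]
```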

pablovela5620 commented 3 years ago

Ah, I didn't realize that TFLite Conv2D only supported NCHW inputs; that makes more sense as to why it's still slower. I mostly work with PyTorch, so I wanted to ask for some advice on options to try to match the inference performance. Would I basically need to retrain the model to work with NHWC via this? Or is there another way without needing to retrain the model?

seanshpark commented 3 years ago

Tflite conv2d only supported NCHW inputs

Ah, it's NHWC

Would I basically need to retrain the model to work with NHWC via this?

I don't know much about it, but it would be good to give it a try :) Just to check performance, trying with the initial network, without training, may save you time...

pablovela5620 commented 3 years ago

Understood, thank you both for all the help!