zhenhuaw-me / tflite2onnx

Convert TensorFlow Lite models (*.tflite) to ONNX.
https://zhenhuaw.me/tflite2onnx
Apache License 2.0

Get to know the recent status of TF2ONNX #9

Closed: zhenhuaw-me closed this issue 4 years ago

zhenhuaw-me commented 4 years ago

https://github.com/onnx/tensorflow-onnx

zhenhuaw-me commented 4 years ago

Overall status (readme)

How tf2onnx works

zhenhuaw-me commented 4 years ago

Transpose based layout handling

tf2onnx has reasonable layout handling - all in all, the project has ~10 active developers. In one sentence, it is much like our propagation-based approach: input/output shapes are unchanged when comparing the original model and the output model, with Transpose operators inserted where needed. It inserts Transpose operators first, and then removes most of them during graph optimization.
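As an illustration (a minimal numpy sketch of the idea, not tf2onnx's actual code): moving an NHWC TensorFlow tensor into ONNX's NCHW layout amounts to inserting a Transpose with perm `(0, 3, 1, 2)`, and a back-to-back pair of inverse Transposes is a no-op, which is why most of them can be folded away during graph optimization.

```python
import numpy as np

# Hypothetical NHWC activation from a TensorFlow graph: N=1, H=2, W=2, C=3.
nhwc = np.arange(1 * 2 * 2 * 3).reshape(1, 2, 2, 3)

NHWC_TO_NCHW = (0, 3, 1, 2)  # perm of the Transpose inserted at the input
NCHW_TO_NHWC = (0, 2, 3, 1)  # its inverse

nchw = nhwc.transpose(NHWC_TO_NCHW)
print(nchw.shape)  # (1, 3, 2, 2)

# Two back-to-back inverse Transposes cancel out, so the optimizer
# can safely remove such pairs.
roundtrip = nchw.transpose(NCHW_TO_NHWC)
assert (roundtrip == nhwc).all()
```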

A MobileNetV2 example (the official Google MobileNetV2 model) after conversion is shown below.

```shell
python -m tf2onnx.convert --graphdef ./mobilenet_v2_1.0_224/mobilenet_v2_1.0_224_frozen.pb \
  --output mobilenet_v2.onnx --inputs input:0[1,224,224,3] --outputs MobilenetV2/Predictions/Softmax:0
```

The Transpose inserted for the input:

(Image: tf2onnx-converted MobileNetV2 graph)

Furthermore, tf2onnx performs graph optimizations such as fusing batch normalization and bias-add into convolution. This is better than we expected.
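The batch-normalization fusion mentioned above boils down to simple weight arithmetic. A hedged numpy sketch (variable names are ours, not tf2onnx's): a BN with parameters gamma, beta, mean, var, and epsilon that follows a convolution can be folded into the convolution's weights and bias.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical conv parameters: 4 output channels, 3 input channels, 1x1 kernel.
W = rng.standard_normal((4, 3, 1, 1))
b = rng.standard_normal(4)

# Hypothetical batch-norm parameters (per output channel).
gamma = rng.standard_normal(4)
beta = rng.standard_normal(4)
mean = rng.standard_normal(4)
var = rng.random(4) + 0.5
eps = 1e-5

# Folding: bn(conv(x)) = gamma * (conv(x) - mean) / sqrt(var + eps) + beta,
# which equals a single conv with scaled weights and an adjusted bias.
scale = gamma / np.sqrt(var + eps)
W_folded = W * scale[:, None, None, None]
b_folded = (b - mean) * scale + beta

# Check on a 1x1 "image": conv followed by BN equals the folded conv.
x = rng.standard_normal(3)
conv = W[:, :, 0, 0] @ x + b
bn = scale * (conv - mean) + beta
folded = W_folded[:, :, 0, 0] @ x + b_folded
assert np.allclose(bn, folded)
```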

zhenhuaw-me commented 4 years ago

Quantization support

tf2onnx doesn't support generating quantized ONNX models from TensorFlow models.

Converted the official quantization-aware training MobileNet v1 model:

(Screenshot: 2020-07-13 8:16 PM)

Given a quantization-aware training TensorFlow model, tf2onnx produces the result in the screenshot above.

Also, they officially do not support quantized TensorFlow models:

> We have not been planning to add direct support for quantized TensorFlow models for a couple of reasons...

> If you want to convert a TensorFlow model to ONNX and then quantize the ONNX model you'd

Regarding quantization, we have an opportunity here.
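For context on what that opportunity involves: TFLite quantization is affine, i.e. `real = scale * (q - zero_point)`, and the per-tensor scale/zero-point pairs in a quantization-aware model map naturally onto ONNX's QuantizeLinear/DequantizeLinear semantics. A minimal numpy sketch of the arithmetic (parameter values are illustrative only):

```python
import numpy as np

def quantize(real, scale, zero_point):
    # Affine quantization to uint8, following the TFLite/ONNX convention
    # real = scale * (q - zero_point).
    q = np.round(real / scale) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)

def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)

# Hypothetical tensor with values in roughly [-1, 1].
real = np.linspace(-1.0, 1.0, 9, dtype=np.float32)
scale = 2.0 / 255.0
zero_point = 128

q = quantize(real, scale, zero_point)
restored = dequantize(q, scale, zero_point)

# The round-trip error is bounded by half a quantization step.
assert np.abs(restored - real).max() <= scale / 2 + 1e-6
```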

zhenhuaw-me commented 4 years ago

TF2 support

Limited support here as of July 13, 2020. Not going to play around with it, as we don't have much TF2 experience either...

> There is now experimental support for tf-2.x. With the exception of LSTM unit tests, all unit tests are enabled and passing. Unit tests that we still need to fix are marked with @skip_tf2. GRU/LSTM's are converting but not runnable due to type/shape inference issues at runtime (working on that one). All unit tests are running in eager mode. After execution we take the python function, make it a graph and convert it to ONNX. When running under tf-2.x tf2onnx will use the tensorflow V2 controlflow.

zhenhuaw-me commented 4 years ago

We now know the basic status. Layout handling is better than we imagined, and quantization support is too limited.