mlcommons / inference

Reference implementations of MLPerf™ inference benchmarks
https://mlcommons.org/en/groups/inference
Apache License 2.0

Disconnected Loop operators in ONNX version of SSD models? #1125

Open gyenesvi opened 2 years ago

gyenesvi commented 2 years ago

I have been investigating the ONNX versions of the SSD models (SSD ResNet34, SSD MobileNet) by opening them in the Netron viewer. The ONNX models contain a few Loop operators; more specifically, the Non-Maximum Suppression algorithm applied to the bounding boxes at the end of these models is implemented with a Loop. However, each instance of these Loop operators seems to have no real inputs, only constants (the initial value of the iteration counter), or tensors that become constants once the input shape is fixed to a concrete value (the initial value of the loop condition).

I have also tried running optimization and constant folding on the ONNX models (using this tool: https://github.com/daquexian/onnx-simplifier), and consistent with what I see in Netron, these Loop operators get detached from the graph because they only have constant inputs. As these tools are developed by independent authors, I suspect it's not just a bug in the tools; the models themselves may be wrong. I have also inspected the binary protobuf files directly, and their contents are as Netron shows. So either something is unclear about how such Loop operators should be interpreted (the ONNX documentation for the Loop operator suggests they should have real, non-constant data input tensors), or the models are wrong.
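
For reference, this is roughly how I checked it programmatically (a minimal sketch; the model path, input name, and shape are illustrative, and depending on the onnx-simplifier version the shape argument is `input_shapes` or `overwrite_input_shapes`):

```python
# Minimal sketch of the check; model path, input name and shape are
# illustrative, not the exact ones from the MLPerf models.
import onnx
from onnxsim import simplify

model = onnx.load("ssd_mobilenet_v1_coco_2018_01_28.onnx")

# List every Loop node and its explicit inputs, as Netron shows them.
for node in model.graph.node:
    if node.op_type == "Loop":
        print(node.name, "explicit inputs:", list(node.input))

# Fix the dynamic input shape to a concrete value and fold constants;
# this is what detaches the Loops, since all their explicit inputs
# become constant. The keyword differs across onnx-simplifier versions.
simplified, ok = simplify(
    model, overwrite_input_shapes={"image_tensor:0": [1, 300, 300, 3]})
print("simplifier check:", ok)
onnx.save(simplified, "ssd_mobilenet_fixed.onnx")
```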

I am attaching some screenshots from Netron to illustrate the problem. In the case of SSD ResNet34:

Screenshot 2022-03-31 at 10 12 54

As you can see, after the Transpose operator only the shape of the tensor is used to derive the loop condition; the actual tensor data is not processed by the Loop (which contains the subgraph for the NMS operation).

If I optimize the graph and do constant folding, it gets detached after the Transpose operation:

Screenshot 2022-03-31 at 10 13 36

The situation is similar at the beginning of the SSD MobileNet model:

Screenshot 2022-03-31 at 10 14 21

Only the shape of the input tensor is used, not the data itself. After constant folding optimization, it gets detached:

Screenshot 2022-03-31 at 10 14 59

gyenesvi commented 2 years ago

@nvpohanh this is the issue I mentioned to you.

nvpohanh commented 2 years ago

@psyhtest @pgmpablo157321 Are you aware of any reports of the SSD ONNX files being broken, as @gyenesvi mentions?

@rnaidu02 for vis.

guschmue commented 2 years ago

This loop is for the batch dim. NMS gets called in the subgraph, which Netron does not show in the main view. To see the subgraph, click on a model input and look for 'subgraph' in the model properties. This model was converted with tf2onnx, and tf2onnx already does constant folding, so in theory onnx-simplifier should not help.
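
If you'd rather not click around in Netron, something like this dumps what is inside the body graphs (a rough sketch; the model path is an example):

```python
# Dump each Loop's body subgraph using just the onnx package.
import onnx
from onnx import helper

model = onnx.load("ssd_mobilenet.onnx")  # example path

for node in model.graph.node:
    if node.op_type == "Loop":
        for attr in node.attribute:
            if attr.name == "body":
                body = helper.get_attribute_value(attr)  # a GraphProto
                print(node.name, [n.op_type for n in body.node])
                # NonMaxSuppression shows up here, not in the main graph.
```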

gyenesvi commented 2 years ago

@guschmue yes, I know it's for the batch dim, and I did check out the subgraph; that's okay. However, the model has dynamic input shapes, and if you fix those to the actual image size and some batch size (given as arguments to onnx-simplifier), more constant folding becomes possible, because many calculations on shapes (the batch size) become constant. If you look closely, the only edge going into these Loop ops depends on the batch dimension of the shape, and once that is fixed, the loop has only constant inputs: an initial condition (which is always true given that the batch size is non-zero) and an iteration counter. So the loops seem to be disconnected from the actual data flowing through the graph.

guschmue commented 2 years ago

Not a trivial thing to optimize that loop away correctly when you switch to static shapes. You could reconvert the model with tf2onnx - there is an option to override the shapes in the model, something like --inputs image_tensor:0[1,224,224,3], and if I remember correctly we try to force tensorflow to re-evaluate the graph so we'd never see that loop. tflite models for this might already have batch size 1 burned in, and you could try to convert one of those with something like --tflite model.tflite.

gyenesvi commented 2 years ago

Indeed, it would be nice if that loop disappeared when the batch size is 1, but I agree that it's probably hard to optimize it away completely. I could give reconversion a try; first, however, I'd like to clearly understand whether the current ONNX model is incorrect. Is that your conclusion as well?

I also checked the opset 8 version of ssd-mobilenet (the above examples came from the opset 11 version), and it's even worse: the model in this repo is already disconnected; for example, the input is not used by the model.
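
For reference, the disconnection I'm describing shows up with a simple backward walk from the graph outputs over explicit node inputs (a sketch; the model path is an example):

```python
# Walk backwards from the graph outputs through explicit node inputs
# only, then report which graph inputs were never reached.
import onnx

model = onnx.load("ssd_mobilenet_opset8.onnx")  # example path
graph = model.graph

# Map each tensor name to the node that produces it.
producer = {out: node for node in graph.node for out in node.output}

seen = set()
stack = [o.name for o in graph.output]
while stack:
    tensor = stack.pop()
    if tensor in seen:
        continue
    seen.add(tensor)
    node = producer.get(tensor)
    if node is not None:
        stack.extend(t for t in node.input if t)

print("unreached inputs:", [i.name for i in graph.input if i.name not in seen])
```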

On the other hand, the opset 8 version of ssd-resnet34 is somewhat better. It seems to have the loop optimized away, and the subgraph that used to be inside the loop seems to be folded into the main graph (at the same time, the part corresponding to that subgraph is much smaller than in the opset 11 version, which is suspicious). However, with the subgraph folded in, it's somewhat more difficult to determine which operators correspond to the overall NMS algorithm that originally resided inside the loop (which would be handy for executing it as one custom op).

I could not find any tflite model for this; do you know of any?

guschmue commented 2 years ago

accuracy numbers are ok, so I assume the model is correct. opset-8 is very old and evil because pre-opset-9, initializers are inputs (aka not const), which prevents a bunch of optimizations. You might want to re-convert the model with something like:

python -m tf2onnx.convert --input  ssd-mobilenet/ssd_mobilenet_v1_coco_2018_01_28.pb --output  /tmp/t.onnx --inputs image_tensor:0 --outputs num_detections:0,detection_boxes:0,detection_scores:0,detection_classes:0 --opset 13

You can try to get rid of the loop with:

 --inputs image_tensor:0[1,224,224,3]

tf2onnx would try to convince tensorflow to re-compute the graph, but I just tried it and I don't think that helps.

Just checked the original tensorflow tgz and there is no tflite model, and ssd-resnet34 was special to mlperf, so there definitely isn't a tflite model for it. Converting it via toco is not good enough, since there are normally a few code changes needed in the model for tflite to make batch size 1 happen.

That said - this loop should not be costly; I don't think it will be noticeable in the total time.

gyenesvi commented 2 years ago

The problem is not the time it takes to run the loop, but that our inference engine does not handle loops. The idea was to edit the graph and replace the whole loop with a single custom operator that our inference engine could execute. That's when we realized that the graph seems to be disconnected, so even rewriting is not possible.
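
For context, the rewrite we had in mind looks roughly like this (a sketch; the op name, domain, and input tensor names are hypothetical - the real inputs would be the tensors the Loop body actually reads):

```python
# Replace each Loop with a single custom op in our own domain.
# "CombinedNMS", "our.engine", "boxes" and "scores" are hypothetical.
import onnx
from onnx import helper

model = onnx.load("ssd_mobilenet.onnx")  # example path
graph = model.graph

for loop in [n for n in graph.node if n.op_type == "Loop"]:
    custom = helper.make_node(
        "CombinedNMS",                 # hypothetical custom op name
        inputs=["boxes", "scores"],    # hypothetical tensor names
        outputs=list(loop.output),     # keep the Loop's outputs
        domain="our.engine",           # hypothetical custom domain
    )
    idx = list(graph.node).index(loop)
    graph.node.insert(idx, custom)     # keep topological order
    graph.node.remove(loop)

onnx.save(model, "ssd_mobilenet_custom_nms.onnx")
```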

Thanks for the pointer to the converter tool. I have tried reconverting, including providing the input shape and folding constants (although it does not seem to fold everything that would be possible). Anyway, the result is similar: the loops seem to be disconnected.

When you say the accuracy numbers are okay, is that based on recently re-running tests with ONNX Runtime, or on some old submission, which itself may have been executed on an edited graph (as I was informed most submitters do)? I just can't wrap my head around how ONNX Runtime could execute a disconnected graph correctly, so that's what I want to understand.
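
A minimal way to check would be to run the model directly in onnxruntime (a sketch; the path, input name, dtype, and shape are assumptions based on the TF object-detection input convention):

```python
# Sanity run: if onnxruntime produces detections, the graph cannot
# really be disconnected. Path, dtype and shape are assumptions.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("ssd_mobilenet.onnx")
inp = sess.get_inputs()[0]
print(inp.name, inp.shape, inp.type)

# TF object-detection models take uint8 NHWC images (assumption).
dummy = np.zeros((1, 300, 300, 3), dtype=np.uint8)
outputs = sess.run(None, {inp.name: dummy})
print([o.shape for o in outputs])
```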

gyenesvi commented 2 years ago

@guschmue any feedback on this?

gyenesvi commented 1 year ago

I believe I have figured out what the problem is with this model. The model itself is apparently valid, but it exercises a special case of the Loop operator that some tools do not handle properly. A Loop operator can have both explicit and implicit inputs. In this special case, where the loop is purely parallel over the batch dimension (which becomes constant, and there are no loop-carried dependencies), the Loop operator has only implicit inputs: tensors from the outer scope that are referenced by its body but not explicitly listed as inputs of the operator itself. Hence some tools mistakenly conclude that the graph is disconnected.
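
A connectivity check therefore has to descend into the body graph. Something like the following sketch (ignoring deeper nesting; the model path is an example) lists the tensors a Loop captures from the outer scope:

```python
# Find tensors a Loop body references that are neither produced inside
# the body nor among the body's formal inputs/initializers: these are
# the implicit inputs captured from the outer graph.
import onnx
from onnx import helper

model = onnx.load("ssd_resnet34.onnx")  # example path

for node in model.graph.node:
    if node.op_type != "Loop":
        continue
    body = next(helper.get_attribute_value(a)
                for a in node.attribute if a.name == "body")
    local = {o for n in body.node for o in n.output}
    local |= {i.name for i in body.input}
    local |= {init.name for init in body.initializer}
    used = {i for n in body.node for i in n.input if i}
    print(node.name, "implicit inputs:", sorted(used - local))
```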

By the way, I can see that these SSD models are being removed in MLPerf v2.1; what is the reason behind that decision? Is it right that only one quantized TF model is being kept for v2.1?

nv-ananjappa commented 1 year ago

@gyenesvi A new model named RetinaNet was added for 2.1.