onnx / models

A collection of pre-trained, state-of-the-art models in the ONNX format
http://onnx.ai/models/
Apache License 2.0

Certain models may be missing a reshape op #444

Open turneram opened 3 years ago

turneram commented 3 years ago

Bug Report

Which model does this pertain to?

- models/vision/classification/rcnn_ilsvrc13/model/rcnn-ilsvrc13-3.onnx
- models/vision/classification/alexnet/model/bvlcalexnet-3.onnx
- models/vision/classification/resnet/model/resnet50-caffe2-v1-3.onnx
- models/vision/classification/zfnet-512/model/zfnet512-3.onnx
- models/vision/classification/inception_and_googlenet/inception_v1/model/inception-v1-3.onnx
- models/vision/classification/inception_and_googlenet/inception_v2/model/inception-v2-3.onnx
- models/vision/classification/vgg/model/vgg19-caffe2-3.onnx
- models/vision/classification/caffenet/model/caffenet-6.onnx
- models/vision/classification/shufflenet/model/shufflenet-3.onnx

Describe the bug

We observed an error that appears common to all of the above models when running them with MIGraphX. They seem to be missing a reshape operator between a MaxPool layer and a Gemm layer, which causes an error because the Gemm op receives inputs A and B with different dimensionalities.

For example, caffenet-6.onnx and caffenet-3.onnx are identical except that caffenet-3 has a Reshape between the MaxPool and the Gemm, which makes the Gemm's input shapes A:{1, 9216} and B:{4096, 9216} (transB=True); this model runs without error. caffenet-6 is missing this Reshape, so the Gemm's inputs are A:{1, 256, 6, 6} and B:{4096, 9216} (transB=True), which causes a dimension-mismatch error.
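As a rough workaround on our side, something like the following sketch (using the ONNX Python API; the local file paths are placeholders for wherever the affected models are downloaded) could insert a Flatten in front of each Gemm so the A input is collapsed to 2-D before the matmul. This is an illustration, not an official fix for the model zoo:

```python
# Sketch: insert a Flatten before each Gemm's A input.
# Flatten(axis=1) turns {1, 256, 6, 6} into {1, 9216} and is a shape no-op
# for inputs that are already 2-D, so patching every Gemm is harmless here.
import onnx
from onnx import helper

model = onnx.load("caffenet-6.onnx")          # hypothetical local path
graph = model.graph

patched_nodes = []
for node in graph.node:
    if node.op_type == "Gemm":
        flat_name = node.input[0] + "_flattened"
        flatten = helper.make_node(
            "Flatten", inputs=[node.input[0]], outputs=[flat_name], axis=1
        )
        patched_nodes.append(flatten)          # keep Flatten before the Gemm
        node.input[0] = flat_name              # rewire Gemm's A input
    patched_nodes.append(node)

del graph.node[:]
graph.node.extend(patched_nodes)

onnx.checker.check_model(model)
onnx.save(model, "caffenet-6-patched.onnx")    # hypothetical output path
```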

Reproduction instructions

System Information

- Linux Ubuntu 18.04
- ONNX version 1.8
- Run directly from the MIGraphX driver

(Build and install MIGraphX within our Docker container):

/code/AMDMIGraphX/AMDMIGraphX/build/bin/driver read /path/to/caffenet-6.onnx

The difference between model versions can also be observed by viewing the graphs in Netron.
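The same check can also be done programmatically. A minimal sketch with the ONNX Python API (the file name below is just one of the models from the list above, assumed to be downloaded locally) prints which op feeds each Gemm's A input, so a missing Reshape/Flatten is easy to spot:

```python
# Sketch: report the producer op of each Gemm's A input.
import onnx

model = onnx.load("bvlcalexnet-3.onnx")        # hypothetical local path
# Map each tensor name to the node that produces it.
producers = {out: node for node in model.graph.node for out in node.output}

for node in model.graph.node:
    if node.op_type == "Gemm":
        parent = producers.get(node.input[0])
        parent_op = parent.op_type if parent is not None else "<graph input>"
        print(f"Gemm '{node.name}' A input produced by: {parent_op}")
```

In the affected models this reports the Gemm being fed directly by a MaxPool, whereas the opset-3 CaffeNet shows a Reshape in between.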

Running the models in ONNX Runtime on my system encountered other issues before reaching the operators that are producing this error.

wenbingl commented 3 years ago

Thanks for the report, @turneram. The models earlier than opset 7 have not been maintained, and I expect they will be removed in the future.