jiazhihao / TASO

The Tensor Algebra SuperOptimizer for Deep Learning
Apache License 2.0
687 stars, 90 forks

AssertionError: Internal error when reording ONNX operators #30

Closed: Factos001 closed this issue 4 years ago

Factos001 commented 4 years ago

Hi, I first downloaded the .onnx model from this repo: https://github.com/onnx/models/tree/master/vision/classification/resnet/resnet50

Then I used the code below to load the model:

    old_model = taso.load_onnx("./resnet50.onnx")

An error message appears:

    File "/home/ubuntu/taso/python/taso/__init__.py", line 730, in load_onnx
        assert len(node_list) == len(model.graph.node), "Internal error when reording ONNX operators"

How can I solve this problem? Thanks!

jiazhihao commented 4 years ago

@leolx7 Thanks for reporting the issue. It seems the ONNX file does not list operators in topological order, and TASO failed to recover the order. Can you point me to the ONNX file version you used (release 1.x or master)? I want to debug this using the same version.
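To illustrate what recovering a topological order involves, here is a minimal sketch using Kahn's algorithm. It is not TASO's actual implementation: the function name `topo_sort` and the `(name, inputs, outputs)` tuples are hypothetical stand-ins for the fields of an ONNX node, and `graph_inputs` is assumed to hold the model inputs and initializer names.

```python
from collections import deque

def topo_sort(nodes, graph_inputs):
    """Reorder nodes so every node comes after the nodes that produce its inputs.

    nodes: list of (name, inputs, outputs) tuples (hypothetical stand-in for ONNX nodes).
    graph_inputs: tensor names available before any node runs.
    """
    available = set(graph_inputs)
    # Map each tensor name to the index of the node that produces it.
    produced_by = {}
    for idx, (_, _, outputs) in enumerate(nodes):
        for tensor in outputs:
            produced_by[tensor] = idx

    pending = []    # number of unresolved producer dependencies per node
    consumers = {}  # tensor name -> indices of nodes waiting on it
    for idx, (_, inputs, _) in enumerate(nodes):
        # Tensors that no node produces are treated as external and ignored here.
        deps = {t for t in inputs if t in produced_by and t not in available}
        pending.append(len(deps))
        for tensor in deps:
            consumers.setdefault(tensor, []).append(idx)

    ready = deque(i for i, count in enumerate(pending) if count == 0)
    order = []
    while ready:
        i = ready.popleft()
        order.append(nodes[i])
        for tensor in nodes[i][2]:
            for j in consumers.get(tensor, []):
                pending[j] -= 1
                if pending[j] == 0:
                    ready.append(j)

    # Same kind of sanity check as the assertion in the traceback:
    # every node must have been placed exactly once.
    if len(order) != len(nodes):
        raise ValueError("graph has a cycle or an unresolved dependency")
    return order
```

If the sorted list ends up shorter than the input list, some node never became ready, which is the condition the assertion in `load_onnx` guards against.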

Factos001 commented 4 years ago

@jiazhihao Actually, I have tried all five versions, and all of them produce the same error.

jiazhihao commented 4 years ago

Thanks for letting me know. I will debug this issue and get back to you.

jiazhihao commented 4 years ago

@leolx7 The issue should have been fixed in commit e6fc34398

Factos001 commented 4 years ago

@jiazhihao Thanks for the feedback. I have tried it, and the error is gone. However, the program now seems to skip a lot of operators, with messages like "Cannot find input tensor for operator: name(Relu_15) type (Relu) (Skipped)". As a result, the optimized output graph has far fewer layers than the original graph. For example, the optimized output graph for ResNet50 has only 6 layers.

jiazhihao commented 4 years ago

@leolx7 That is because TASO cannot find the input tensors in the ONNX model. I have tried https://s3.amazonaws.com/download.onnx/models/opset_6/resnet50.tar.gz and it seems to work for me. Which ONNX version are you using?
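A rough sketch of why one skipped operator can remove most of a graph: if an importer drops any node whose input tensor is unknown, that node's outputs never become known either, so everything downstream is skipped too. The function `find_skipped` and the `(name, inputs, outputs)` tuples below are hypothetical, and the skip rule is an assumption based on the message quoted above, not a copy of TASO's code.

```python
def find_skipped(nodes, graph_inputs):
    """Return (name, missing_inputs) for every node an importer would skip.

    nodes: list of (name, inputs, outputs) tuples in execution order
           (hypothetical stand-in for ONNX nodes).
    graph_inputs: names of model inputs and initializers.
    """
    known = set(graph_inputs)
    skipped = []
    for name, inputs, outputs in nodes:
        # Empty names mark omitted optional inputs in ONNX, so ignore them.
        missing = [t for t in inputs if t and t not in known]
        if missing:
            # Assumed behavior: a skipped node contributes no outputs,
            # so its consumers will be reported as well.
            skipped.append((name, missing))
        else:
            known.update(outputs)
    return skipped


# Illustrative three-node chain where one tensor ("shortcut") is never produced.
example = [
    ("Conv_0", ["x", "w"], ["c"]),
    ("Add_1", ["c", "shortcut"], ["s"]),
    ("Relu_2", ["s"], ["r"]),
]
print(find_skipped(example, ["x", "w"]))
```

Running a check like this on the node list of a model can show which operator is the first to be skipped, which is usually the one worth investigating.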

Factos001 commented 4 years ago

Hi, I am using the master version. I have also tried the version you used, and the optimized result is basically the same, as in the image below.

[image: resnet]

jiazhihao commented 4 years ago

@leolx7 I can reproduce your error on my machine. It happens because the master version of ResNet50 uses Sum instead of Add for element-wise addition. I have added support for Sum in TASO (commit fed32e9cf). The optimized graph looks good on my machine.
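For context, ONNX `Sum` takes one or more inputs and adds them element-wise, while `Add` is strictly binary. One common way to handle `Sum` in a system that only knows `Add` is to lower it into a chain of binary additions. The sketch below shows that idea only; it is not necessarily what commit fed32e9cf does, and `lower_sum_to_adds` and the temporary tensor names are hypothetical.

```python
def lower_sum_to_adds(inputs, output, prefix="sum_tmp"):
    """Rewrite Sum(inputs) -> output as a chain of binary Add operations.

    Returns a list of (op_type, input_names, output_name) tuples.
    Intermediate tensor names are made up from `prefix` (hypothetical).
    """
    if not inputs:
        raise ValueError("Sum needs at least one input")
    if len(inputs) == 1:
        # A single-input Sum just forwards its input.
        return [("Identity", [inputs[0]], output)]

    ops = []
    acc = inputs[0]
    for i, tensor in enumerate(inputs[1:], start=1):
        # The last Add writes to the original Sum output name.
        out = output if i == len(inputs) - 1 else f"{prefix}_{i}"
        ops.append(("Add", [acc, tensor], out))
        acc = out
    return ops
```

For the two-input residual connections in ResNet50, this lowering yields a single `Add` with the same inputs and output as the original `Sum`.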

It would be great if you could rerun TASO and confirm the fix.

Factos001 commented 4 years ago

@jiazhihao Hi, I used the master version of ResNet50 and reran TASO. It works well and the optimized graph looks good. Thanks for your patience.