microsoft / nnfusion

A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.
MIT License

[BUG] Failed to compile yolov5 onnx format model #532

Open fomiuna opened 3 months ago

fomiuna commented 3 months ago

🐛 Bug

I downloaded the YOLOv5 ONNX model yolov5s.onnx, which runs inference successfully under onnxruntime, but compiling it with nnfusion fails. The compiling environment is as follows:

Component             Version
model                 https://github.com/ultralytics/yolov5/releases/download/v7.0/yolov5s.onnx
nnfusion              git clone https://github.com/microsoft/nnfusion.git (main branch)
cuda                  10.0.130
onnx/onnxruntime-gpu  1.14.1
ubuntu                18.04

To Reproduce

Run Rammer with library kernels in the compiling environment:

nnfusion yolov5s.onnx -f onnx -fkernel_fusion_level=3 -fblockfusion_level=1 -fconst_folding_backend=CUDA -fwarmup_step=5 -frun_step=5 

and the output is:

[WARNING] 2024-07-01T11:00:08z src/contrib/custom_op/custom_op.h 27 $NNFUSION_HOME was not set, use /root/.nnfusion.
[WARNING] 2024-07-01T11:00:08z src/contrib/custom_op/custom_op.h 27 $NNFUSION_HOME was not set, use /root/.nnfusion.

============================================================================
---- Processing '/root/model/yolo/yolov5-releases-v7.0/yolov5s.onnx'
============================================================================
[INFO] 2024-07-01T11:00:08z src/nnfusion/frontend/onnx_import/onnx.cpp 54   Optimizing ONNX Graph with External Tool (models/pytorch2onnx/ort_run_frozen.py)
[WARNING] 2024-07-01T11:00:08z src/nnfusion/common/util.cpp 47  $NNFUSION_HOME was not set, use /root/.nnfusion.
2024-07-01 11:00:08.796084517 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:541 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.
ONNX model check passed!
Importing ONNX model into ONNX Runtime...
Execution Providers: ['CPUExecutionProvider']
output0
[3.467e+00 3.309e+00 7.027e+00 8.203e+00 2.408e-05 2.323e-01 3.120e-03
 3.339e-02 2.266e-03 1.929e-02] ...(size= 2142000 end with 0.001398 )
[INFO] 2024-07-01T11:00:09z src/nnfusion/frontend/onnx_import/onnx.cpp 40   Import ONNX Graph Size: [14731163]
... ...
(many INFO log lines from src/nnfusion/frontend/onnx_import/util/graph_convert.cpp)
... ...
[INFO] 2024-07-01T11:00:10z src/nnfusion/frontend/onnx_import/util/graph_convert.cpp 519    convert node: /model.24/Sigmoid
[INFO] 2024-07-01T11:00:10z src/nnfusion/frontend/onnx_import/util/graph_convert.cpp 538    node /model.24/Sigmoid, output InsertedCast_/model.24/Sigmoid_output_0, shape Shape{1, 3, 80, 80, 85}
[INFO] 2024-07-01T11:00:10z src/nnfusion/frontend/onnx_import/util/graph_convert.cpp 519    convert node: /model.24/Split
[ERROR] 2024-07-01T11:00:10z src/nnfusion/util/errors.hpp 169   Check failed: 'it != std::end(m_attributes)' at /root/nnfusion-main/src/nnfusion/frontend/onnx_import/core/node.cpp:106:
Node (/model.24/Split): unknown attribute 'split'
[ERROR] 2024-07-01T11:00:10z src/nnfusion/util/errors.hpp 169   Check failed: 'axis_length % num_splits == 0' at /root/nnfusion-main/src/nnfusion/frontend/onnx_import/op/split.cpp:61:
(no explanation given)
terminate called after throwing an instance of 'nnfusion::errors::CheckError'
  what():  Check failed: 'axis_length % num_splits == 0' at /root/nnfusion-main/src/nnfusion/frontend/onnx_import/op/split.cpp:61:
(no explanation given)
Aborted (core dumped)

It seems that the Split operator is not handled correctly during graph conversion. Are there any suggestions?
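
A possible way to narrow this down, offered as a sketch rather than a confirmed diagnosis: in the ONNX operator specification, Split takes its sizes from a `split` attribute up to opset 11, while from opset 13 on the sizes arrive as an optional second input. The script below uses the `onnx` Python package to print the model's default-domain opset and, for each Split node, its attribute names and input/output counts. The helper name `summarize_splits` and the command-line handling are illustrative and not part of nnfusion.

```python
import sys


def summarize_splits(model):
    """Return (default-domain opset, list of Split node summaries).

    `model` is expected to look like an onnx.ModelProto: it needs
    `opset_import` entries with `domain`/`version`, and `graph.node`
    entries with `op_type`, `name`, `attribute`, `input`, `output`.
    """
    # The default ONNX domain is written as "" (or, rarely, "ai.onnx").
    opset = next(
        (o.version for o in model.opset_import if o.domain in ("", "ai.onnx")),
        None,
    )
    splits = []
    for node in model.graph.node:
        if node.op_type != "Split":
            continue
        splits.append(
            {
                "name": node.name,
                "attributes": sorted(a.name for a in node.attribute),
                "num_inputs": len(node.input),
                "num_outputs": len(node.output),
            }
        )
    return opset, splits


def main(path):
    # Imported lazily so the helper above works without onnx installed.
    import onnx

    opset, splits = summarize_splits(onnx.load(path))
    print("default-domain opset:", opset)
    for summary in splits:
        print(summary)


# Only run when a model path is passed on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Run it as `python inspect_split.py yolov5s.onnx` (the file name `inspect_split.py` is hypothetical). If /model.24/Split shows no `split` attribute and three outputs, an importer that falls back to equal parts would need the axis length 85 from the log to be divisible by 3, which would be consistent with the second check failure; this remains an assumption until verified against the model.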

Additional context

I also converted yolov5s.pt to yolov5s-export.onnx in the following conversion environment:

Component  Version
yolov5     git clone https://github.com/ultralytics/yolov5.git (master branch)
model      https://github.com/ultralytics/yolov5/releases/download/v7.0/yolov5s.pt
onnx       1.16.1
python     3.10.9
torch      1.12.1

Convert as follows:

$ cd yolov5/ && cp /path/to/yolov5s.pt .
$ python export.py --weights yolov5s.pt --include torchscript onnx --opset 11
$ mv yolov5s.onnx yolov5s-export.onnx

I then compiled the resulting yolov5s-export.onnx using Rammer with library kernels in the same compiling environment as before, and got a different error message:

(output similar to that of the yolov5s.onnx compilation)
... ...
(many INFO log lines from src/nnfusion/frontend/onnx_import/util/graph_convert.cpp)
... ...
(many INFO log lines from src/nnfusion/engine/pass/graph/runtime_const_folding_pass.cpp)
... ...
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/batchnorm_inference_folding_pass.cpp 901 batchnorm inference folding Pass ends for Graph: Graph_1
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/cache/manager.cpp 52    Open kernel cache from: /root/.cache/nnfusion/kernel_cache.db
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/cache/manager.cpp 52    Open kernel cache from: /root/.cache/nnfusion/kernel_cache.db
[ERROR] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/kernel_selection.cpp 270    No valid kernel found:images(op type: Parameter, dev type: CUDA_GPU)
[ERROR] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/kernel_selection.cpp 270    No valid kernel found:onnx::Concat_246(op type: Resize, dev type: CUDA_GPU)
[ERROR] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/kernel_selection.cpp 270    No valid kernel found:onnx::Concat_271(op type: Resize, dev type: CUDA_GPU)
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/cache/manager.cpp 52    Open kernel cache from: /root/.cache/nnfusion/kernel_cache.db
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/cache/manager.cpp 52    Open kernel cache from: /root/.cache/nnfusion/kernel_cache.db
[WARNING] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 153 Kernel should be emitted before this pass:images
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_140 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_145 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_155 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_150 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_160 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_165 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_172 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_177 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_187 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_182 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_192 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_197 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_203 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_208 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_215 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_220 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_225 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_230 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_235 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_240 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_246 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_251 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_257 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_262 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_269 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_274 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_279 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_284 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_289 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_294 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_301 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_306 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator onnx::MaxPool_232 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator onnx::MaxPool_233 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator onnx::Concat_234 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_315 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_320 is not BlockCudaEmitter, skip in BlockFusion
[WARNING] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 153 Kernel should be emitted before this pass:onnx::Concat_246
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_327 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_332 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_337 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_342 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_348 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_353 is not BlockCudaEmitter, skip in BlockFusion
[WARNING] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 153 Kernel should be emitted before this pass:onnx::Concat_271
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_360 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_365 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_370 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_375 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_381 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_386 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_500 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator onnx::Transpose_341 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_392 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_397 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator onnx::Sigmoid_342 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_402 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_407 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator onnx::Concat_367 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_413 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_475 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_418 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator onnx::Transpose_380 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator onnx::Sigmoid_381 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_424 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_429 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_434 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_439 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator onnx::Concat_406 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_445 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Convolution_450 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator onnx::Transpose_419 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator onnx::Sigmoid_420 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator onnx::Concat_445 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/blockfusion/blockfusion_optimizer.cpp 184    Operator Result_526 is not BlockCudaEmitter, skip in BlockFusion
[INFO] 2024-07-01T11:02:42z src/nnfusion/engine/pass/graph/assign_async_info_pass.cpp 211   assign thread info-------------------------------
[ERROR] 2024-07-01T11:02:42z src/nnfusion/util/errors.hpp 169   Check failed: '(*gnode)["Kernel_Selection_Result"].is_valid()' at /root/nnfusion-main/src/nnfusion/engine/pass/graph/assign_async_info_pass.cpp:924:
Kernel should be selected before this pass:Resize
terminate called after throwing an instance of 'nnfusion::errors::CheckError'
  what():  Check failed: '(*gnode)["Kernel_Selection_Result"].is_valid()' at /root/nnfusion-main/src/nnfusion/engine/pass/graph/assign_async_info_pass.cpp:924:
Kernel should be selected before this pass:Resize
Aborted (core dumped)
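
To see at a glance which operators the kernel selection pass rejected, the "No valid kernel found" lines can be grouped by op type and device. The following is a minimal sketch using only the Python standard library; the regular expression is written against the message format shown in the log above, and the function name `missing_kernels` is illustrative, not part of nnfusion.

```python
import re
from collections import defaultdict

# Matches e.g.:
#   No valid kernel found:onnx::Concat_246(op type: Resize, dev type: CUDA_GPU)
PATTERN = re.compile(
    r"No valid kernel found:(?P<node>\S+?)"
    r"\(op type: (?P<op>[^,]+), dev type: (?P<dev>[^)]+)\)"
)


def missing_kernels(log_text):
    """Group node names by (op type, device type) for every rejected node."""
    found = defaultdict(list)
    for match in PATTERN.finditer(log_text):
        found[(match["op"], match["dev"])].append(match["node"])
    return dict(found)


if __name__ == "__main__":
    # Three lines copied from the log above.
    sample = """\
[ERROR] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/kernel_selection.cpp 270    No valid kernel found:images(op type: Parameter, dev type: CUDA_GPU)
[ERROR] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/kernel_selection.cpp 270    No valid kernel found:onnx::Concat_246(op type: Resize, dev type: CUDA_GPU)
[ERROR] 2024-07-01T11:02:41z src/nnfusion/engine/pass/graph/kernel_selection.cpp 270    No valid kernel found:onnx::Concat_271(op type: Resize, dev type: CUDA_GPU)
"""
    for (op, dev), nodes in missing_kernels(sample).items():
        print(f"{op} on {dev}: {', '.join(nodes)}")
```

On this log the grouping yields one Parameter node (images) and two Resize nodes (onnx::Concat_246 and onnx::Concat_271), all on CUDA_GPU; the final check failure names Resize.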

What does "No valid kernel found" mean here, and how can I fix it? And why do the errors from compiling yolov5s.onnx and yolov5s-export.onnx differ? I would appreciate any help!