neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs
https://neuralmagic.com/deepsparse/

Error when compiling YOLO-NAS model #1170

Closed Y-T-G closed 1 year ago

Y-T-G commented 1 year ago

Describe the bug I am trying to compile the ONNX model for YOLO-NAS in Google Colab.

Expected behavior The model should compile.

Environment Include all relevant environment information:

  1. OS [e.g. Ubuntu 18.04]: 22.04
  2. Python version [e.g. 3.8]: 3.10.12
  3. DeepSparse version or commit hash [e.g. 0.1.0, f7245c8]: 1.5.2
  4. ML framework version(s) [e.g. torch 1.7.1]: 2.0.1
  5. Other Python package versions [e.g. SparseML, Sparsify, numpy, ONNX]: ONNX - 1.12.0
  6. CPU info - output of deepsparse/src/deepsparse/arch.bin or output of cpu_architecture() as follows: {'L1_data_cache_size': 32768, 'L1_instruction_cache_size': 32768, 'L2_cache_size': 262144, 'L3_cache_size': 57671680, 'architecture': 'x86_64', 'available_cores_per_socket': 1, 'available_num_cores': 1, 'available_num_hw_threads': 2, 'available_num_numa': 1, 'available_num_sockets': 1, 'available_sockets': 1, 'available_threads_per_core': 2, 'bf16': False, 'cores_per_socket': 1, 'dotprod': False, 'i8mm': False, 'isa': 'avx2', 'num_cores': 1, 'num_hw_threads': 2, 'num_numa': 1, 'num_sockets': 1, 'threads_per_core': 2, 'vbmi': False, 'vbmi2': False, 'vendor': 'GenuineIntel', 'vendor_id': 'Intel', 'vendor_model': 'Intel(R) Xeon(R) CPU @ 2.20GHz', 'vnni': False, 'zen1': False}

To Reproduce Exact steps to reproduce the behavior:

  1. Convert YOLO-NAS to ONNX using official documentation.
  2. Try to compile model:
    
    from deepsparse import compile_model
    from deepsparse.utils import generate_random_inputs

    onnx_filepath = "yolo_nas_s.onnx"
    batch_size = 1

    # Generate random sample input
    inputs = generate_random_inputs(onnx_filepath, batch_size)

    # Compile and run
    engine = compile_model(onnx_filepath, batch_size)
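
The failure in this report is a native assertion rather than a Python exception, so it can take the whole notebook kernel down with it. A minimal sketch of one way to contain that, assuming it is acceptable to run the compile step in a child interpreter. `run_isolated` is a hypothetical helper, not part of DeepSparse, and the snippet passed to it here is only a stand-in that simulates an abort:

```python
import subprocess
import sys

def run_isolated(code: str, timeout: float = 600.0) -> subprocess.CompletedProcess:
    """Run a Python snippet in a child interpreter so a native abort cannot kill the caller."""
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )

# Stand-in for the compile step: os.abort() terminates the child abnormally,
# much like a failed native assertion would.
result = run_isolated("import os; os.abort()")

# A non-zero return code means the child died; the parent process keeps running.
print("child failed:", result.returncode != 0)
```

To apply this to the steps above, the string passed to `run_isolated` would hold the `compile_model` call, and `result.stderr` would then contain whatever the child wrote to its standard error stream before exiting.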


**Errors**

    DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.5.2 COMMUNITY | (93c38382) (release) (optimized) (system=avx2, binary=avx2)
    DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.5.2 (93c38382) (release) (optimized) (system=avx2, binary=avx2)
    Date: 08-07-2023 @ 02:35:51 UTC
    OS: Linux 4960402abf9e 5.15.109+ #1 SMP Fri Jun 9 10:57:30 UTC 2023
    Arch: x86_64
    CPU: GenuineIntel
    Vendor: Intel
    Cores/sockets/threads: [1, 1, 2]
    Available cores/sockets/threads: [1, 1, 2]
    L1 cache size data/instruction: 32k/32k
    L2 cache size: 0.25Mb
    L3 cache size: 55Mb
    Total memory: 12.6784G
    Free memory: 8.24567G
    Assertion at src/lib/engine/execution/layouts/greedy_assign_layouts.cpp:495
    Backtrace:
     0# wand::detail::abort_prefix(std::ostream&, char const, char const, int, bool, bool, unsigned long) in /usr/local/lib/python3.10/dist-packages/deepsparse/avx2/libonnxruntime.so.1.12.0
     1# wand::detail::assert_fail(char const, char const, int) in /usr/local/lib/python3.10/dist-packages/deepsparse/avx2/libonnxruntime.so.1.12.0
     2# 0x00007DA04967948A in /usr/local/lib/python3.10/dist-packages/deepsparse/avx2/libonnxruntime.so.1.12.0
     3# 0x00007DA04967A6C9 in /usr/local/lib/python3.10/dist-packages/deepsparse/avx2/libonnxruntime.so.1.12.0
     4# 0x00007DA04967A85C in /usr/local/lib/python3.10/dist-packages/deepsparse/avx2/libonnxruntime.so.1.12.0
     5# 0x00007DA0490BCF86 in /usr/local/lib/python3.10/dist-packages/deepsparse/avx2/libonnxruntime.so.1.12.0
     6# 0x00007DA0490C18DC in /usr/local/lib/python3.10/dist-packages/deepsparse/avx2/libonnxruntime.so.1.12.0
     7# 0x00007DA0490C709E in /usr/local/lib/python3.10/dist-packages/deepsparse/avx2/libonnxruntime.so.1.12.0
     8# 0x00007DA04901E46A in /usr/local/lib/python3.10/dist-packages/deepsparse/avx2/libonnxruntime.so.1.12.0
     9# 0x00007DA04900AD73 in /usr/local/lib/python3.10/dist-packages/deepsparse/avx2/libonnxruntime.so.1.12.0
    10# 0x00007DA048DBC400 in /usr/local/lib/python3.10/dist-packages/deepsparse/avx2/libonnxruntime.so.1.12.0
    11# wand::engine::compiler::compiler::plan_execution_graph(boost::adjacency_list<boost::multisetS, boost::listS, boost::bidirectionalS, wand::engine::execution::data_descriptor, wand::engine::execution::graph_edge, boost::no_property, boost::listS> const&) const in /usr/local/lib/python3.10/dist-packages/deepsparse/avx2/libonnxruntime.so.1.12.0
    12# wand::engine::compiler::compiler::compile(boost::adjacency_list<boost::multisetS, boost::listS, boost::bidirectionalS, wand::engine::execution::data_descriptor, wand::engine::execution::graph_edge, boost::no_property, boost::listS> const&) const in /usr/local/lib/python3.10/dist-packages/deepsparse/avx2/libonnxruntime.so.1.12.0
    13# wand::engine::compiler::compiler::compile(wand::engine::compute::compute_graph const&) const in /usr/local/lib/python3.10/dist-packages/deepsparse/avx2/libonnxruntime.so.1.12.0
    14# wand::engine::compiler::compiler::compile(wand::engine::intake::graph const&) const in /usr/local/lib/python3.10/dist-packages/deepsparse/avx2/libonnxruntime.so.1.12.0
    15# 0x00007DA048278854 in /usr/local/lib/python3.10/dist-packages/deepsparse/avx2/libonnxruntime.so.1.12.0
    16# 0x00007DA04826C6FF in /usr/local/lib/python3.10/dist-packages/deepsparse/avx2/libonnxruntime.so.1.12.0
    17# 0x00007DA04825740C in /usr/local/lib/python3.10/dist-packages/deepsparse/avx2/libonnxruntime.so.1.12.0
    18# 0x00007DA048258561 in /usr/local/lib/python3.10/dist-packages/deepsparse/avx2/libonnxruntime.so.1.12.0
    19# 0x00007DA048916E40 in /usr/local/lib/python3.10/dist-packages/deepsparse/avx2/libonnxruntime.so.1.12.0
    20# 0x00007DA04891C27E in /usr/local/lib/python3.10/dist-packages/deepsparse/avx2/libonnxruntime.so.1.12.0
    21# 0x00007DA04891EDD7 in /usr/local/lib/python3.10/dist-packages/deepsparse/avx2/libonnxruntime.so.1.12.0
    22# 0x00007DA04891F204 in /usr/local/lib/python3.10/dist-packages/deepsparse/avx2/libonnxruntime.so.1.12.0
    23# 0x00007DA04817EE9E in /usr/local/lib/python3.10/dist-packages/deepsparse/avx2/libonnxruntime.so.1.12.0
    Please email a copy of this stack trace and any additional information to: support@neuralmagic.com
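
Each backtrace frame above has the form `N# <symbol or address> in <library>`, and only a few frames carry a resolved symbol name. A minimal sketch for picking those out, assuming the frame text has already been copied out as plain lines. `parse_frame` and `is_resolved` are hypothetical helpers, and `sample` is a short excerpt of the trace:

```python
import re

# One backtrace frame: index, then a symbol or raw address, then the library path.
FRAME_RE = re.compile(r"^\s*(\d+)#\s+(.+?)\s+in\s+(\S+)\s*$")

def parse_frame(line: str):
    """Return (index, symbol, library) for a frame line, or None if it does not match."""
    match = FRAME_RE.match(line)
    if match is None:
        return None
    index, symbol, library = match.groups()
    return int(index), symbol, library

def is_resolved(symbol: str) -> bool:
    """A frame is resolved when it shows a symbol name instead of a bare hex address."""
    return not symbol.startswith("0x")

LIB = "/usr/local/lib/python3.10/dist-packages/deepsparse/avx2/libonnxruntime.so.1.12.0"
sample = [
    "Backtrace:",
    " 1# wand::detail::assert_fail(char const, char const, int) in " + LIB,
    " 2# 0x00007DA04967948A in " + LIB,
]

frames = [frame for frame in map(parse_frame, sample) if frame is not None]
resolved = [symbol for _, symbol, _ in frames if is_resolved(symbol)]
print(resolved)
```

Run over the full trace, this kind of filter would leave the `wand::detail` and `wand::engine::compiler` frames, which is usually the part worth quoting in a report.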

tlrmchlsmth commented 1 year ago

Hey @Y-T-G, thanks for the bug report. We can reproduce this issue and have a fix for it. I'll ping you once it's available in deepsparse-nightly.

tlrmchlsmth commented 1 year ago

A fix will be available the next time a nightly release goes out, and I'll close the issue then!

Y-T-G commented 1 year ago

@tlrmchlsmth Cool. Thanks for the fix.

Y-T-G commented 1 year ago

When can I expect the nightly to be available?

jeanniefinks commented 1 year ago

Hello @Y-T-G, the latest nightly has been published. Please try pip install deepsparse-nightly now. Thank you! 🥇 Jeannie / Neural Magic
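
After installing, it can help to confirm which build is actually present in the environment. A minimal sketch using only the standard library, assuming the packages register under the distribution names `deepsparse` and `deepsparse-nightly` (the names used in this thread). `installed_version` is a hypothetical helper:

```python
from importlib import metadata
from typing import Optional

def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

# Report both the release and the nightly distribution, whichever is present.
for name in ("deepsparse", "deepsparse-nightly"):
    print(name, "->", installed_version(name) or "not installed")
```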

mgoin commented 1 year ago

Hi @Y-T-G here is a Colab notebook showing how to export the ONNX and run it on deepsparse-nightly: https://colab.research.google.com/drive/16r8fLUgAEqPWbDlQmgrmmuq8WvFXxLnQ?usp=sharing

Y-T-G commented 1 year ago

@jeanniefinks @mgoin Thanks. I will try it out.

Y-T-G commented 1 year ago

I was able to test it in C++ and it works. Thanks.

mgoin commented 1 year ago

Thanks for sharing @Y-T-G, very cool project! Let me know if you'd be interested in sparsifying the model for more performance.

Y-T-G commented 1 year ago

@mgoin Sure. That would be great. I was wondering how to improve the FPS further.

mgoin commented 1 year ago

@Y-T-G Here is a quick Colab notebook I made using a T4 GPU to one-shot sparsify/quantize the model: https://colab.research.google.com/drive/1DLB-tE1ide-55b9gzq6kQyrrW0lvT7xj?usp=sharing

It uses our Sparsify tool (in alpha right now, so leave feedback!) to optimize the ONNX with some (dummy) calibration data: https://github.com/neuralmagic/sparsify

Here is the low-sparsity ONNX: https://drive.google.com/file/d/1qMZCtikHtS4Edy0EBP9R7qX5i9eLyOvz/view?usp=sharing

Here is the high-sparsity ONNX: https://drive.google.com/file/d/1XkVYhX4SJfM0mLRuH-RIx6F0xQ9-vAYy/view?usp=drive_link

I used dummy data, so the model likely isn't accurate, but you can substitute real input data to maintain accuracy.

If you want to talk more about this, I'm happy to jump on a call, or you can join our Slack to ask questions: https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ

Base model benchmark:

[Image: Screenshot 2023-08-10 at 12:28:57 PM]

Sparsified model benchmark:

[Image: Screenshot 2023-08-10 at 12:29:13 PM]