kendryte / nncase

Open deep learning compiler stack for Kendryte AI accelerators ✨
Apache License 2.0

sdk infers floating-point model, and the output results are all "inf...." #396

Closed annosoo closed 2 years ago

annosoo commented 3 years ago

Describe the bug: I was trying to run a float kmodel, generated by the ncc tool, on a k210 board using the sdk. The results differ from those of ncc infer: the output is "inf ...", and I haven't found the cause so far.

To Reproduce: Here is my ncc command to generate a float kmodel without preprocessing:

/ncc -v compile -i onnx -t k210 model/model.onnx model/model_float.kmodel

Expected behavior: The sdk infers the float kmodel without preprocessing, taking a preprocessed float32 bin file as input, and then outputs this result: [image]

Origin model and code: Here is a simple test model: model.zip

Here is my preprocessed float32 bin file: f32.zip
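When the output is all "inf", a quick first step is to rule out the input file. The following is a minimal sketch, assuming the bin file holds raw little-endian float32 values for the f32[1,1,28,28] input that ncc reports in this thread. The function name `check_f32_bin` is illustrative and not part of nncase.

```python
import math
import struct


def check_f32_bin(data: bytes, shape=(1, 1, 28, 28)) -> list:
    """Decode raw little-endian float32 bytes; verify size and finiteness.

    Assumption: the file is a flat dump of float32 values with no header.
    """
    count = math.prod(shape)
    expected = count * 4  # 4 bytes per float32 element
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")
    values = list(struct.unpack(f"<{count}f", data))
    bad = [i for i, v in enumerate(values) if not math.isfinite(v)]
    if bad:
        raise ValueError(
            f"{len(bad)} non-finite values, first at index {bad[0]}"
        )
    return values
```

Typical use would be `check_f32_bin(open("f32.bin", "rb").read())`, where `f32.bin` stands for the file extracted from f32.zip. If this passes, the input has the expected 3136 bytes and contains no inf or NaN, so the problem lies further downstream.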

zhangyang2057 commented 3 years ago

First of all, I cannot find version v1.0.0.b6afb77 to reproduce this, so I used the latest nncase v1.0.0 (1.0.0-ec7f0405) to test.

  1. Use the ncc tool to build model.onnx, then run ncc infer with f32.bin as input. The infer result is the same as the onnx runtime result.

     # (1, 10)
     0.997021 0.000000 0.000050 0.000016 0.000000 0.002851 0.000016 0.000003 0.000031 0.000011

  2. Write an app with the nncase runtime, run it, and dump the infer result on k510 (the result is independent of the AI hardware, as all of the ops run on the riscv64 cpu).

     $ ./nncase_test
     case ./nncase_test build Nov 1 2021 15:28:26
     ============> interp.load_model finished!
     ============> interp.input_tensor finished!
     interp.run() duration: 45.4815 ms
     0.997021 1.74755e-09 5.04092e-05 1.63082e-05 1.15026e-07 0.00285069 1.57548e-05 3.38097e-06 3.14947e-05 1.06285e-05
     output 0 cosine similarity: 1

The result based on riscv64 cpu is the same as both ncc infer and onnx runtime results.
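The "cosine similarity" check in the log above can be reproduced offline. This is a small sketch in plain Python that compares the two output vectors quoted in this comment. The helper name is illustrative, and how nncase_test computes its own figure is not shown in the thread.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Values copied from the ncc infer output and the runtime output above.
ncc_infer = [0.997021, 0.000000, 0.000050, 0.000016, 0.000000,
             0.002851, 0.000016, 0.000003, 0.000031, 0.000011]
runtime = [0.997021, 1.74755e-09, 5.04092e-05, 1.63082e-05, 1.15026e-07,
           0.00285069, 1.57548e-05, 3.38097e-06, 3.14947e-05, 1.06285e-05]

similarity = cosine_similarity(ncc_infer, runtime)
print(f"cosine similarity: {similarity:.6f}")
```

An output containing inf would make the norm infinite or NaN, so a check with `math.isfinite` on each element before comparing is a sensible guard when debugging a failing board.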

annosoo commented 2 years ago

I tried to reproduce the issue with the latest nncase. However, another error (#432) occurred.

zhangyang2057 commented 2 years ago

With the 1.0.0-72ab6de7 you mentioned in #432, I cannot reproduce your issue.

nncase Command Line Tools 1.0.0-72ab6de7
Copyright 2019-2021 Canaan Inc.
1. Import graph...
2. Optimize target independent...
3. Optimize target dependent...
5. Optimize target dependent after quantization...
6. Optimize modules...
7.1. Merge module regions...
7.2. Optimize buffer fusion...
7.3. Optimize target dependent after buffer fusion...
8. Generate code...
WARN: Cannot find a decompiler for section .rdata
WARN: Cannot find a decompiler for section .text

SUMMARY
INPUTS
0   input   f32[1,1,28,28]
OUTPUTS
0   output  f32[1,10]

MEMORY USAGES
.input     3.06 KB  (3136 B)
.output   40.00 B   (40 B)
.data    120.00 B   (120 B)
MODEL     31.66 KB  (32416 B)
TOTAL     34.88 KB  (35712 B)
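The figures in this summary can be checked by hand, which shows how little memory the compiled model itself needs. This is a short sketch. The assumption that TOTAL is the plain sum of the four rows is mine, although it matches the numbers printed above.

```python
import math


def tensor_bytes(shape, elem_size=4):
    """Size in bytes of a dense tensor; float32 elements are 4 bytes each."""
    return math.prod(shape) * elem_size


input_bytes = tensor_bytes((1, 1, 28, 28))  # f32[1,1,28,28]
output_bytes = tensor_bytes((1, 10))        # f32[1,10]
data_bytes = 120                            # .data row from the summary
model_bytes = 32416                         # MODEL row from the summary

total = input_bytes + output_bytes + data_bytes + model_bytes
print(input_bytes, output_bytes, total)  # prints: 3136 40 35712
```

35712 B divided by 1024 is 34.875, which the tool rounds to 34.88 KB.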

std::bad_alloc means memory allocation failed. Could you please check the memory size on your PC?

$ cat /proc/meminfo 
MemTotal:       65811848 kB
MemFree:        46625960 kB
MemAvailable:   58787080 kB
Buffers:         2247348 kB
Cached:          9726356 kB
SwapCached:            0 kB

annosoo commented 2 years ago

It's a bit strange, but the issue has disappeared with the latest ncc 1.0.0-72ab6de7.

nncase Command Line Tools 1.0.0-72ab6de
Copyright 2019-2021 Canaan Inc.
1. Import graph...
2. Optimize target independent...
3. Optimize target dependent...
5. Optimize target dependent after quantization...
6. Optimize modules...
7.1. Merge module regions...
7.2. Optimize buffer fusion...
7.3. Optimize target dependent after buffer fusion...
8. Generate code...
MemTotal:        3995040 kB
MemFree:          514116 kB
MemAvailable:    2249176 kB
Buffers:           96220 kB
Cached:          1800188 kB
SwapCached:        18496 kB

But it still happens with some older ncc versions, such as 1.0.0-ae73cec on the master branch. I have checked the memory size, which seems to be sufficient.

nncase Command Line Tools 1.0.0-ae73cec
Copyright 2019-2021 Canaan Inc.
Fatal: std::bad_alloc
MemTotal:        3995040 kB
MemFree:          516772 kB
MemAvailable:    2251400 kB
Buffers:           96012 kB
Cached:          1799624 kB
SwapCached:        15900 kB

For reference, I have uploaded the ncc 1.0.0.b6afb77 mentioned above, where the same issue happens: b6afb77.zip

nncase Command Line Tools 1.0.0-b6afb77
Copyright 2019-2021 Canaan Inc.
Fatal: std::bad_alloc
MemTotal:        3995040 kB
MemFree:          512328 kB
MemAvailable:    2247084 kB
Buffers:           96124 kB
Cached:          1799688 kB
SwapCached:        18496 kB

I'm not sure whether it will happen again.

zhangyang2057 commented 2 years ago

I can reproduce the std::bad_alloc issue with the ncc binary you uploaded, but I cannot find the commit id (b6afb77) on the nncase master branch. Have you modified the nncase code and built it yourself? Please double-check your changes and test again without them.

annosoo commented 2 years ago

I checked and found that ncc 1.0.0.b6afb77 is not on the master branch. It is the version that can be downloaded from here, and I haven't modified it. In addition, can you reproduce the issue with ncc 1.0.0-ae73cec on the master branch? The same issue may exist in other ncc versions, and I'm not sure whether it's a bug.

zhangyang2057 commented 2 years ago

Using the following ncc command, I built model.onnx successfully with the ncc 1.0.0.b6afb77 you uploaded.

$ ./ncc compile -i onnx -t k210 model.onnx test.kmodel --input-type default --dump-ir --dump-asm --dump-dir output
nncase Command Line Tools 1.0.0-b6afb77
Copyright 2019-2021 Canaan Inc.
1. Import graph...
2. Optimize target independent...
3. Optimize target dependent...
5. Optimize target dependent after quantization...
6. Optimize modules...
7.1. Merge module regions...
7.2. Optimize buffer fusion...
7.3. Optimize target dependent after buffer fusion...
8. Generate code...
WARN: Cannot find a decompiler for section .rdata
WARN: Cannot find a decompiler for section .text

SUMMARY
INPUTS
0   input   f32[1,1,28,28]
OUTPUTS
0   output  f32[1,10]

MEMORY USAGES
.input     3.06 KB  (3136 B)
.output   40.00 B   (40 B)
.data    120.00 B   (120 B)
MODEL     31.65 KB  (32408 B)
TOTAL     34.87 KB  (35704 B)

I was only able to reproduce your issue yesterday because the model path I passed was wrong:

$ ./ncc compile -i onnx -t k210 / test.kmodel --input-type default --dump-ir --dump-asm --dump-dir output
nncase Command Line Tools 1.0.0-b6afb77
Copyright 2019-2021 Canaan Inc.
Fatal: std::bad_alloc
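Since a wrong model path (here `/`, a directory) surfaced as std::bad_alloc rather than a clear message, validating the path before invoking ncc can save time. This is a minimal sketch. The function `validate_model_path` is hypothetical and not part of ncc, and the `.onnx` suffix check is an assumption that fits the commands in this thread.

```python
import os


def validate_model_path(path: str, expected_ext: str = ".onnx") -> str:
    """Return the absolute path if it points at a non-empty model file."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"model file does not exist: {path}")
    if not os.path.isfile(path):
        raise ValueError(f"model path is not a regular file: {path}")
    if not path.lower().endswith(expected_ext):
        raise ValueError(f"expected a {expected_ext} file, got: {path}")
    if os.path.getsize(path) == 0:
        raise ValueError(f"model file is empty: {path}")
    return os.path.abspath(path)
```

Calling `validate_model_path("/")` raises a ValueError explaining that the path is not a regular file, which is easier to act on than an allocation failure.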

zhangyang2057 commented 2 years ago

I reset the nncase repo back to ae73cecf and built nncase locally. This ncc builds model.onnx successfully.

$ /home/zhangyang/workspace/nncase_x86_64/out/bin/ncc compile -i onnx -t k210 model.onnx test.kmodel --input-type default --dump-ir --dump-asm --dump-dir output
nncase Command Line Tools 1.0.0-ae73cecf
Copyright 2019-2021 Canaan Inc.
1. Import graph...
2. Optimize target independent...
3. Optimize target dependent...
5. Optimize target dependent after quantization...
6. Optimize modules...
7.1. Merge module regions...
7.2. Optimize buffer fusion...
7.3. Optimize target dependent after buffer fusion...
8. Generate code...
WARN: Cannot find a decompiler for section .rdata
WARN: Cannot find a decompiler for section .text

SUMMARY
INPUTS
0   input   f32[1,1,28,28]
OUTPUTS
0   output  f32[1,10]

MEMORY USAGES
.input     3.06 KB  (3136 B)
.output   40.00 B   (40 B)
.data    120.00 B   (120 B)
MODEL     31.65 KB  (32408 B)
TOTAL     34.87 KB  (35704 B)

annosoo commented 2 years ago

I tried the same ncc command to build the model, but nothing changed.

$ ./ncc compile -i onnx -t k210 model/model.onnx model/model_float.kmodel --input-type default --dump-ir --dump-asm --dump-dir dump
nncase Command Line Tools 1.0.0-b6afb77
Copyright 2019-2021 Canaan Inc.
Fatal: std::bad_alloc

Can you tell me how much memory the above ncc needs to convert a model? According to your previous reply, you have much more free memory than I do, although the model can now be converted successfully with the latest ncc. Here is my free memory.

$ cat /proc/meminfo
MemTotal:        3995040 kB
MemFree:          266084 kB
MemAvailable:    2089752 kB
Buffers:           75848 kB
Cached:          1860384 kB
SwapCached:        25452 kB
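When comparing machines, MemAvailable is usually the more telling field: the kernel defines it as an estimate of the memory available for starting new applications without swapping, whereas MemFree excludes reclaimable caches. A small sketch that parses the listing above follows. The parser is illustrative, and on a live system the text would come from reading /proc/meminfo.

```python
def parse_meminfo(text: str) -> dict:
    """Map each /proc/meminfo field name to its integer value (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "name: value" shape
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


# Values copied from the listing above.
sample = """\
MemTotal:        3995040 kB
MemFree:          266084 kB
MemAvailable:    2089752 kB
Buffers:           75848 kB
Cached:          1860384 kB
SwapCached:        25452 kB
"""

info = parse_meminfo(sample)
available_gib = info["MemAvailable"] / (1024 * 1024)
print(f"MemAvailable: {available_gib:.2f} GiB of "
      f"{info['MemTotal'] / (1024 * 1024):.2f} GiB total")
```

Roughly 2 GiB available is far more than a 32 KB model should need, which is consistent with the later finding that the failure was not caused by a lack of memory.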

zhangyang2057 commented 2 years ago

The model.onnx is small, so ncc will not take much memory.

There are now two ways to compile models: the nncase python APIs and the ncc client.
The nncase python APIs depend on the nncase wheel package, which you can get from the nncase release page. Refer to the usage document for more information on how to install the nncase wheel package and use the python APIs to compile your model.
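For orientation, compiling the ONNX model through the python APIs looks roughly like the sketch below. It follows the pattern in the nncase v1.x usage document as I recall it (`CompileOptions`, `Compiler`, `ImportOptions`, `import_onnx`, `gencode_tobytes`), so treat the exact names as assumptions and check them against the installed wheel. The wrapper function and its parameters are illustrative.

```python
def compile_onnx_to_kmodel(onnx_path: str, kmodel_path: str,
                           target: str = "k210") -> None:
    """Compile an ONNX model to a float kmodel with the nncase python APIs.

    Assumption: an nncase v1.x wheel is installed; API names follow its
    usage document and may differ in other releases.
    """
    import nncase  # imported lazily so this module loads without the wheel

    compile_options = nncase.CompileOptions()
    compile_options.target = target

    compiler = nncase.Compiler(compile_options)

    # The importer takes the raw model bytes, not a file path.
    with open(onnx_path, "rb") as f:
        model_content = f.read()
    compiler.import_onnx(model_content, nncase.ImportOptions())

    compiler.compile()

    with open(kmodel_path, "wb") as f:
        f.write(compiler.gencode_tobytes())
```

With the files from this thread, the call would be `compile_onnx_to_kmodel("model/model.onnx", "model/model_float.kmodel")`.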

For ncc, it is better to build it yourself, as the binary is OS dependent. Both the nncase github builds and my PC use ubuntu 18.04, but is your OS ubuntu 20.04? Could you please build nncase locally and try again?

In addition, nncase has a docker image based on ubuntu 20.04, in which you can install the nncase wheel package and build nncase.

zhangyang2057 commented 2 years ago

ncc 1.0.0-b6afb77 compiles the model successfully inside the nncase docker image.

root@561a2382a14e:/mnt/kendryte/nncase/issues/396/github# ls -l
total 9692
drwxrwxr-x  2 1000 1000    4096 Oct  8 04:18 bin
drwxr-xr-x 13 root root    4096 Nov 10 02:19 dump
drwxrwxr-x  3 1000 1000    4096 Oct  8 04:18 include
drwxrwxr-x  3 1000 1000    4096 Oct  8 04:18 lib
drwxr-xr-x  2 root root    4096 Nov 10 02:19 model
-rw-rw-r--  1 1000 1000 9896370 Nov  9 01:17 nncase-ubuntu-18.04-x86_64.zip
drwxrwxr-x  3 1000 1000    4096 Oct  8 04:18 python
root@561a2382a14e:/mnt/kendryte/nncase/issues/396/github# ./bin/ncc compile -i onnx -t k210 model.onnx test.kmodel --input-type default --dump-ir --dump-asm --dump-dir output
nncase Command Line Tools 1.0.0-b6afb77
Copyright 2019-2021 Canaan Inc.
Fatal: Cannot open file: model.onnx
root@561a2382a14e:/mnt/kendryte/nncase/issues/396/github# ./bin/ncc compile -i onnx -t k210 model/model.onnx test.kmodel --input-type default --dump-ir --dump-asm --dump-dir output
nncase Command Line Tools 1.0.0-b6afb77
Copyright 2019-2021 Canaan Inc.
1. Import graph...
2. Optimize target independent...
3. Optimize target dependent...
5. Optimize target dependent after quantization...
6. Optimize modules...
7.1. Merge module regions...
7.2. Optimize buffer fusion...
7.3. Optimize target dependent after buffer fusion...
8. Generate code...
WARN: Cannot find a decompiler for section .rdata
WARN: Cannot find a decompiler for section .text

SUMMARY
INPUTS
0   input   f32[1,1,28,28]
OUTPUTS
0   output  f32[1,10]

MEMORY USAGES
.input     3.06 KB  (3136 B)
.output   40.00 B   (40 B)
.data    120.00 B   (120 B)
MODEL     31.65 KB  (32408 B)
TOTAL     34.87 KB  (35704 B)