kraiskil / onnx2c

Open Neural Network Exchange to C compiler.

Large output file at ~200 MB #27

Open · threadprogrammer opened this issue 1 year ago

threadprogrammer commented 1 year ago

Hello. I'm getting an output model.c file that is 230 MB in size. The input ONNX file is yolov7-tiny. The issue is that when I try to compile model.c in my IDE, it throws a segmentation fault during compilation.

What is the general way to handle arrays this huge in C so that they compile?

Another question: how can I reduce the precision of the values in the output file to something like 4 digits (e.g. 2.236492556895739 => 2.2365)?

kraiskil commented 1 year ago

I've successfully compiled yolov6 (I think v6, not 100% sure).

I don't recall seeing a segfault from GCC even with big files, but the compiler process did run out of memory for the biggest ones. Maybe there was a segfault after a failed malloc, though?

If you are running out of memory, there are a few things worth checking on the host system first.

A side note - the generated inference binary is going to be excruciatingly slow to execute. I'm not sure what optimizations should be added to onnx2c to speed up the generated code.

> Another question: how can I reduce the precision of the values in the output file to something like 4 digits (e.g. 2.236492556895739 => 2.2365)?

For now the number of digits is hard-coded. It should be a command line option; I added a separate issue, #28, for this.
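For a rough illustration of what such an option would control, here is a minimal sketch, assuming the constants are written out with printf-style formatting (the format strings are illustrative, not onnx2c's actual code):

```c
#include <stdio.h>

int main(void)
{
    double w = 2.236492556895739;
    /* The precision field of the conversion decides how many
       digits end up in the generated source file. */
    printf("%.15f\n", w); /* prints 2.236492556895739 */
    printf("%.4f\n", w);  /* prints 2.2365 */
    return 0;
}
```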

threadprogrammer commented 1 year ago

Thanks, but I'm simulating in Vitis HLS (using GCC) to simulate inference on a Zynq. The simulation process is C-based. I don't know why the segmentation fault happens.

threadprogrammer commented 1 year ago

Generally, what is the correct way of loading arrays of this shape (at this scale): [64][3][3][3]? That is just one layer's weights.

kraiskil commented 1 year ago

I do not understand the problems here.

If GCC segfaults on any input, that is a bug in GCC. Most likely it happens here because your host system is out of memory (so more of a feature than a bug). But if that is not the case, you'll have to run GCC itself under a debugger.

And please explain what you mean by 'loading an array'. Also, fewer than 2000 floats is not a particularly big array.
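For reference, a minimal sketch of how a weight tensor of that shape is commonly defined in C; the identifier and values here are made up, but the storage-class point is general: a file-scope definition lives in the executable's data section, not on the runtime stack.

```c
#include <stdio.h>

/* [64][3][3][3] is 64*3*3*3 = 1728 floats, i.e. about 6.9 kB.
   Defined at file scope, it has static storage duration and is
   baked into the executable's (read-only) data section. */
static const float layer_weights[64][3][3][3] = {
    { { { 0.0123f, -0.0456f, 0.0789f },
        { 0.0012f,  0.0034f, -0.0056f },
        { 0.0100f, -0.0200f,  0.0300f } } },
    /* remaining elements are zero-initialized in this sketch;
       a real generated header would list all 1728 values */
};

int main(void)
{
    printf("first weight: %f\n", layer_weights[0][0][0][0]);
    return 0;
}
```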

threadprogrammer commented 1 year ago

Now I distribute the arrays over 20 header files, each about 3 MB in size. When I include them in my main file, there are segfaults when allocating them. Some of the layers have arrays of size [512][1024][1][1]! Is there a maximum array size in C? (I know GCC must define one.)
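A minimal sketch of the distinction that usually matters in this situation, assuming a typical default stack limit of roughly 8 MB: C itself sets no fixed maximum array size, but a large array declared as a local variable must fit on the stack, while a file-scope array does not.

```c
#include <stdio.h>

/* [512][1024][1][1] floats = 524288 * 4 bytes = 2 MB.
   At file scope this goes into static storage - fine. */
static float weights[512][1024][1][1];

int main(void)
{
    /* The same 2 MB as a local variable would be carved out of the
       stack, whose default limit is often ~8 MB (see `ulimit -s`);
       a few such locals together can overflow it and segfault. */
    weights[0][0][0][0] = 1.0f;
    printf("%f\n", weights[0][0][0][0]);
    return 0;
}
```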

threadprogrammer commented 1 year ago

Compilation is done, but when I run the file, a segfault happens. What should I do?

threadprogrammer commented 1 year ago

This is the test to be run on the network:

```c
#include "layer_1_weights.h"
#include "layer_2_weights.h"
#include "layer_3_weights.h"
#include "layer_4_weights.h"
#include "layer_5_weights.h"
#include "layer_6_weights.h"
#include "layer_7_weights.h"
#include "layer_8_weights.h"
#include "layer_9_weights.h"
#include "layer_10_weights.h"
#include "layer_11_weights.h"
#include "layer_12_weights.h"
#include "layer_13_weights.h"
#include "layer_14_weights.h"
#include "layer_15_weights.h"
#include "layer_16_weights.h"
#include "layer_17_weights.h"
#include "layer_18_weights.h"
#include "layer_19_weights.h"
#include "layer_20_weights.h"
#include "layer_21_weights.h"
#include "layer_22_weights.h"
#include "layer_23_weights.h"
#include "layer_24_weights.h"
#include "layer_25_weights.h"
#include "layer_26_weights.h"
#include "layer_functions.h"
#include <stdio.h> /* header name lost in extraction; stdio.h assumed since printf() is used */

#define MAX(X,Y) ( X > Y ? X : Y)
#define MIN(X,Y) ( X < Y ? X : Y)
#define CLIP(X,L) ( MAX(MIN(X,L), -L) )

// The MAINNN
int main(){
    float tensor_images[1][3][640][640] = {0};
    float tensor_output[1][3][80][80][85] = {0};
    float tensor_286[1][3][40][40][85] = {0};
    float tensor_298[1][3][20][20][85] = {0};

    YOLOV7TINY(tensor_images, tensor_output, tensor_286, tensor_298);
    printf("hellooooo\n");
    return 0;
}
```

kraiskil commented 1 year ago

Interesting. What change made GCC stop segfaulting?

Now I would recommend compiling the above binary with debugging enabled (gcc -Og -g ...) and running it under the debugger until you hit the segfault. That should tell you whether it really is a bug in the code generated by onnx2c, or whether there is still a system memory problem. Or maybe just put a printf("hello"); fflush(stdout); before calling the inference function :) (if it prints, it's a bug; if not, an out-of-memory problem).
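Concretely, that probe could look like this sketch (the elided part stands for the tensor declarations and the YOLOV7TINY() call from the test program above):

```c
#include <stdio.h>

int main(void)
{
    printf("hello\n");
    fflush(stdout);   /* force the message out before any crash */

    /* ... declare the tensors and call YOLOV7TINY() as in the test above ... */

    printf("inference done\n");
    return 0;
}
```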

Also, I have no insight into what you are trying to do. You mention Zynq; it sounds like you might be targeting its FPGA partition and not the ARM core? Are you running the inference on target (i.e. the Zynq ARM) or on a big development PC? And if on a PC, are you compiling natively for it, or within some sort of FPGA/HDL framework?

threadprogrammer commented 1 year ago

In Vitis the simulation process takes place on the ARM processor, but there is one more step, named cosimulation, that uses FPGA resources like FFs (flip-flops) to simulate the network.

But my problem now is that I'm stuck in simulation (no FPGA enters the scenario yet).

My guess is that it's due to the GCC that Vitis runs on, because this is not normal behaviour.

Anyway, thanks for your recommendations.

kraiskil commented 1 year ago

If the inference runs on the ARM, then running out of physical memory is also a likely explanation. Unless Zynq boards these days have bigger memories than when I last used them :)

Adding swap could be an option, but running from it would probably be extremely slow.

Also, by compiling the generated code on a PC you could see how much memory it takes (using e.g. top), and check whether it can fit into the Zynq RAM at all.
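As a rough sanity check along those lines, a sketch that adds up just the input/output tensors declared in the test program above (the weight tables come on top of this):

```c
#include <stdio.h>

int main(void)
{
    /* Element counts taken from the tensor declarations in the test above */
    size_t floats = (size_t)1*3*640*640   /* tensor_images */
                  + (size_t)1*3*80*80*85  /* tensor_output */
                  + (size_t)1*3*40*40*85  /* tensor_286 */
                  + (size_t)1*3*20*20*85; /* tensor_298 */
    double mb = floats * sizeof(float) / (1024.0 * 1024.0);
    printf("I/O tensors alone: %.1f MB\n", mb); /* about 12.9 MB */
    return 0;
}
```

That is already above a common 8 MB default stack limit, which is worth keeping in mind if those tensors are declared as locals in main().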