xboot / libonnx

A lightweight, portable pure C99 onnx inference engine for embedded devices with hardware acceleration support.
MIT License
575 stars 107 forks

Hello model RAM size required #5

Closed noomio closed 3 years ago

noomio commented 3 years ago

Hi,

I'm trying to run the hello example on a small embedded system, but I'm unsure how much memory is required to allocate this model (when running onnx_context_alloc).

I have roughly 2MB, is that enough? Is there a smaller model that I can test with the model defined as a const char array? Like the static const unsigned char mnist_onnx[] = { ... }

jianjunjiang commented 3 years ago

Add this function to show memory information.

```c
#include <malloc.h>
#include <stdio.h>

/* Print the heap statistics reported by mallinfo() (glibc-specific). */
static void display_mallinfo(void)
{
    struct mallinfo mi = mallinfo();

    printf("Total non-mmapped bytes (arena):       %d\n", mi.arena);
    printf("# of free chunks (ordblks):            %d\n", mi.ordblks);
    printf("# of free fastbin blocks (smblks):     %d\n", mi.smblks);
    printf("# of mapped regions (hblks):           %d\n", mi.hblks);
    printf("Bytes in mapped regions (hblkhd):      %d\n", mi.hblkhd);
    printf("Max. total allocated space (usmblks):  %d\n", mi.usmblks);
    printf("Free bytes held in fastbins (fsmblks): %d\n", mi.fsmblks);
    printf("Total allocated space (uordblks):      %d\n", mi.uordblks);
    printf("Total free space (fordblks):           %d\n", mi.fordblks);
    printf("Topmost releasable block (keepcost):   %d\n", mi.keepcost);
}
```

    ============== Before alloc context ==============
    Total non-mmapped bytes (arena):       138816
    # of free chunks (ordblks):            1
    # of free fastbin blocks (smblks):     0
    # of mapped regions (hblks):           0
    Bytes in mapped regions (hblkhd):      0
    Max. total allocated space (usmblks):  0
    Free bytes held in fastbins (fsmblks): 0
    Total allocated space (uordblks):      3536
    Total free space (fordblks):           135280
    Topmost releasable block (keepcost):   135280

    ============== After alloc context ==============
    Total non-mmapped bytes (arena):       286272
    # of free chunks (ordblks):            1
    # of free fastbin blocks (smblks):     0
    # of mapped regions (hblks):           0
    Bytes in mapped regions (hblkhd):      0
    Max. total allocated space (usmblks):  0
    Free bytes held in fastbins (fsmblks): 0
    Total allocated space (uordblks):      232736
    Total free space (fordblks):           53536
    Topmost releasable block (keepcost):   53536

    ============== Befor onnx run ==============
    Total non-mmapped bytes (arena):       286272
    # of free chunks (ordblks):            1
    # of free fastbin blocks (smblks):     0
    # of mapped regions (hblks):           0
    Bytes in mapped regions (hblkhd):      0
    Max. total allocated space (usmblks):  0
    Free bytes held in fastbins (fsmblks): 0
    Total allocated space (uordblks):      232736
    Total free space (fordblks):           53536
    Topmost releasable block (keepcost):   53536

    ============== After onnx run ==============
    Total non-mmapped bytes (arena):       450112
    # of free chunks (ordblks):            3
    # of free fastbin blocks (smblks):     0
    # of mapped regions (hblks):           0
    Bytes in mapped regions (hblkhd):      0
    Max. total allocated space (usmblks):  0
    Free bytes held in fastbins (fsmblks): 0
    Total allocated space (uordblks):      235552
    Total free space (fordblks):           214560
    Topmost releasable block (keepcost):   133728

2 MB of memory is enough. mnist is the smallest model; you can use xxd -i to convert other models into C arrays.
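As a rough reading of the figures above, the sketch below subtracts the uordblks values from the dumps to estimate what the context and the run leave allocated, and compares the final arena size with a heap budget. The function names are illustrative only, and the numbers come from glibc on Linux, so another allocator (for example a ThreadX byte pool) may show different overhead.

```c
#include <stdio.h>

/* Values copied from the mallinfo() dumps in this thread. */
static const long uordblks_before_alloc = 3536;
static const long uordblks_after_alloc  = 232736;
static const long uordblks_after_run    = 235552;
static const long arena_after_run       = 450112;

/* Bytes still in use that the context allocation added. */
long context_bytes(void)
{
    return uordblks_after_alloc - uordblks_before_alloc;
}

/* Bytes still in use that the run added on top of the context. */
long run_bytes(void)
{
    return uordblks_after_run - uordblks_after_alloc;
}

/* Budget left over if the whole final arena had to fit in budget bytes. */
long headroom(long budget)
{
    return budget - arena_after_run;
}

/* Print the three estimates for a 2 MiB budget. */
void print_summary(void)
{
    printf("context: %ld bytes\n", context_bytes());
    printf("run:     %ld bytes\n", run_bytes());
    printf("left:    %ld bytes of 2 MiB\n", headroom(2L * 1024 * 1024));
}
```

The arena figure counts everything the allocator requested from the system, including space that was freed again during the run, so it is the more conservative number to compare against the 2 MB budget.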

noomio commented 3 years ago

Thanks.

Unfortunately I'm not running on Linux.

It's a Cortex-A7 with ThreadX, and debugging is very limited (no JTAG).

I'm unable to run much at the moment, as it fails and I can't trace it easily.

noomio commented 3 years ago

Hi,

I traced the fault down to memalign. I had to add my own implementation, as I have done for malloc, free and realloc.

It runs the benchmark, but freeing some objects doesn't work properly, probably due to memalign.

Thanks!
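For readers who need their own memalign, here is a minimal sketch of one common way to build an aligned allocation on top of plain malloc. The names aligned_malloc_sketch and aligned_free_sketch are hypothetical and are not part of libonnx or ThreadX. It over-allocates, rounds the address up, and stores the pointer malloc returned just below the aligned block so it can be released later.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: return size bytes aligned to align (a power of two). */
void *aligned_malloc_sketch(size_t align, size_t size)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    /* Reject sizes whose padded total would overflow size_t. */
    if (size > SIZE_MAX - align - sizeof(void *))
        return NULL;

    /* Room for the payload, worst-case padding, and the saved raw pointer. */
    void *raw = malloc(size + align - 1 + sizeof(void *));
    if (raw == NULL)
        return NULL;

    /* Skip the slot for the saved pointer, then round up to the boundary. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + align - 1) & ~(uintptr_t)(align - 1);

    /* Remember what malloc returned, just below the aligned block. */
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

/* Release a block obtained from aligned_malloc_sketch. */
void aligned_free_sketch(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

With this scheme the aligned pointer is not the one malloc returned, so passing it to plain free would corrupt the heap. Whether that is what went wrong here is only a guess, but it is a typical source of trouble when freeing memalign blocks.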

jianjunjiang commented 3 years ago

Just use malloc instead of memalign; 512-byte alignment is not necessary.

noomio commented 3 years ago

So leaving the alignment at 4 and allocating len bytes should be sufficient?

noomio commented 3 years ago

It worked ;)

jianjunjiang commented 3 years ago

You must ensure 8-byte alignment because of the double type. On 32-bit systems malloc is usually 8-byte aligned, and on 64-bit systems usually 16-byte aligned, i.e. twice the size of void *. Confirm your malloc's alignment.
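One way to confirm the alignment is to probe a few allocations and test the low bits of each returned address. The sketch below does that with hypothetical helper names; the probe sizes are arbitrary. A passing probe is evidence rather than proof, so the allocator's documentation or source remains the authority.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: 1 if p sits on an align-byte boundary, else 0. */
int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p % align) == 0;
}

/* Probe several allocation sizes; 1 only if every block was 8-byte aligned. */
int malloc_is_8_byte_aligned(void)
{
    static const size_t sizes[] = { 1, 3, 8, 24, 100, 4096 };
    int ok = 1;

    for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++) {
        void *p = malloc(sizes[i]);
        if (p == NULL)
            return 0;   /* allocation failed, so nothing was confirmed */
        if (!is_aligned(p, 8))
            ok = 0;
        free(p);
    }
    return ok;
}
```

On a target with a custom malloc, calling malloc_is_8_byte_aligned() once at start-up and logging the result is a cheap sanity check before loading a model.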

noomio commented 3 years ago

I have added this:

UCHAR mem_heap[MALLOC_BYTE_POOL_SIZE] __attribute__ ((aligned (8)));

jianjunjiang commented 3 years ago

Writing a customized malloc may be OK. Use 8-byte alignment for onnx_tensor_t's data.
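To illustrate how an aligned pool can keep every block 8-byte aligned, here is a minimal bump-allocator sketch. It is not the allocator used in this thread and it has no free; pool, POOL_SIZE and pool_alloc are hypothetical names, and the aligned attribute assumes GCC or Clang. Aligning the buffer alone only fixes the first block, so the sketch also rounds each request up to a multiple of 8.

```c
#include <stddef.h>

#define POOL_SIZE 4096u   /* hypothetical pool size */

/* Backing storage; the attribute aligns the start of the pool to 8 bytes. */
static unsigned char pool[POOL_SIZE] __attribute__((aligned(8)));
static size_t pool_used;  /* bytes handed out so far, always a multiple of 8 */

/* Hand out len bytes from the pool, or NULL when the pool is exhausted. */
void *pool_alloc(size_t len)
{
    /* Round up so the next block also starts on an 8-byte boundary. */
    size_t rounded = (len + 7u) & ~(size_t)7u;

    if (rounded < len || rounded > POOL_SIZE - pool_used)
        return NULL;

    void *p = &pool[pool_used];
    pool_used += rounded;
    return p;
}
```

A real replacement for malloc would also need free and realloc, and block headers must keep the same 8-byte rounding so that the payload after each header stays aligned.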

noomio commented 3 years ago

It seems to work. I'm also able to run the mnist model.

Just the debug output isn't quite right. I need to figure out the printf implementation.

noomio commented 3 years ago

I just need to append an LF to every CR, as I'm on Windows. So far so good. Thanks for your help. The library is great!