stm32: Debug cause of board reset when trying to run the hello world example

mocleiri commented 2 years ago

MicroPython v1.16-222-g44818d1a3-dirty on 2021-09-08; NUCLEO_H743ZI2 MICROLITE with STM32H743

Type "help()" for more information.
>>> import hello_world
interpreter_make_new: model size = 2488, tensor area = 20048
Failed to allocate tail memory. Requested: 61942520, available 19888, missing: 61922632
Failed starting model allocation.

AllocateTensors() failed!
time step,y
MicroPython v1.16-222-g44818d1a3-dirty on 2021-09-08; NUCLEO_H743ZI2 MICROLITE with STM32H743
Type "help()" for more information.

In testing, the allocator is asked for a very large amount of memory: 61942520 bytes (roughly 62 MB) against a tensor area of 20048 bytes. Every allocation is supposed to fit within the allocated tensor area.
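As a quick sanity check on the log above, the figures are internally consistent, which suggests the allocator is reporting a genuinely bogus request rather than misprinting it. The sketch below only redoes the arithmetic from the log. The variable names are made up for illustration, and reading the 160-byte difference as "arena already consumed before the failing request" is an assumption.

```python
# Figures copied from the REPL log above; the names are illustrative only.
tensor_area = 20048      # "tensor area = 20048"
requested = 61942520     # "Requested: 61942520"
available = 19888        # "available 19888"

# The "missing" value in the log is simply requested minus available.
missing = requested - available

# Assumption: the gap between the arena size and what is still available
# is memory the allocator had already handed out before the failing request.
already_used = tensor_area - available

print("missing:", missing)
print("already used:", already_used)
print("request is about", requested // tensor_area, "times the tensor area")
```

A request thousands of times larger than the arena points at a corrupted size value rather than a model that is slightly too big.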

I wonder if this is caused by alignment issues. I removed some of the alignment functions that were in the original files from OpenMV.

mocleiri commented 2 years ago

I'm now set up for debugging with the ST-LINK GDB server on Windows, connected over the network to VS Code running in Windows Subsystem for Linux. I can step through the disassembly, but I need to rebuild both TensorFlow and MicroPython so that I can see the source files and find where the corruption might be.

mocleiri commented 2 years ago

This is the tensor area alignment code that I originally picked up from OpenMV. It wasn't needed for ESP32, but I think this issue may be related to the alignment no longer being done.

https://github.com/mocleiri/tensorflow-micropython-examples/commit/8ca54269d2e077d859b6a847fc99f704f954aa97#diff-17dcba6908911e843de95aa34fb0735d8d6c08a290e859b8e20242b2f6fff612L73
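For readers who don't want to open the diff, the idea behind aligning a tensor area can be sketched as below. This is not the code from the linked commit. It is a minimal illustration of rounding a buffer's start address up to an alignment boundary and shrinking the usable size to match. The 16-byte alignment, the function names, and the example address are all assumptions.

```python
ALIGNMENT = 16  # assumed alignment in bytes; must be a power of two


def align_up(address, alignment=ALIGNMENT):
    """Round address up to the next multiple of alignment."""
    return (address + alignment - 1) & ~(alignment - 1)


def usable_arena(start, size, alignment=ALIGNMENT):
    """Return (aligned_start, usable_size) for a raw buffer.

    The bytes skipped to reach the aligned address are lost, so the
    usable size shrinks by that padding.
    """
    aligned_start = align_up(start, alignment)
    padding = aligned_start - start
    return aligned_start, max(0, size - padding)


# Hypothetical example: a 20048-byte buffer that starts 4 bytes past
# a 16-byte boundary loses 12 bytes to padding.
start, size = usable_arena(0x24000004, 20048)
print(hex(start), size)
```

If the interpreter is handed the unaligned start address together with the full size, any structure it places at the start of the arena may be misaligned, which is one way a size field could end up being read incorrectly.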

mocleiri commented 2 years ago

On ESP32 with OpenOCD, VS Code stopped directly at the point of the exception. Let's try running the STM32 with OpenOCD instead of ST-LINK and see if it works the same way.

mocleiri commented 2 years ago

I found out that if you import pyb and then call:

pyb.fault_debug(1)

then when the fault occurs the board no longer restarts. Instead it is held in a state where you can see the stack at the point where the exception occurred.

I can reproduce the error 100% of the time when trying to step into https://github.com/tensorflow/tflite-micro/blob/c0adfb0567ecfe7990541a9fdef77018bc83d6fe/tensorflow/lite/core/api/op_resolver.cc#L41

The all_op_resolver is defined as a static variable, but because of the compile options it's not possible to look inside it to see what the problem could be. I need to build at -O0 so that no optimizations are applied.

I'm also seeing that more math operations are going to be needed. I may look at removing the specific list of selected musl math ops and adjusting the build so that I can just link to the standard math library.

This is especially a problem when trying to build the cmsis_nn kernel versions.

I also think there could be a version conflict, since the CMSIS used by MicroPython is from 2019 whereas the version downloaded by TensorFlow is from December 2020. I'm not sure whether I can point MicroPython at the one provisioned by TensorFlow or whether I need to update the stm32lib git project that is used for the CMSIS API and logic.

mocleiri commented 2 years ago

MicroPython contains a subset of libm; which subset depends on whether mp_float is a double or a float in C. TensorFlow needs additional math ops.

The current approach is to link the math library. I think the effect is to add everything that is not included in the partial libm implementation.

I also link the stdc++ library but don't link a standard C library. The C functions come from the partial implementation in MicroPython and are then supplemented here.

There is also a difference in the CMSIS version between what is in micropython/lib/cmsis/inc and what is downloaded by TensorFlow. After overwriting the MicroPython includes with the more recent CMSIS downloaded via TensorFlow, linking the firmware was possible.

I am able to run hello_world now using my Nucleo H743ZI2 board.

mocleiri commented 2 years ago

I flashed the firmware from the build for this fix, and hello_world now runs.

I'm closing this issue and will file new ones for other improvements.

mocleiri/tensorflow-micropython-examples, issue #33