tensorflow / tensorflow

An Open Source Machine Learning Framework for Everyone
https://tensorflow.org
Apache License 2.0

Problems when compiling with Clang #35939

Closed aabadie closed 3 years ago

aabadie commented 4 years ago

@tensorflow/micro

System information

Describe the problem

I recently ported TensorFlow Lite to RIOT, an operating system for microcontrollers, using the package mechanism provided by the RIOT build system: this allows TensorFlow Lite to be built on the fly when generating a RIOT firmware, and its API to be used from a RIOT application. See https://github.com/RIOT-OS/RIOT/pull/12847 for details.

RIOT provides support for many boards, including many ARM-based ones. It also supports both the GCC and the LLVM (Clang) toolchains.

Our CI showed problems when building with Clang; see below and this issue for details. In short, the firmware crashes when evaluating the FullyConnected operator on ARM Cortex-M, but the same code works fine when built with GCC.

The example application in RIOT just runs a very basic MLP model (with Dense and Softmax layers) on an image taken from the MNIST dataset. For details of the application, you can have a look at the main_functions.cc file and the script used to generate the flatbuffers file containing the model.

Note that the hello_world example runs fine, even when built with LLVM, so I don't know what's wrong here: is it the Python script, or the way the MicroMutableOpResolver is built?

Sorry for the long description, but I wanted to make it as complete as possible.

Please provide the exact sequence of commands/steps when you ran into the problem

The board is not very important; this command can be adapted for many other ARM-based boards supported by RIOT: STM32 Nucleo, Kinetis, etc. Note that you'll have to install the right flashing tool (OpenOCD, JLink, etc.) depending on the board configuration.

wangtz commented 4 years ago

Hi Alexandre,

I wonder if we can get a more specific error than hard_fault_handler.

Looking at the backtrace:

```
#0 tflite::GetOptionalInputTensor (context=0x20001bb8 <setup()::static_interpreter+16>, node=0xc46c, index=2)
    at /work/riot/RIOT/tests/pkg_tensorflow-lite/bin/pkg/stm32f723e-disco/tensorflow-lite/tensorflow/lite/kernels/kernel_util.h:80
80          const bool use_tensor = index < node->inputs->size &&
```

This line probably triggered an invalid memory access. Could you check that none of these pointers are nullptr?

There could be a problem with the model as well. Is your model running fine on x64 and other embedded platforms?

> In short, the firmware is crashing when evaluating the FullyConnected operator on ARM Cortex-M but the same thing works fine with GCC.

I didn't find arm-none-eabi-clang++; is this something being developed?

Thanks,


sushreebarsa commented 3 years ago

@aabadie It looks like you are using an older version of TensorFlow. Many bugs have been fixed in the latest version. Could you please execute your code using the latest stable version, TF 2.5.0, and let us know if the issue still persists? Thanks!

google-ml-butler[bot] commented 3 years ago

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

google-ml-butler[bot] commented 3 years ago

Closing as stale. Please reopen if you'd like to work on this further.
