tensorflow / tensorflow

An Open Source Machine Learning Framework for Everyone
https://tensorflow.org
Apache License 2.0

Problems when compiling with Clang #35939

Closed aabadie closed 3 years ago

aabadie commented 4 years ago

@tensorflow/micro

System information

Describe the problem

I recently ported TensorFlow Lite to RIOT, an operating system for microcontrollers, using the package mechanism provided by the RIOT build system: this allows TensorFlow Lite to be built on the fly when generating a RIOT firmware, and its API to be used from a RIOT application. See https://github.com/RIOT-OS/RIOT/pull/12847 for details.

RIOT provides support for many boards, including many ARM-based ones. It also supports both the GCC and the LLVM (Clang) toolchains.

Our CI showed problems when building with Clang; see below and this issue for details. In short, the firmware crashes when evaluating the FullyConnected operator on ARM Cortex-M, but the same code works fine when built with GCC.

The example application in RIOT just runs a very basic MLP model (with Dense and Softmax layers) on an image taken from the MNIST dataset. For details of the application, you can have a look at the main_functions.cc file and the script used to generate the flatbuffers file containing the model.

Note that the hello_world example runs fine, even when built with LLVM, so I don't know what's wrong here: is it the Python script, or the way the MicroMutableOpResolver is built?

Sorry for the long description, but I wanted to make it as complete as possible.

Please provide the exact sequence of commands/steps when you ran into the problem

The board is not very important; this command can be adapted for many other ARM-based boards supported by RIOT: STM32 Nucleo, Kinetis, etc. Note that you'll have to install the right flashing tool (OpenOCD, JLink, etc.) depending on the board configuration.

wangtz commented 4 years ago

Hi Alexandre,

I wonder if we can get a more specific error than hard_fault_handler.

Looking at the backtrace:

```
#0 tflite::GetOptionalInputTensor (context=0x20001bb8 <setup()::static_interpreter+16>, node=0xc46c, index=2)
    at /work/riot/RIOT/tests/pkg_tensorflow-lite/bin/pkg/stm32f723e-disco/tensorflow-lite/tensorflow/lite/kernels/kernel_util.h:80
80          const bool use_tensor = index < node->inputs->size &&
```

This line probably triggered an invalid memory access. Could you check that none of these pointers are nullptr?

There could be a problem with the model as well. Is your model running fine on x64 and other embedded platforms?

> In short, the firmware is crashing when evaluating the FullyConnected operator on ARM Cortex-M but the same thing works fine with GCC.

I didn't find arm-none-eabi-clang++; is this something being developed?

Thanks,


sushreebarsa commented 3 years ago

@aabadie It looks like you are using an older version of TensorFlow. Many bugs have been fixed in the latest version. Could you please execute your code using the latest stable version, TF 2.5.0, and let us know if the issue still persists? Thanks!

google-ml-butler[bot] commented 3 years ago

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

google-ml-butler[bot] commented 3 years ago

Closing as stale. Please reopen if you'd like to work on this further.
