tensorflow / tensorflow

An Open Source Machine Learning Framework for Everyone
https://tensorflow.org
Apache License 2.0

TFLite-micro: AllocateTensors produces HardFault even for small models #33464

Closed antofara closed 1 year ago

antofara commented 5 years ago

I am deploying a model consisting of a GRU with 128 input units and 64 hidden units (37,056 parameters in total) on the following platform: an nRF52832 (Arm Cortex-M4) with 64 KB of RAM and 512 KB of flash.
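For reference, the model is roughly equivalent to the following Keras definition (the variable-length input and reset_after=False are my assumptions; this configuration gives exactly 37,056 parameters):

import tensorflow as tf

# Approximate reconstruction of the GRU described above
new_gru = tf.keras.Sequential([
    # 128 input features per timestep, 64 hidden units
    tf.keras.layers.GRU(64, reset_after=False, input_shape=(None, 128)),
])
new_gru.summary()  # 3 * (128*64 + 64*64 + 64) = 37,056 trainable parameters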

To build the tflite model I follow the steps below:

converter = tf.lite.TFLiteConverter.from_keras_model(new_gru)
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
gruQ = converter.convert()
with open('gruQ.tflite', 'wb') as f:
    f.write(gruQ)

and I convert it to a C++ array with xxd -i gruQ.tflite > gruQ.h.
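(If xxd is not available, an equivalent header can be generated with a small Python helper; write_c_array below is just an ad-hoc name, not a TFLite utility.)

# Ad-hoc replacement for: xxd -i gruQ.tflite > gruQ.h
def write_c_array(tflite_path, header_path, var_name):
    data = open(tflite_path, 'rb').read()
    hex_bytes = ', '.join('0x%02x' % b for b in data)
    with open(header_path, 'w') as f:
        f.write('unsigned char %s[] = {%s};\n' % (var_name, hex_bytes))
        f.write('unsigned int %s_len = %d;\n' % (var_name, len(data)))

write_c_array('gruQ.tflite', 'gruQ.h', 'gruQ_tflite')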

On the embedded platform:

static tflite::MicroErrorReporter micro_error_reporter;
error_reporter = &micro_error_reporter;

// extern unsigned char* quantQ5_tflite;
const tflite::Model* model = ::tflite::GetModel(gruQ_tflite);

if (model->version() != TFLITE_SCHEMA_VERSION) {
  error_reporter->Report(
      "Model provided is schema version %d not equal "
      "to supported version %d.\n",
      model->version(), TFLITE_SCHEMA_VERSION);
}

// This pulls in all the operation implementations we need
tflite::ops::micro::AllOpsResolver resolver;

static tflite::MicroInterpreter static_interpreter(
    model, resolver, tensor_arena, kTensorArenaSize, error_reporter);
interpreter = &static_interpreter;

TfLiteStatus allocate_status = interpreter->AllocateTensors();

The last line of code generates a HardFault regardless of the size of the tensor arena, which I also tried setting to the maximum allowed by my system (451024). More specifically, the call that produces it is `if (auto array = buffer->data()) {` in tensorflow/lite/experimental/micro/micro_allocator.cc, line 259. The size of the converted model (flatbuffer) is 43304 bytes.


petewarden commented 5 years ago

Sorry you're hitting this issue! As a debugging step, can you try running your same code on Linux/x86? It would be helpful to know if it works there, and if it does I can suggest some further debugging steps.

antofara commented 5 years ago

Thanks for your reply. Do you mean compiling the same code with Linux/x86 as the target instead of the Arm Cortex-M4, or just doing the model conversion and compiling the firmware in a Linux/x86 environment?

I also have some updates. I tried converting the model with the tflite_convert command-line tool instead of the Python API. Now the tensors are allocated successfully, but a HardFault is generated when executing interpreter->Invoke(). By doing some debugging, I found that the critical part is op_resolver.cc, line 36, *registration = op_resolver.FindOp(builtin_code, version);, when the operation is a BuiltinOperator_MUL. Elementwise multiplication is used inside a GRU during the last steps to generate the new hidden state, but I don't see any support for MUL in lite/experimental/micro/kernels/all_ops_resolver.cc. The consequence is that when FindOp is called, the for loop in micro_mutable_op_resolver.cc, line 22 keeps going until an invalid index into registrations_[i] is reached, thus causing the HardFault. Is my guess correct? Is there any possible workaround?

petewarden commented 5 years ago

I did notice in your code snippet that the OpResolver isn't declared static, like the interpreter is. I'm not sure what the rest of your code looks like, but if the OpResolver object has a shorter lifetime than the interpreter then you could end up with mysterious crashes like this. If you look in the examples, you can see we declare resolvers as static in the setup() function:

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/experimental/micro/examples/hello_world/main_functions.cc#L63

antofara commented 5 years ago

@petewarden, I fixed the OpResolver by declaring it as static, but the problem remains.

I guess the only solution is to implement elementwise multiplication (BuiltinOperator_MUL) in TFLite Micro, since it is used inside GRUs (among other layers).
Shall I open a new issue/feature request and close this one?

antofara commented 4 years ago

The MUL operation is now integrated into TFLite Micro. However, GRUs still cannot be deployed because the SUB operator is missing (the GRU hidden-state update h = z * h_prev + (1 - z) * h_candidate requires both an elementwise MUL and a SUB).

I'll take the chance to ask a related question. Is there a way to force the converter to generate tflite ops of a specific version? For example, I noticed that after conversion my model contains AVERAGE_POOL_2D operations in version 2. Although the declared compatibility in all_ops_resolver is only for version 1, by forcing it to also accept version 2 (by manually modifying the resolver), the code executes without problems. I am now wondering whether it is possible to tell the converter to generate only version 1 of that operation. Also, is there a document that clearly shows the differences between versions of the same operation?
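In the meantime, this is the quick check I use to see which operator versions actually ended up in the flatbuffer. It relies on the third-party tflite flatbuffer bindings (pip install tflite), not on an official TensorFlow API, and gruQ.tflite is just my converted model:

import tflite  # third-party bindings for the .tflite flatbuffer schema: pip install tflite

# Reverse lookup table: builtin operator id -> operator name
BUILTIN_NAMES = {v: k for k, v in vars(tflite.BuiltinOperator).items() if isinstance(v, int)}

with open('gruQ.tflite', 'rb') as f:
    model = tflite.Model.GetRootAsModel(f.read(), 0)

for i in range(model.OperatorCodesLength()):
    op_code = model.OperatorCodes(i)
    name = BUILTIN_NAMES.get(op_code.BuiltinCode(), 'CUSTOM')
    print(name, 'version', op_code.Version())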

njeffrie commented 4 years ago

I notice you're using OPTIMIZE_FOR_SIZE in your converter optimizations. I am wondering if you are inadvertently creating a hybrid quantized model (which we do not support on Micro).

Can you try uploading your TFLite model to https://lutzroeder.github.io/netron/ to check whether both the weight tensors and the activation tensors are quantized? If only the weights are quantized, you likely have a hybrid model and will need to either disable the OPTIMIZE_FOR_SIZE flag to get a float model, or do full integer quantization by specifying the input and output types along with a representative dataset.
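A minimal sketch of that second option, assuming TF 2.x (the representative_dataset below uses random placeholder data and an assumed sequence length of 10; replace it with samples drawn from your real training or validation inputs):

import numpy as np
import tensorflow as tf

def representative_dataset():
    # Placeholder: yield ~100 samples shaped like the real model input
    for _ in range(100):
        yield [np.random.rand(1, 10, 128).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(new_gru)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force full integer quantization of both weights and activations
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open('gruQ_int8.tflite', 'wb') as f:
    f.write(converter.convert())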

mohantym commented 1 year ago

Hi @antofara! We are checking to see whether you still need help with this issue. I agree with the point above. Could you test post-training quantization with TF 2.11 using the changes below and let us know?

converter.optimizations = [tf.lite.Optimize.DEFAULT]

For inference

constexpr int kTensorArenaSize = 2000;
uint8_t tensor_arena[kTensorArenaSize];

// Build an interpreter to run the model with
// (skipping the static keyword, as suggested)
tflite::MicroInterpreter interpreter(model, resolver, tensor_arena,
                                     kTensorArenaSize);

// Allocate memory from the tensor_arena for the model's tensors
TF_LITE_MICRO_EXPECT_EQ(interpreter.AllocateTensors(), kTfLiteOk);

Thank you!

google-ml-butler[bot] commented 1 year ago

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

google-ml-butler[bot] commented 1 year ago

Closing as stale. Please reopen if you'd like to work on this further.

google-ml-butler[bot] commented 1 year ago

Are you satisfied with the resolution of your issue?