tensorflow / tflite-micro

Infrastructure to enable deployment of ML models to low-power resource-constrained embedded targets (including microcontrollers and digital signal processors).
Apache License 2.0

Passing custom/additional data to kernels #2593

Closed DanieleParravicini-Synthara closed 13 hours ago

DanieleParravicini-Synthara commented 1 month ago

Hello, I have the following use case that I would like to cover.

How can I provide an efficient kernel for a layer without breaking the compatibility of the model?

Consider:

  1. a layer of a NN model (e.g. a linear layer) that could be accelerated by a dedicated hardware module, e.g. imagine you have M cores and you split the linear layer into M chunks.

  2. the custom hardware module could benefit from additional information (e.g. how many elements each core should process).

Here are some ideas:

  1. use custom operator

    1. for each layer that you want to accelerate, create a custom operator
    2. modify the NN model, replacing the original layer with a custom operator that is functionally equivalent to the original one
    3. retrain the NN model
    4. convert the model to a TensorFlow Lite Model, adding the additional custom operator (and possibly passing additional info to the new custom layer using the custom_options field of the Operator table of the flatbuffer)
    5. provide an implementation of the operator to the interpreter that reads the custom_options field and executes the layer accordingly
  2. keep the same operator and use a custom conversion tool that adds information to the custom_options field. In this way I can avoid the first three steps of the previous list and do the following:

    1. convert the model to a TensorFlow Lite Model and attach the additional info to the layer to accelerate using the custom_options field of the Operator table of the flatbuffer, e.g. like this:
      
```python
import numpy as np
import tensorflow as tf
from tensorflow.lite.python import schema_py_generated as schema_fb


def load_model(save_path: str):
    with open(save_path, "rb") as f:
        return f.read()


tflite_quantized = load_model("models/mlp_int8.tflite")

# Unpack the flatbuffer into the mutable object API.
aModel = schema_fb.ModelT.InitFromPackedBuf(tflite_quantized, 0)


def BuiltinCodeToName(code):
    """Converts a builtin op code enum to a readable name."""
    for name, value in schema_fb.BuiltinOperator.__dict__.items():
        if value == code:
            return name
    return None


# List the operators so we can pick the one to annotate.
for i, op in enumerate(aModel.subgraphs[0].operators):
    op_code = aModel.operatorCodes[op.opcodeIndex].builtinCode
    print(f"[{i}] : {BuiltinCodeToName(op_code)} ({op_code})")

# FROM HERE: attach custom data to an existing (builtin) operator.
custo = np.ones(10, dtype=np.uint8)
aModel.subgraphs[0].operators[4].customOptions = custo
# TO HERE

# Re-serialize the modified model.
from tflite_support import flatbuffers

b = flatbuffers.Builder(0)
b.Finish(aModel.Pack(b))
model_buff = b.Output()


def save_tflite_model(tflite_model, save_dir, model_name):
    """Save the converted tflite model.

    Args:
        tflite_model (bytes): the converted model in serialized format.
        save_dir (str): the save directory.
        model_name (str): model name to be saved.
    """
    import os
    if not os.path.exists(save_dir):
        os.makedirs(save_dir)
    save_path = os.path.join(save_dir, model_name)
    with open(save_path, "wb") as f:
        f.write(tflite_model)


save_tflite_model(model_buff, "MLP_models", "mlp_int8.tflite")
```



   2.  I will have to modify the behaviour of the [micro interpreter](https://github.com/tensorflow/tflite-micro/blob/dfdd666ae075e07da288ce8d4a38c60aeecbc07a/tensorflow/lite/micro/micro_interpreter.cc#L157C9-L157C47) to allow builtin operators to have custom_options
   3.  provide a suitable implementation of the operator that uses the custom_options of the operator to do something smart

with the second approach I see the following advantages:
- I do not have to modify the model itself. I can just write a Python script that looks at the model and "annotates the layers with custom_options" when needed.
- I keep compatibility with the original model and can switch between accelerated and non-accelerated kernels (e.g. in certain cases, due to the fixed cost of starting the dedicated hardware module, the reference implementation or another operator implementation is better suited)
- I lower the complexity of accelerating operators
- No need to retrain the model from scratch  
DanieleParravicini-Synthara commented 1 month ago

it might be related to https://github.com/tensorflow/tflite-micro/issues/619

rascani commented 1 month ago

Thanks for filing the issue. This is indeed an interesting problem. The typical approach for using accelerators with tflite-micro has been to use a custom op. I'd recommend taking a look at the Ethos-U example in-tree, but the gist of it is to run an AOT tool that takes a TFLite model, identifies which layers can be accelerated, and merges/replaces those with an Ethos-U custom op that includes a command stream in the custom data payload. This approach does not require the model to be re-trained and looks at the model as a whole, so multiple layers may get combined into a single accelerated node. As you point out, this does specialize the TFLite model for a given piece of hardware, and it is no longer runnable on other devices.
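For illustration only (this is not the actual Ethos-U tooling; the op name and payload below are placeholders), that kind of AOT rewrite could be sketched with the same flatbuffer Python object API used in your script:

```python
import numpy as np
from tensorflow.lite.python import schema_py_generated as schema_fb

# `aModel` is a schema_fb.ModelT as in the script above; the payload is a placeholder.
command_stream_bytes = np.zeros(16, dtype=np.uint8)

# Register a (hypothetical) custom operator code in the model.
custom_code = schema_fb.OperatorCodeT()
custom_code.builtinCode = schema_fb.BuiltinOperator.CUSTOM
custom_code.deprecatedBuiltinCode = schema_fb.BuiltinOperator.CUSTOM  # for older readers
custom_code.customCode = "MY_ACCEL_OP"  # placeholder name
aModel.operatorCodes.append(custom_code)

# Replace a builtin operator (or a fused sequence) with the custom op,
# carrying the accelerator command stream in custom_options.
old_op = aModel.subgraphs[0].operators[4]
new_op = schema_fb.OperatorT()
new_op.opcodeIndex = len(aModel.operatorCodes) - 1
new_op.inputs = old_op.inputs
new_op.outputs = old_op.outputs
new_op.customOptions = command_stream_bytes
aModel.subgraphs[0].operators[4] = new_op
```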

While your option 2 addresses that, I think there might be a less invasive way to add extra information. The Model schema allows Metadata to be specified. This is essentially a key-value store mapping strings to opaque data blobs. You could add an entry specific to the accelerator that encodes additional information. In your op implementation, you could then check for that metadata key and look up the additional node information. Reference kernels would ignore this metadata. This approach would also allow you to describe it in whichever way makes sense for your accelerator, as opposed to having it annotated on every operator.
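A minimal sketch of adding such a Metadata entry with the schema_py_generated object API (the key name and payload are placeholders you would define for your accelerator):

```python
import numpy as np
from tensorflow.lite.python import schema_py_generated as schema_fb

# Placeholder accelerator-specific payload; the encoding is entirely up to you.
accel_payload = np.frombuffer(b"my-accelerator-config", dtype=np.uint8)

# Store the payload as a new buffer in the model.
buf = schema_fb.BufferT()
buf.data = accel_payload
aModel.buffers.append(buf)

# Add a Metadata entry (name -> buffer index) pointing at that buffer.
meta = schema_fb.MetadataT()
meta.name = "MY_ACCELERATOR_INFO"  # placeholder key
meta.buffer = len(aModel.buffers) - 1
if aModel.metadata is None:
    aModel.metadata = []
aModel.metadata.append(meta)
```

A kernel that knows the key can look the buffer up at init time and fall back to the reference path if the entry is absent.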

Does that help?

DanieleParravicini-Synthara commented 1 month ago

Thank you for your reply

It makes a lot of sense to have an AOT tool that can emit custom operators. I suspected that the way to go was either that or modifying the model with custom ops.

You are right, the Metadata could be a way, and I tried to use it, but how could we map a specific Operator to a key that could then be used to access the Metadata? Is there an id of the operator that one could use to access the metadata key/value store, or something similar? The custom_options field seems to be already there and available to use.

DanieleParravicini-Synthara commented 1 month ago

Wouldn't it be as simple as modifying this line to give all the operators access to custom_options so as to enable information exchange from an AOT tool?

rascani commented 1 month ago

My reluctance to use custom_options is because it is intended for a different purpose than this. That field is supposed to hold the parameters of a custom op, not a built-in. Allowing custom_options to be populated when using a builtin operator breaks that abstraction. It wouldn't be a change I could take upstream.

You're looking for an extension point, which is what Metadata was designed for. This would actually also require some additional changes to TFLM. We've used Metadata before for exactly this use case in TFLite, but hadn't run into it with TFLM before. We'd probably want to extend the MicroContext or MicroGraph to provide access to subgraph_id and node_id, which we could then use to look up op-specific parameters within a Metadata buffer.
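Purely as an illustration of what that could look like on the tooling side (the layout below is hypothetical, not an agreed format), the metadata payload could be keyed by (subgraph_id, node_id), which the extended MicroContext/MicroGraph APIs would let a kernel recover at Invoke time:

```python
import struct


def pack_node_params(params_by_node):
    """Pack {(subgraph_id, node_id): bytes} into one metadata blob.

    Layout (little-endian): entry count (u32), then per entry
    subgraph_id (u32), node_id (u32), payload length (u32), payload bytes.
    """
    out = bytearray(struct.pack("<I", len(params_by_node)))
    for (subgraph_id, node_id), payload in sorted(params_by_node.items()):
        out += struct.pack("<III", subgraph_id, node_id, len(payload))
        out += payload
    return bytes(out)


# Example: per-core chunk sizes for operator 4 of subgraph 0.
blob = pack_node_params({(0, 4): struct.pack("<4I", 16, 16, 16, 8)})
```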

DanieleParravicini-Synthara commented 4 weeks ago

Thanks @rascani. I think that now I understand your point: custom_options is only for custom ops, not builtins.

I think we could help by propagating the subgraph_id and node_id to the evaluation context in the node. Can we help?

rascani commented 4 weeks ago

That would be great!

Taking a second look, it looks like subgraph index is already available from the MicroGraph::GetCurrentSubgraphIndex() function. Kernels have access to the Graph via tflite::GetMicroContext(context)->graph().

I think I would lean towards a similar API for node id, but the thing to watch for is that kernels can invoke subgraphs. We'll probably need a scheme similar to current_subgraph_idx_ & previous_subgraph_idx.

DanieleParravicini-Synthara commented 4 weeks ago

I see you have used previous_subgraph_idx and current_subgraph_idx_ to implement a stack. So I would do the same, and then we would have the pair MicroGraph::GetCurrentSubgraphIndex() and GetCurrentOperatorIndex(), which can provide a unique identifier. OK?

DanieleParravicini-Synthara commented 4 weeks ago

If I may ask, why is subgraph_idx a size_t whereas current_subgraph_idx_ and previous_subgraph_idx are int?

What is the objective?

DanieleParravicini-Synthara commented 4 weeks ago

To start, something like https://github.com/tensorflow/tflite-micro/pull/2605 ?

rascani commented 3 weeks ago

Yeah, I think something like that works. I'll take a look at the PR later today. Thanks for adding that!

github-actions[bot] commented 18 hours ago

"This issue is being marked as stale due to inactivity. Remove label or comment to prevent closure in 5 days."