tensorflow / tflite-micro

Infrastructure to enable deployment of ML models to low-power resource-constrained embedded targets (including microcontrollers and digital signal processors).
Apache License 2.0

Improved input buffer management #705

Closed mansnils closed 1 year ago

mansnils commented 2 years ago

Supply input data and consume output data located outside the main arena: the input/output data would remain in an arbitrary memory buffer, e.g. another memory arena. The idea is to avoid copying input/output data to/from the main arena, saving cycles and potentially memory. The new memory planner, NonPersistentMemoryPlannerShim, partly solves this, since input data can be placed in an external memory arena. However, in the audio/video streaming case the input buffer cannot be changed between invocations, because the memory addresses derived from the offsets in the memory plan would need to be recalculated on every update (i.e. before every invoke).

deqiangc commented 2 years ago

@petewarden @advaitjain

To make sure that we are on the same page, the request is: the memory address of the input buffer could be changed externally by the application before each Invoke. The typical use case is double buffering of input data, so that the first invoke uses buffer 0, the second invoke uses buffer 1, and so on.

We don't have a straightforward solution for now. It seems to require at least two new APIs: one for the application to tell TFLM which tensors it shall not allocate memory for (but still track for the ops), and another for the application to set the buffer address of a tensor. In addition to those APIs and the complexity associated with them, the application would also need to know which tensors are input tensors (i.e. it would need to know details of the tflite file), which in my view could make things fragile (a small change in the model could lead to hard-to-debug issues).

I am wondering how much benefit this feature can bring. Do you have a ballpark estimate of the memory savings (in percent) and cycles (if applicable)?

advaitjain commented 2 years ago

@jenselofsson @mansnils @freddan80 did some nice analysis and the conclusion was that the benefits were not sufficient to have this be a high priority item: https://github.com/tensorflow/tflite-micro/blob/89c40b27c6b8ca031012e5b874486988bff00d1a/tensorflow/lite/micro/docs/rfc/001_preallocated_tensors.md

Having said that, since we are taking a fresh look at memory planning, this is a worthwhile feature request to keep track of.

mansnils commented 2 years ago

True. The extra complexity added to the existing memory planner didn't justify the gain for the use cases we'd seen at that point in time. However, since we're refactoring the memory handling, we'd like to take it into account so that the new planner shim can handle it by design. More recently we have also seen use cases where relatively large input data is fed to relatively small models. Perhaps the reference implementation would not need to support this, as long as it remains possible by design.

Alex-EEE commented 2 years ago

What if, instead of allowing the input tensors to be allocated at an arbitrary memory location, TFLM directly supported double buffering / "ping-ponging" between input buffers? That is, make the arena a little bigger to accommodate two input tensors, and let the user pick which input tensor to use at Invoke (presumably the other input tensor would be the target of some streaming input, such as an image from a camera or audio, and would have to be blocked from use). Then for the next Invoke, they switch.

Or could you hack this together already with the existing API? (If so, I'd love to get some advice!)

github-actions[bot] commented 1 year ago

"This issue is being marked as stale due to inactivity. Remove label or comment to prevent closure in 5 days."

github-actions[bot] commented 1 year ago

"This issue is being closed because it has been marked as stale for 5 days with no further activity."