neil-tan opened 4 years ago
char input_buffer[512];
ExampleTensorObject* input_tensor_obj;
int result[1];

MyModel model; //generated
model.setArenaSize(1024);
model.bind_input0(input_buffer, shape, type);
model.bind_input1(input_tensor_obj);
model.bind_prediction0(result, 1);
model.run();

printf("The inference result is: %d", result[0]);
I think the metadata memory allocator should be fixed in size at model construction, but I am OK with the data scratchpad being on the heap.
Might look something like this:
MyModel<MetaDataSize> model;
model.setTensorDataMemSize(ScratchPadSize);
template<size_t MetaDataSize = 2048>
class MyModel {
private:
  FixedTensorArenaAllocator<MetaDataSize> defaultMetaDataAllocator;
  DynamicTensorArenaAllocator defaultTensorDataAllocator;
  ...
};
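For what it's worth, here is a minimal sketch of what a fixed-size metadata arena could look like. This is purely illustrative, not the actual FixedTensorArenaAllocator; the class name is hypothetical and alignment handling is omitted.

#include <cstddef>

// Illustrative bump-pointer arena whose size is fixed at compile time; it lives
// inside the model object, so no heap is involved for metadata.
template<size_t Size>
class FixedArenaSketch {
public:
  void* allocate(size_t bytes) {
    if (offset + bytes > Size) return nullptr;  // arena exhausted
    void* p = &buffer[offset];
    offset += bytes;  // note: alignment handling omitted for brevity
    return p;
  }
  void reset(void) { offset = 0; }  // release everything at once, e.g. after a run
private:
  unsigned char buffer[Size];  // statically sized storage
  size_t offset = 0;           // next free byte
};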
We should be able to update the following draft to the re-arch without problem.
template<size_t MetaDataSize = 2048>
class MyModel {
private:
  //FixedTensorArenaAllocator<MetaDataSize> defaultMetaDataAllocator;
  //DynamicTensorArenaAllocator defaultTensorDataAllocator;
  Context& ctx;

public:
  //auto generated
  struct {
    Tensor* tensor0 = nullptr;
    Tensor* tensor1 = nullptr;
    Tensor* tensor2 = nullptr;
  } tensors;

  void run(void);
};
template<typename T>
void copy_tensor(S_TENSOR& tensor_src, S_TENSOR& tensor_dst) {
  for (size_t i = 0; i < tensor_src->getSize(); i++) {
    *(tensor_dst->write<T>(0, i)) = *(tensor_src->read<T>(0, i));
  }
}
//auto generated
template<size_t MetaDataSize>
void MyModel<MetaDataSize>::run(void) {
  //the allocator may re-use the space of the input and output tensors,
  //so they are allowed to be modified during evaluation
  get_deep_mlp_ctx(ctx, tensors.tensor0, tensors.tensor1);
  ctx.eval();
  S_TENSOR result = ctx.get("tensor2");
  //copy the tensor out, as the application should own the output memory
  copy_tensor(result, tensors.tensor2);
  ctx.gc();
}
// Example
char input_buffer[512];
ExampleTensorObject* input_tensor_obj; //a class with Tensor interface
int result[1];

MyModel model; //generated
model.tensors.tensor0 = new RamTensor({10, 10}, i8);
model.tensors.tensor1 = new WrappedRamTensor({10, 10}, input_buffer, i8);
model.tensors.tensor2 = new RamTensor({10, 10}, result, i32); //output
model.run();
printf("%d\n", result[0]);

//do something with input_buffer
model.run();
printf("%d\n", result[0]);
@mbartling Thoughts? One issue I have is that tensors cannot be created before the model, unless we want to explicitly instantiate the context and allocators. And, what would be a good way to keep the input/output tensors alive? Maybe create a utility tensor-factory class for initializing the context and alloc classes? The purpose of the tensor-factory is mainly syntactic sugar, making things more approachable for the hobbyist communities.
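Purely as an illustration of the factory idea: the class and method names below are hypothetical, and the tensor constructors just follow the draft above. Something like this could hide the context/allocator setup and also keep the created tensors alive:

#include <initializer_list>
#include <memory>
#include <vector>

// Hypothetical sugar layer: would construct the context and allocators internally,
// and keeps every tensor it creates alive for as long as the factory lives.
class TensorFactory {
public:
  Tensor* wrap(std::initializer_list<int> shape, void* buffer, DType type) {
    owned.emplace_back(new WrappedRamTensor(shape, buffer, type));  // hypothetical ctor
    return owned.back().get();
  }
private:
  // context / allocator members would be constructed here (omitted)
  std::vector<std::unique_ptr<Tensor>> owned;  // keeps inputs/outputs alive
};

// Usage (hypothetical):
//   TensorFactory f;
//   model.tensors.tensor1 = f.wrap({10, 10}, input_buffer, i8);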
@dboyliao visibility for code-gen @Knight-X
Just as an FYI, my brain is totally dedicated to the re-arch right now, so I might be misreading your concerns.
The primary issue here is where the meta-data allocator and RAM-data allocator live, or whether they are separate entities at all.
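To make the question concrete, here is a rough sketch of the two placements; all class names below are hypothetical:

// Option A (as in the draft above): the model owns two separate allocators.
template<size_t MetaDataSize>
class ModelOwningAllocators {
  FixedTensorArenaAllocator<MetaDataSize> metaDataAllocator;  // tensor metadata
  DynamicTensorArenaAllocator tensorDataAllocator;            // tensor RAM data
};

// Option B: a single allocator, possibly living outside the model and passed in,
// serves both roles; "AllocatorInterface" is a hypothetical name.
class ModelSharingAllocator {
public:
  ModelSharingAllocator(AllocatorInterface& alloc) : allocator(alloc) {}
private:
  AllocatorInterface& allocator;  // shared by metadata and tensor data
};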
> Maybe create a utility tensor-factory class for initializing the context and alloc classes?
This is the job of the model class, either at construction or at model run.
> And, what would be a good way to keep the input/output tensors alive?
Honestly I am in favor of the user requesting references to input/output lists contained by the model itself. This way we are less prone to dealing with the user providing invalid input tensors. I imagine input tensors would be a fixed type of tensor specialization (or tensor handle) that can provide some compile time guarantees.
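A rough sketch of what such a compile-time-checked input handle could look like (purely illustrative, names hypothetical):

#include <cstddef>

// Hypothetical input handle: element type and count are fixed at compile time,
// so binding a buffer of the wrong size or type fails to compile.
template<typename T, size_t N>
class InputHandle {
public:
  template<size_t BufLen>
  void bind(T (&buffer)[BufLen]) {
    static_assert(BufLen == N, "buffer size does not match the model input");
    data_ = buffer;
  }
  T* data(void) const { return data_; }
private:
  T* data_ = nullptr;
};

// The generated model could then expose e.g. `InputHandle<int8_t, 100> input0;`
// and `model.input0.bind(buf)` would only compile for an int8_t buf[100].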
Abstract

Individual frameworks such as uTensor and TFLM have their own sets of on-device APIs. In some cases, significant boilerplate code and framework-specific knowledge are required to implement even the simplest inference task. A developer-friendly, universal, high-level inference API will be valuable for on-device ML.
On-device inferencing is generalized into these steps: initialize the model and its working memory, bind the input and output buffers, run inference, and read back the results.
The code snippets below aim to illustrate the current API designs for uTensor and TensorFlow Lite Micro. The newly proposed API will likely rely on code generation to create an adaptor layer between the universal interface and the underlying framework-specific APIs.
Examples:
uTensor:
TFLM: Please refer to this hello-world example
Requirements
The newly proposed API should provide a high-level abstraction that accelerates and simplifies application development and helps streamline the edge-ML deployment flow, especially for resource-constrained devices.
The new API should:
Proposals
utensor_something_autogenerated_init(utensor_mem_pool);

float input[33] = { 1, 2, 3, 4 ... };
float output[5];

utensor_run_something_autogenerated(input, 33, output, 5);
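To tie this back to the adaptor-layer idea above, the generated implementation behind these two calls could look roughly like the sketch below. Everything here is hypothetical; it simply reuses the MyModel draft from earlier in this thread as the framework-specific layer.

#include <cstddef>

// Hypothetical generated adaptor: the universal entry points above drive a
// generated model object along the lines of the earlier MyModel draft.
static MyModel<> model;  // generated; hidden from the application code

void utensor_something_autogenerated_init(void* mem_pool) {
  // point the model's tensor-data allocator at the user-supplied pool
  // (the exact call is framework-specific; cf. setTensorDataMemSize above)
  (void)mem_pool;
}

void utensor_run_something_autogenerated(const float* input, size_t input_len,
                                         float* output, size_t output_len) {
  // 1. wrap `input`/`output` as tensors (cf. the WrappedRamTensor example above)
  // 2. run the graph: model.run();
  // 3. copy the result into `output`; the application owns that memory
  (void)input; (void)input_len; (void)output; (void)output_len;
}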
The proposed call sequence above is at its most minimal. The generated bind method names correspond to the tensor names in the graph, and the methods' signatures reflect their respective tensor data types. Additional methods can be implemented to support advanced configurations.

What's Next
This issue serves as a starting point for the discussion. It will be reviewed by uTensor core devs, Arduino, ISG data scientists, IPG engineers, and Google. We are particularly interested in use cases that the currently proposed API cannot cover. We are looking to iterate and converge on a design in the coming weeks.