mit-han-lab / tinyengine

[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning; [NeurIPS 2022] MCUNetV3: On-Device Training Under 256KB Memory
https://mcunet.mit.edu
MIT License

get_kernel_buffer undefined #43

Open Rainlolthx opened 1 year ago

Rainlolthx commented 1 year ago


Thanks for your great work. When I follow the training tutorial, the C files 'convolve_1x1_s8_kbuf.c' and 'convolve_1x1_s8_skip_pad.c' in int_forward_op call the functions 'get_kernel_buffer'/'get_sbuffer_size', and these functions produce an undefined-symbol error. May I ask where these functions are defined, or have I done something wrong? I would appreciate any help.

meenchen commented 1 year ago

Hi @Rainlolthx,

Thanks for trying our training tutorial. I have pushed a fix for this issue in https://github.com/mit-han-lab/tinyengine/pull/48. Please pull the latest kernels and try again. However, I cannot reproduce this issue when going through the training tutorial. Could you share the dev setup where it occurs?

Rainlolthx commented 1 year ago

Thank you for your reply. Regarding the get_kernel_buffer problem: I had not compiled the file containing this function into the program. I want to run this training tutorial on my own embedded device. Since it has no camera, I trained by feeding in the image binary directly and removed the camera-related code.

I still have some questions, though. With some cross-compilers, such as aarch64-rockchip1031-linux-gnu-gcc and aarch64-mix210-linux-gcc, the following errors occur:

Error: unknown mnemonic `sxtb16' -- `sxtb16 x0,x0'
Error: unknown mnemonic `smlad' -- `smlad x0,x0,x1,x2'

It seems the 64-bit compiler does not have this instruction set. Is there any solution?

Also, in the 49kb-int8-graph.json file, the model has 10 outputs, but only the first two are used in main.cpp, and I don't quite understand why.

meenchen commented 1 year ago

Hi @Rainlolthx,

Please use arm-none-eabi to cross-compile the program, since we target Arm Cortex-M7/M4 cores. The compilers you listed are for ARM64 (AArch64) cores, which have a different instruction set and different DSP intrinsics. That is why the assembler rejects those mnemonics.
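For reference, here is a portable-C sketch of what those two mnemonics compute (an illustration of the instruction semantics, not code from TinyEngine): `sxtb16` sign-extends two packed int8 lanes of a 32-bit word into int16 halfwords, and `smlad` performs a dual signed 16-bit multiply with a 32-bit accumulate.

```c
#include <stdint.h>

/* SXTB16: sign-extend bytes 0 and 2 of a word into two 16-bit halfwords. */
static inline uint32_t sxtb16_c(uint32_t x)
{
    int16_t lo = (int8_t)(x & 0xFF);          /* byte 0 -> halfword 0 */
    int16_t hi = (int8_t)((x >> 16) & 0xFF);  /* byte 2 -> halfword 1 */
    return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}

/* SMLAD: dual signed 16x16 multiply, both products added to the accumulator. */
static inline int32_t smlad_c(uint32_t a, uint32_t b, int32_t acc)
{
    int16_t a_lo = (int16_t)(a & 0xFFFF), a_hi = (int16_t)(a >> 16);
    int16_t b_lo = (int16_t)(b & 0xFFFF), b_hi = (int16_t)(b >> 16);
    return acc + (int32_t)a_lo * b_lo + (int32_t)a_hi * b_hi;
}
```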

In 49kb-int8-graph.json, the 10 outputs are the gradients of tensors. Since storing all output gradients takes too much memory, we apply in-place gradient updates and perform operator fusion to avoid large intermediate tensors and reduce the memory footprint. Feel free to check our paper for more details: https://arxiv.org/pdf/2206.15472.pdf
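To make the in-place idea concrete, here is a toy float/SGD sketch (an assumption-laden stand-in: the real TinyEngine kernels are fused int8 operators, and the gradient expression below is only illustrative). Each gradient element is applied to its weight the moment it is computed, so the full gradient tensor is never allocated.

```c
#include <stddef.h>

/* Toy in-place SGD update: instead of materializing a d_w buffer and then
   calling an optimizer, each gradient element is consumed immediately. */
void sgd_update_in_place(const float *d_out, const float *x,
                         float *w, size_t n, float lr)
{
    for (size_t i = 0; i < n; ++i) {
        float g = d_out[i] * x[i];  /* stand-in gradient element, computed on the fly */
        w[i] -= lr * g;             /* applied at once: no gradient tensor is stored */
    }
}
```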

Rainlolthx commented 1 year ago

Hi, @meenchen

Thank you for the explanation. Using the arm-none-eabi compiler, I successfully compiled the program and got the expected result. In the ARM64 case, I still wonder whether it is possible to do the equivalent computation with an alternative instruction set; could you give me some references? I would appreciate it. For the training tutorial program, is the invoke_inf step before train(label) necessary, or is it just to print the result on the screen? I also wonder whether there is a way to run a detection model on TinyEngine.

meenchen commented 1 year ago

> Thank you for the explanation. Using the arm-none-eabi compiler, I successfully compiled the program and got the expected result. In the ARM64 case, I still wonder whether it is possible to do the equivalent computation with an alternative instruction set; could you give me some references? I would appreciate it.

It is possible. Currently, TinyEngine is tightly coupled with the intrinsic instructions of Cortex-M7/M4, but we plan to support more general platforms in the future. For platform-independent operators, you can check out the reference kernels from tflite-micro: https://github.com/tensorflow/tflite-micro/tree/main/tensorflow/lite/kernels/internal/reference.
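As a rough idea of what such a platform-independent kernel looks like, here is a plain-C int8 dot product in the spirit of those reference kernels (this exact function is an illustration, not code from either project). A loop like this computes the same result that an `smlad`-based kernel does on Cortex-M, and an AArch64 compiler can auto-vectorize it (e.g. to NEON).

```c
#include <stdint.h>

/* Portable int8 dot product with zero-point offsets, as used in
   quantized inference. No DSP extensions required. */
int32_t dot_s8(const int8_t *a, const int8_t *b, int32_t len,
               int32_t a_offset, int32_t b_offset)
{
    int32_t acc = 0;
    for (int32_t i = 0; i < len; ++i) {
        acc += ((int32_t)a[i] + a_offset) * ((int32_t)b[i] + b_offset);
    }
    return acc;
}
```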

> For the training tutorial program, is the invoke_inf step before train(label) necessary, or is it just to print the result on the screen?

It is not necessary; it is only there to show the result on the screen.
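In other words (a minimal sketch based only on the function names mentioned in this thread; the exact signatures are assumptions):

```c
/* Hypothetical prototypes; the generated model code defines the real ones. */
extern void invoke_inf(void);
extern void train(int label);

void training_step(int label)
{
    invoke_inf();   /* optional: forward pass only, e.g. to display the prediction */
    train(label);   /* the training step runs its own forward + backward pass */
}
```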

> I also wonder whether there is a way to run a detection model on TinyEngine.

Yes, we have examples for person detection and face mask detection. These examples use the OpenMV Cam but can be adapted to other devices.

Rainlolthx commented 1 year ago

The person detection and face mask detection demos seem to be inference tutorials. I wonder whether a detection model can also be trained with TinyEngine. Thank you again for your reply!

meenchen commented 1 year ago

> The person detection and face mask detection demos seem to be inference tutorials. I wonder whether a detection model can also be trained with TinyEngine.

That would be possible, but we did not try it because providing labels for detection models is tricky on an MCU.