mit-han-lab / tinyml


How to transform a model into ONNX format? #21

Open JacksonVation opened 2 years ago

JacksonVation commented 2 years ago

Hi, I see that ckpt and json files are provided, and I'm trying to transform the model into ONNX format, but I can't find the corresponding neural-network definition file here. So I'm wondering how I could implement this.
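
For context, this is the kind of export I'm attempting (just a sketch; `build_model_from_config` is a placeholder for however the PyTorch network would be rebuilt from the json config, which is exactly the part I can't find):

```python
import torch

# Placeholder: I could not find the network definition, so this call is hypothetical.
# It stands in for however the PyTorch model is rebuilt from the provided json config
# before the ckpt weights are loaded.
model = build_model_from_config("mcunet_config.json")

# Load the checkpoint weights (or state["state_dict"], depending on how it was saved).
state = torch.load("mcunet.ckpt", map_location="cpu")
model.load_state_dict(state)
model.eval()

# Dummy input matching the model's expected resolution (resolution varies per MCUNet variant).
dummy = torch.randn(1, 3, 160, 160)

torch.onnx.export(
    model,
    dummy,
    "mcunet.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=11,
)
```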

JacksonVation commented 2 years ago

So I'm trying to deploy the given model on my device. The tflite files certainly fit the device, but I think the ckpt and json files should be transformed into ONNX format, or they won't suit the board.

JacksonVation commented 2 years ago

I successfully transformed the model into ONNX format, but its flash and SRAM usage are both beyond my board's limits. Is this because I'm not using TinyEngine? Or did I not compress the model in a proper way?

JacksonVation commented 2 years ago

It spilled over like this: [screenshot]

tonylins commented 2 years ago

Hi, thanks for reaching out. Is the converted ONNX file quantized to int8? Quantization will significantly reduce memory usage. The tflite files should already be quantized, so maybe you can try them and see if they work.
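
If you want to quantize the ONNX file directly, one generic option is onnxruntime's post-training quantization (a rough sketch for illustration only, not our pipeline; note that dynamic quantization only converts the weights, so it mainly shrinks flash, while activations stay in float):

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# Dynamic (weight-only) quantization: weights become int8, which mostly helps
# flash size. Activations remain float, so peak SRAM is largely unchanged.
quantize_dynamic(
    "mcunet.onnx",        # example input path
    "mcunet_int8.onnx",   # example output path
    weight_type=QuantType.QInt8,
)

# To also reduce activation memory you would need static quantization
# (onnxruntime.quantization.quantize_static) with a small calibration dataset.
```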

JacksonVation commented 2 years ago

Hi, thanks for your nice response. That's exactly what I missed: I didn't quantize the model. So today I tried a lot of ways to convert the ONNX file to int8, but unfortunately it couldn't fit into STM32CubeMX (screenshot below). After searching the documentation, I was surprised to find that the platform only seems to accept quantized Keras and TFLite models. But I believe that's what we need to do, so I'm going to convert the TFLite files tomorrow, and I believe that will work. [screenshot]
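
For the TFLite route, the standard post-training full-integer quantization in TensorFlow looks roughly like this (a sketch assuming the model is available as a SavedModel; the representative dataset here is only random placeholder data, real preprocessed images should be used for calibration):

```python
import numpy as np
import tensorflow as tf

def representative_data_gen():
    # Placeholder calibration data; in practice, feed a few hundred real images
    # preprocessed exactly as during training.
    for _ in range(100):
        yield [np.random.rand(1, 160, 160, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
# Force full int8 quantization so both weights and activations are 8-bit.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open("mcunet_int8.tflite", "wb") as f:
    f.write(tflite_model)
```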

JacksonVation commented 2 years ago

Hi, it works well to use "generate_tflite.py" to convert the model, and I found that this procedure quantizes the model at the same time. What we got from it seemed similar to the provided tflite file. So we also tested the tflite files on the platform. It's great to see that Flash was reduced to a proper scale, but SRAM still overflowed somewhat. For example, the "mcunet-320kb-1mb" model overflows, but "mcunet-256kb-1mb" fits. However, our device is an STM32F746, which has 320KB SRAM and 1MB Flash, so I believe the former should fit. Since the tflite model is already quantized, what else should we do to reduce the SRAM overflow? Or did we not quantize it enough? [screenshots]
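
To double-check whether the converted file is really fully int8, I inspected the tensor dtypes with the TFLite interpreter, roughly like this (a quick sketch; the file name is just an example):

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="mcunet-320kb-1mb_imagenet.tflite")
interpreter.allocate_tensors()

print("input dtype:", interpreter.get_input_details()[0]["dtype"])
print("output dtype:", interpreter.get_output_details()[0]["dtype"])

# Count how many tensors are still float32; a fully int8-quantized model
# should have (almost) none apart from a few small parameter tensors.
float_tensors = [d for d in interpreter.get_tensor_details()
                 if d["dtype"] == np.float32]
print("float32 tensors:", len(float_tensors))
```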

JacksonVation commented 2 years ago

It's so weird that "mcunet-320kb-1mb" even needs more SRAM than "mcunet-512kb-2mb". We just tested the bigger one, "mcunet-512kb-2mb", and it surprised us that it only occupies 416.75KB, which is smaller than the 467.03KB of "mcunet-320kb-1mb". [screenshot] Maybe there's something unusual with the "mcunet-320kb-1mb_imagenet.tflite" file.

tonylins commented 2 years ago

Hi, the memory usage depends on the inference stack. We used TinyEngine in our experiments, which has a different memory usage from Cube AI, so it is normal if the peak memory numbers do not align. The 320KB model should fit the device with TinyEngine, but may not with Cube AI.
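
As a rough way to see why the numbers differ between runtimes, you can look at the per-tensor activation sizes in the tflite graph; the true peak SRAM then depends on how each engine schedules and reuses those buffers. A small sketch (not how either engine actually reports memory):

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="mcunet-320kb-1mb_imagenet.tflite")
interpreter.allocate_tensors()

# Size of each tensor in KB. The largest intermediate activations give a rough
# lower bound on peak SRAM; the actual peak depends on the runtime's memory
# planner (which buffers are alive at the same time and whether they are reused).
sizes_kb = [
    (d["name"], int(np.prod(d["shape"])) * np.dtype(d["dtype"]).itemsize / 1024)
    for d in interpreter.get_tensor_details()
]
for name, kb in sorted(sizes_kb, key=lambda x: -x[1])[:10]:
    print(f"{kb:8.1f} KB  {name}")
```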

JacksonVation commented 2 years ago

Hi, that definitely makes sense. Thanks for your response. Next we're going to try to deploy the adaptive model on our device and implement some functions on it, just like what you showed in your demo video, which is really cool!