majianjia / nnom

A higher-level Neural Network library for microcontrollers.
Apache License 2.0

NNOM prediction function halts abruptly #151

Open siddhant0718 opened 2 years ago

siddhant0718 commented 2 years ago

After implementing NNOM on the microcontroller for the KWS example, all the layers are built and compiled. Once nnom_predict() is called, the model run function does not return a result and simply branches to the first function that starts the KWS. No value for the label or probability is obtained, as the model execution terminates abruptly. To compile the model, I also had to initialize the static memory, so the code now contains both static and dynamic memory interfaces during execution. I initialize the static memory with a buffer as well, and all the layers are built individually (this can be observed during debugging), but the model does not produce an output when it finally runs; instead it simply branches back to the main function. The nnom_port.h file looks like this:

```c
/* use static memory */
#define NNOM_USING_STATIC_MEMORY    // enable to use built-in memory allocation on a large static memory block
                                    // must set buf using "nnom_set_static_buf()" before creating a model.

/* dynamic memory interfaces */
/* when libc is not available, you shall implement the below memory interfaces (libc equivalents). */
#ifndef NNOM_USING_STATIC_MEMORY
    #define nnom_malloc(n)      malloc(n)
    #define nnom_free(p)        free(p)
#endif
```
This is not the case in the example given in the repository. Is the abrupt termination caused by both memory interfaces being active, or is there another reason I need to look into? Also, during debugging I checked the execution layer by layer: in the first layer itself, the code reaches nnom_memcpy() in the 'input_run()' function inside the 'nnom_input.c' file, but instead of copying the array and resuming execution, it exits and returns to the main function where the whole process of building the model starts again. I am having a hard time locating why the model does this when all the layers are built properly and the model is also compiled. Any input from your side would be extremely helpful, as I have been stuck on this part for quite some time and, after going through the sample example, I have not found anything I might be missing that could cause this issue.
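For reference, a minimal sketch of the order of calls when static memory is enabled. The buffer size and the surrounding function are placeholders; nnom_set_static_buf() and nnom_model_create() are the calls referenced above and in the generated header.

```c
#include "nnom.h"
#include "weights.h"   // generated header providing nnom_model_create()

/* Placeholder pool size: it must cover the runtime buffer size reported
 * when the model is compiled. */
static uint8_t nnom_static_buf[8 * 1024];

static nnom_model_t *model;

void kws_model_init(void)   // placeholder wrapper function
{
    /* With NNOM_USING_STATIC_MEMORY defined, the pool has to be registered
     * before any model is created. */
    nnom_set_static_buf(nnom_static_buf, sizeof(nnom_static_buf));
    model = nnom_model_create();
}
```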

siddhant0718 commented 2 years ago

A side note - when giving the Mel frequency coefficients as input to NNOM, I provide them as an 'int8_t' buffer, but without the process used in the example where a new sequential buffer is created out of the ring buffer. I directly calculate the 12 coefficients needed for the network using a different algorithm that is compatible with the MSP430 device I use. Could this also be an issue? Please let me know; I look forward to your answer.
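For context, a rough sketch of how such coefficients could be fed to the model. It assumes the generated header exposes an input buffer named `nnom_input_data` as in the examples, and that `nnom_predict()` takes the model, a label pointer and a probability pointer; the coefficient buffer name is hypothetical.

```c
#include <string.h>
#include "nnom.h"
#include "weights.h"   // assumed to provide nnom_input_data[]

extern int8_t mfcc_coeffs[12];   // hypothetical: filled by the custom MFCC routine

void kws_run_once(nnom_model_t *model)
{
    uint32_t label;
    float prob;

    /* The coefficients must already be quantised to the same Q-format
     * (scale/shift) the input layer expects from the conversion step. */
    memcpy(nnom_input_data, mfcc_coeffs, sizeof(mfcc_coeffs));

    /* Runs the model and reports the top label and its probability. */
    nnom_predict(model, &label, &prob);
}
```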

majianjia commented 2 years ago

First, you should confirm that memory is allocated correctly, or that you have given it enough static memory when using static memory. Check that each layer returns a non-NULL pointer in nnom_model_create() inside the generated header.

If any of the "layers[]" returns NULL, it means not enough memory was allocated; you should increase your static memory buffer or increase the stack size.
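A rough sketch of such a check, assuming the generated header keeps the created layer pointers in an array; the array name and LAYER_COUNT below are placeholders to be matched against your generated nnom_model_create().

```c
/* Placeholder names: the generated nnom_model_create() typically stores
 * each created layer in an array of nnom_layer_t pointers. */
for (int i = 0; i < LAYER_COUNT; i++)
{
    if (layer[i] == NULL)
    {
        /* Allocation failed: enlarge the buffer passed to
         * nnom_set_static_buf(), or increase the stack size. */
        while (1) { }   // trap here so the failing layer index is visible in the debugger
    }
}
```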

I am not very confident about the MSP430 because it is a 16-bit MCU. I assume the pointer size is also 16 bits? NNoM uses a lot of pointer manipulation, and I cannot guarantee that all of it works with 16-bit pointers; it has only been tested on 32-bit and 64-bit platforms.

Here you can change the alignment manually. https://github.com/majianjia/nnom/blob/44c3cfed7e74ee32dd1b1f1e4cdaf49ad1cf2def/inc/nnom.h#L30
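A hedged sketch of what such a manual change could look like. The macro name is an assumption based on the linked line and should be checked against your copy of inc/nnom.h.

```c
/* inc/nnom.h -- the default is assumed to tie alignment to the pointer
 * size, which is 2 bytes on the MSP430. */
// #define NNOM_ALIGN   (sizeof(char*))

/* Manual override: force 4-byte alignment for NNoM's memory blocks. */
#define NNOM_ALIGN      (4)
```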

siddhant0718 commented 2 years ago

The layers pass the NULL check, and the default NULL check is not triggered when I run the model. And yes, the pointers are 16 bits. Alright, I will have a look at changing the alignment manually.

There is another thing I tried: I used the auto-test example but with an alternative dataset, a subset of the Google Speech Commands dataset with only two categories. When I ran the example with a modified network on these two categories, the prediction function ran and completed, but the probability came out as -0.007, which seems quite unusual. Could you suggest a reason why the model executed in this case but the probability ends up being abnormal? The reason I use the auto-test is that I only want a small model tested on a small dataset like the iris dataset, instead of MNIST as specified in the example. Please let me know if you have any suggestions or ideas about this. Thank you.
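A quick back-of-the-envelope check of what -0.007 would correspond to, assuming the probability is derived by scaling the winning int8 output by 1/127 (an assumption about how nnom_utils reports it, not confirmed here):

```c
/* Under that assumption, -0.007 * 127 ≈ -0.9, i.e. a raw output value of
 * about -1: the top output is essentially zero rather than a
 * high-confidence prediction. */
float prob = -0.007f;
int raw_estimate = (int)(prob * 127.0f);   // ≈ -1 under the stated assumption
```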