Resolving MFCC results in C vs. Python Mismatch and Creating PC test for the KWS Example

HadeelMabrouk commented 3 years ago

This PR fixes the mismatch between the MFCC outputs of the C and Python implementations. To achieve this, the following changes were made:

The datatypes in Python are now forced to float32 throughout the process of generating the MFCC.
While investigating further, I changed the implementation of some internal functions, e.g. Hanning window generation, mel2hz and hz2mel, to be consistent in both implementations.
The main reasons the outputs were inconsistent were the following. First, the Mel filterbank generation function in Python differed from its implementation in C. So, I took the algorithm from Python and wrote it in C to generate consistent filterbanks.
Additionally, there was a mistake in the C implementation. It initially skips the first coefficient, which is fine since we take coefficients 2:13. However, it later overwrites the first coefficient in the output with the log of the total energy, whereas that replacement should apply to the first coefficient that was already skipped. So, I fixed this part as well.
Finally, I edited some minor operations associated with data conversion so that the MFCC outputs are identical, e.g. flooring vs. ceiling, etc.
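
The changes listed above can be sketched in Python roughly as follows. This is a minimal illustration, not the code from this PR: the mel formula (the common 2595 * log10(1 + f/700) form), the symmetric Hann window, and the helper name select_features are assumptions made for the example. It shows every intermediate value kept in float32, and the log energy written into coefficient 0 before coefficients 2:13 are selected.

```python
import numpy as np

F32 = np.float32


def hz2mel(hz):
    # Assumed formula: mel = 2595 * log10(1 + hz / 700), evaluated in float32.
    hz = np.asarray(hz, dtype=F32)
    return F32(2595.0) * np.log10(F32(1.0) + hz / F32(700.0))


def mel2hz(mel):
    # Inverse of hz2mel, also evaluated in float32.
    mel = np.asarray(mel, dtype=F32)
    return F32(700.0) * (F32(10.0) ** (mel / F32(2595.0)) - F32(1.0))


def hann_window(n):
    # Symmetric Hann window (assumption): 0.5 - 0.5 * cos(2*pi*k / (n - 1)).
    k = np.arange(n, dtype=F32)
    return F32(0.5) - F32(0.5) * np.cos(F32(2.0 * np.pi) * k / F32(n - 1))


def select_features(cepstrum, frame_energy):
    # Hypothetical helper: cepstrum has shape (num_frames, num_ceps >= 13).
    # The log energy replaces coefficient 0, which is then skipped, so the
    # 12 returned coefficients (2:13 in 1-based terms) are left untouched.
    cep = np.array(cepstrum, dtype=F32)  # copy, forced to float32
    cep[:, 0] = np.log(np.asarray(frame_energy, dtype=F32))
    return cep[:, 1:13]
```

The same arithmetic, written with float variables on the C side, should then differ from the Python output only by rounding in the math library calls.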

In addition to the MFCC handling, I also created a simple program to run the KWS test data on a PC. It automates reading the C arrays of the audio samples and feeding them into the model for inference, so that the model's performance can be evaluated, and prints the top-1 accuracy score accordingly. The steps to run this program have been added to the KWS example ReadMe file.
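
For reference, the top-1 accuracy reported by such a test is the fraction of samples whose highest-scoring class matches the true label. A minimal sketch, assuming the model outputs are collected as one row of class scores per sample (the function name and the toy data are hypothetical, not taken from the test program):

```python
import numpy as np


def top1_accuracy(scores, labels):
    # scores: (num_samples, num_classes) model outputs
    # labels: (num_samples,) integer class indices
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    predicted = np.argmax(scores, axis=1)  # highest-scoring class per sample
    return float(np.mean(predicted == labels))


# Toy usage: two of the three predictions match their labels.
example_scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
example_labels = [1, 0, 0]
accuracy = top1_accuracy(example_scores, example_labels)
```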

majianjia commented 3 years ago

Thank you for investigating the problem. All commits up to https://github.com/majianjia/nnom/pull/136/commits/c1f364c691882eefe2cd33dad0737f293affa293 look fine. Do you plan to commit more, or should I merge it now?

HadeelMabrouk commented 3 years ago

I believe that's it for now. One final note: the MFCC images fed into the current models are of size (62,12,1), not (63,12,1) as stated in the readme file, which I believe might be confusing to someone trying to reproduce the results, as it was for me initially. Maybe the current info in the readme file relates to an older model.

majianjia commented 3 years ago

Yes, the document hasn't changed in a long time, but the model and code keep changing. I will check later and update it if I can. Thank you.

HadeelMabrouk commented 3 years ago

Also, I believe that the changes to the MFCC C and Python implementations could be useful for users of the Denoise example as well. However, I haven't tested that. Thank you.

majianjia commented 3 years ago

I believe so, but I doubt there will be a big difference. I tried 2 different mel bank functions and compared them, and they produced very close results. So I picked the one that seemed to have the lower computational cost on the C side and didn't bother to change the Python part. Maybe I was wrong.
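
For anyone repeating this kind of comparison, a minimal sketch of measuring how close two outputs are (for example two mel filterbank matrices, or the C and Python MFCCs of the same clip). The function name and the returned fields are hypothetical, and no tolerance is implied by this thread:

```python
import numpy as np


def compare_outputs(a, b):
    # Compare two arrays of identical shape after casting both to float32.
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    if a.shape != b.shape:
        raise ValueError("shape mismatch: %s vs %s" % (a.shape, b.shape))
    diff = np.abs(a - b)
    return {
        "max_abs": float(diff.max()),     # worst-case element difference
        "mean_abs": float(diff.mean()),   # average element difference
        "exact": bool(np.array_equal(a, b)),
    }
```

A small max_abs on the filterbanks alone does not guarantee identical features downstream, since later steps such as flooring vs. ceiling can amplify small differences.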

HadeelMabrouk commented 3 years ago

I agree that the new modifications to the C implementation might not be as computationally efficient as the initial one. However, making the MFCC output consistent raised the top-1 accuracy score on the test data for the KWS example from 72.5% to 80.1%, which I believe is quite a noticeable difference.
