Which audio feature extraction method do we want to use?

ngoding commented 4 years ago

Hey guys, based on our online meeting, we need to decide on the best feature extraction method for our case. Let's put all our research here!

LaodeMFauzan commented 4 years ago

For me, this paper provided a basic understanding of feature extraction in audio. https://www.ee.iitb.ac.in/~esgroup/es_mtech03_sem/sem03_paper_03307003.pdf

Based on the paper above, I think we should use spectral analysis for the feature extraction. Here is an article on implementing several spectral analysis methods: http://aqibsaeed.github.io/2016-09-03-urban-sound-classification-part-1/

ngoding commented 4 years ago

Here are the different types of audio feature extraction that I've found, based on https://medium.com/heuristics/audio-signal-feature-extraction-and-clustering-935319d2225

Zero Crossing Rate (Smoothness) https://wiki.aalto.fi/display/ITSP/Zero-crossing+rate
Energy
Entropy of Energy
Spectral Centroid
Spectral Spread
Spectral Entropy
Spectral Flux
Spectral Rolloff
MFCC
Chroma Vector (12 pitch classes, i.e. notes)
Chroma Deviation

The chroma technique categorises the sound into the 12 musical pitch classes (notes). I don't think it suits our case, as our audio doesn't contain musical notes and pitch classes aren't relevant for detecting the disease.

Zero crossing rate counts how often the waveform changes sign (crosses zero between positive and negative amplitude) within a frame. It's basically a rough measure of how smooth or noisy the sound is, and I think it's also not relevant.
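
For reference, a minimal sketch of the zero crossing rate in plain Python. The function name, the 8000 Hz sample rate and the two test tones are assumptions made up for illustration; they are not from our data.

```python
import math

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1
        for prev, cur in zip(frame, frame[1:])
        if (prev >= 0) != (cur >= 0)  # sign changed between the two samples
    )
    return crossings / (len(frame) - 1)

# Hypothetical test signals: 0.1 s of a 100 Hz tone and of a 1000 Hz tone.
sample_rate = 8000  # Hz, assumed
n = 800
low_tone = [math.sin(2 * math.pi * 100 * t / sample_rate) for t in range(n)]
high_tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(n)]

print("low tone ZCR: ", zero_crossing_rate(low_tone))
print("high tone ZCR:", zero_crossing_rate(high_tone))
```

The higher-pitched tone crosses zero more often, so a single number per frame mostly tells us whether the sound is dominated by high or low frequencies.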

Spectral features analyse the frequency content of the sound, and which one you get depends on the statistic you choose. You can analyse the centroid of the spectrum, or the spread, etc. But I haven't found many articles on implementing spectral features.
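
Since implementation articles are scarce, here is a rough sketch of two of the spectral features from the list (centroid and spread), assuming a single frame and a naive DFT so it runs without any libraries. The function names, sample rate and test signal are hypothetical; a real pipeline would use an FFT and many overlapping frames.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Magnitudes of the non-negative-frequency DFT bins (naive O(n^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

def spectral_centroid_and_spread(frame, sample_rate):
    """Magnitude-weighted mean frequency and standard deviation around it, in Hz."""
    mags = magnitude_spectrum(frame)
    n = len(frame)
    freqs = [k * sample_rate / n for k in range(len(mags))]
    total = sum(mags)
    if total == 0:
        return 0.0, 0.0  # silent frame
    centroid = sum(f * m for f, m in zip(freqs, mags)) / total
    variance = sum(((f - centroid) ** 2) * m for f, m in zip(freqs, mags)) / total
    return centroid, math.sqrt(variance)

# Hypothetical frame: equal mix of a 500 Hz and a 1500 Hz tone.
sample_rate = 8000  # Hz, assumed
n = 256
mix = [
    math.sin(2 * math.pi * 500 * t / sample_rate)
    + math.sin(2 * math.pi * 1500 * t / sample_rate)
    for t in range(n)
]
centroid, spread = spectral_centroid_and_spread(mix, sample_rate)
print(centroid, spread)  # centroid lands near 1000 Hz, spread near 500 Hz
```

The centroid says where the "centre of mass" of the spectrum sits, and the spread says how far the energy is scattered around it.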

The most popular method, and the one most articles describe as reliable, is MFCC (Mel-Frequency Cepstral Coefficients). I've seen lots of articles about it. It summarises how the sound's energy (amplitude) is distributed across frequency bands on the mel scale. I think it also fits our data, as anomalies in the sound should show up as changes in amplitude and frequency.
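
To make the steps concrete, here is a simplified single-frame MFCC sketch in plain Python: Hamming window, power spectrum, triangular mel filterbank, log, then DCT-II. All names and parameter values (20 filters, 13 coefficients, 8000 Hz, the test tones) are assumptions for illustration, and the naive DFT is only there to avoid dependencies. In practice we would more likely call a library such as librosa (`librosa.feature.mfcc`), whose defaults differ from this sketch.

```python
import cmath
import math

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """Hamming-windowed power spectrum via a naive DFT."""
    n = len(frame)
    windowed = [
        x * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)))
        for t, x in enumerate(frame)
    ]
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(windowed))) ** 2 / n
        for k in range(n // 2 + 1)
    ]

def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose centres are equally spaced on the mel scale."""
    max_mel = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(i * max_mel / (num_filters + 1)) for i in range(num_filters + 2)]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_freqs:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc(frame, sample_rate, num_filters=20, num_coeffs=13):
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    # Log of the energy in each mel band (small floor avoids log(0)).
    log_energies = [
        math.log(sum(w * p for w, p in zip(row, spectrum)) + 1e-10) for row in bank
    ]
    # DCT-II of the log mel energies; keep the first num_coeffs coefficients.
    return [
        sum(e * math.cos(math.pi * c * (m + 0.5) / num_filters) for m, e in enumerate(log_energies))
        for c in range(num_coeffs)
    ]

def tone(freq_hz, sample_rate, n):
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]

sample_rate = 8000  # Hz, assumed
low_mfcc = mfcc(tone(300, sample_rate, 256), sample_rate)
high_mfcc = mfcc(tone(3000, sample_rate, 256), sample_rate)
print(low_mfcc[:3])
print(high_mfcc[:3])
```

The two tones give clearly different coefficient vectors, which is the property we would rely on when feeding MFCCs to a classifier.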

You can read an MFCC tutorial with visualisations here: https://smus.com/web-audio-ml-features/

https://dsp.stackexchange.com/questions/15938/is-this-a-correct-interpretation-of-the-dct-step-in-mfcc-calculation/15945#15945
