hkveeranki / speech-emotion-recognition

Speaker independent emotion recognition
http://speechemotionrecognition.repos.hkveeranki.com/
MIT License
315 stars, 100 forks

Unable to understand the concept of padding in the utilities.py file #17

Closed. Hassan1175 closed this issue 5 years ago

Hassan1175 commented 5 years ago

I know the length of voice signals varies from file to file, or you could say there may be some outliers in the dataset. But padding adds zeros to the data. So why do we aim to equalize the length of the audio signals with zeros? If we add zeros to the data, will it distort the original data? If yes, then why are we padding the data?

My second question is how the voice data is normalized. Did you normalize the data in the current project?

hkveeranki commented 5 years ago

> But padding adds zeros to the data. So why do we aim to equalize the length of the audio signals with zeros?

We add zeros, and sometimes slice the data, to make sure that all samples produce feature vectors of the same size.

> So why do we aim to equalize the length of the audio signals with zeros?

Adding zeros amounts to adding silence to the voice data, so ideally it shouldn't change the information in the signal.
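
A minimal sketch of the pad-or-slice idea described above, not necessarily the exact code in utilities.py. The function name `pad_or_slice` and the choice to append zeros at the end and to slice from the start are assumptions made for illustration; a project may just as well pad on both sides or slice from the middle.

```python
import numpy as np


def pad_or_slice(signal: np.ndarray, target_len: int) -> np.ndarray:
    """Return a copy of `signal` with exactly `target_len` samples.

    Hypothetical helper: shorter signals get trailing zeros (silence),
    longer signals are cut down to the first `target_len` samples.
    """
    if len(signal) >= target_len:
        # Slicing: samples beyond target_len are discarded.
        return signal[:target_len].copy()
    # Padding: append zeros so the length matches target_len.
    pad_len = target_len - len(signal)
    return np.pad(signal, (0, pad_len), mode="constant", constant_values=0)


short = np.array([0.1, -0.2, 0.3])
long_ = np.array([0.1, -0.2, 0.3, 0.4, -0.5, 0.6])

print(pad_or_slice(short, 5))  # three original samples followed by two zeros
print(pad_or_slice(long_, 5))  # first five samples only
```

Because every signal comes out with the same number of samples, any frame-based feature extraction applied afterwards yields the same number of frames per file.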

Hassan1175 commented 5 years ago

Makes sense. Why not use the longest length in our data as the standard length, instead of the mean length? That way we would not need to slice the data, and so would not lose any of it. Isn't that right?

hkveeranki commented 5 years ago

The rationale is that the longest signal in the data often contains a lot of noise, and padding everything to the longest length would involve a lot of computation. You are free to change the parameter and try the maximum length on your dataset. If that gives you better results, you can proceed with that.
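
To make the trade-off concrete, here is a rough sketch that compares the mean and the maximum as target lengths. The helper names and the clip lengths are hypothetical, chosen only to illustrate the arithmetic; they are not taken from the project or from any real dataset.

```python
from statistics import mean


def total_samples(lengths, target_len):
    """Samples processed once every clip is padded or sliced to target_len."""
    return len(lengths) * target_len


def samples_lost(lengths, target_len):
    """Samples discarded by slicing clips that are longer than target_len."""
    return sum(max(0, n - target_len) for n in lengths)


# Hypothetical clip lengths in samples: three short clips and one outlier.
lengths = [16000, 18000, 20000, 90000]

mean_len = int(mean(lengths))  # 36000
max_len = max(lengths)         # 90000

for name, target in (("mean", mean_len), ("max", max_len)):
    print(name, target, total_samples(lengths, target), samples_lost(lengths, target))
```

With these made-up numbers, padding to the maximum processes 360000 samples instead of 144000 and loses nothing, while padding to the mean discards 54000 samples from the single long clip. Which side of that trade-off is better depends on the dataset.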

Hassan1175 commented 5 years ago

Got it. Thanks a lot for your reply.