marcogdepinto / emotion-classification-from-audio-files

Understanding emotions from audio files using neural networks and multiple datasets.
GNU General Public License v3.0

Prediction is 35% wrong #5

Closed ghost closed 5 years ago

ghost commented 5 years ago

I ran your model on the files listed below. Out of 60 files, 21 predictions were wrong. Am I doing something wrong?


Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-01-01-01-02-01.wav ] is calm .Should be: neutral Match?: False
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-06-01-01-01-01.wav ] is neutral .Should be: fearful Match?: False
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-08-02-01-02-01.wav ] is surprised .Should be: surprised Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-08-02-01-01-01.wav ] is surprised .Should be: surprised Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-06-01-02-01-01.wav ] is neutral .Should be: fearful Match?: False
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-08-02-02-02-01.wav ] is surprised .Should be: surprised Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-03-01-02-01-01.wav ] is happy .Should be: happy Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-04-01-01-02-01.wav ] is sad .Should be: sad Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-07-01-01-02-01.wav ] is surprised .Should be: disgust Match?: False
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-06-02-02-01-01.wav ] is fearful .Should be: fearful Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-04-01-01-01-01.wav ] is sad .Should be: sad Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-02-01-01-02-01.wav ] is calm .Should be: calm Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-02-02-02-01-01.wav ] is calm .Should be: calm Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-06-01-01-02-01.wav ] is sad .Should be: fearful Match?: False
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-06-02-01-02-01.wav ] is fearful .Should be: fearful Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-02-02-01-02-01.wav ] is calm .Should be: calm Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-08-01-01-02-01.wav ] is surprised .Should be: surprised Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-04-02-01-01-01.wav ] is neutral .Should be: sad Match?: False
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-03-02-02-01-01.wav ] is happy .Should be: happy Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-07-02-01-02-01.wav ] is fearful .Should be: disgust Match?: False
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-07-01-01-01-01.wav ] is surprised .Should be: disgust Match?: False
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-03-02-01-01-01.wav ] is sad .Should be: happy Match?: False
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-02-02-01-01-01.wav ] is calm .Should be: calm Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-04-01-02-02-01.wav ] is neutral .Should be: sad Match?: False
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-02-02-02-02-01.wav ] is calm .Should be: calm Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-04-01-02-01-01.wav ] is sad .Should be: sad Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-03-01-01-02-01.wav ] is happy .Should be: happy Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-05-01-01-02-01.wav ] is angry .Should be: angry Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-01-01-01-01-01.wav ] is neutral .Should be: neutral Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-06-02-01-01-01.wav ] is fearful .Should be: fearful Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-08-01-01-01-01.wav ] is surprised .Should be: surprised Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-02-01-02-01-01.wav ] is calm .Should be: calm Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-05-01-02-02-01.wav ] is angry .Should be: angry Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-07-01-02-02-01.wav ] is neutral .Should be: disgust Match?: False
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-03-01-01-01-01.wav ] is happy .Should be: happy Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-01-01-02-01-01.wav ] is neutral .Should be: neutral Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-06-02-02-02-01.wav ] is fearful .Should be: fearful Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-07-02-01-01-01.wav ] is happy .Should be: disgust Match?: False
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-07-01-02-01-01.wav ] is sad .Should be: disgust Match?: False
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-07-02-02-01-01.wav ] is happy .Should be: disgust Match?: False
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-04-02-01-02-01.wav ] is surprised .Should be: sad Match?: False
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-02-01-02-02-01.wav ] is neutral .Should be: calm Match?: False
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-04-02-02-01-01.wav ] is sad .Should be: sad Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-06-01-02-02-01.wav ] is neutral .Should be: fearful Match?: False
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-08-01-02-02-01.wav ] is surprised .Should be: surprised Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-05-02-01-02-01.wav ] is angry .Should be: angry Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-05-01-02-01-01.wav ] is surprised .Should be: angry Match?: False
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-05-02-01-01-01.wav ] is angry .Should be: angry Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-02-01-01-01-01.wav ] is calm .Should be: calm Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-07-02-02-02-01.wav ] is surprised .Should be: disgust Match?: False
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-03-02-02-02-01.wav ] is happy .Should be: happy Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-05-02-02-01-01.wav ] is angry .Should be: angry Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-08-01-02-01-01.wav ] is surprised .Should be: surprised Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-03-02-01-02-01.wav ] is happy .Should be: happy Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-05-02-02-02-01.wav ] is angry .Should be: angry Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-08-02-02-01-01.wav ] is surprised .Should be: surprised Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-05-01-01-01-01.wav ] is angry .Should be: angry Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-01-01-02-02-01.wav ] is neutral .Should be: neutral Match?: True
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-03-01-02-02-01.wav ] is neutral .Should be: happy Match?: False
Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-04-02-02-02-01.wav ] is neutral .Should be: sad Match?: False
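
To see which emotions account for the mismatches, the log above can be tallied per expected label. This is only a minimal sketch, assuming every log line carries the fields `is <predicted> .Should be: <expected> Match?: <True|False>` in that order. The `tally_log` helper and the regular expression are illustrative names, not part of the repository.

```python
import re
from collections import Counter

# Tolerates the irregular whitespace of the pasted log, including lines where
# the expected label is fused with "Match?:" (for example "fearfulMatch?:").
LINE_RE = re.compile(
    r"\]\s*is\s+(?P<predicted>[a-z]+)\s*\.Should be:\s*"
    r"(?P<expected>[a-z]+)\s*Match\?:\s*(?P<match>True|False)"
)


def tally_log(lines):
    """Return (total, wrong, wrong_by_expected) for the parsable log lines."""
    total = 0
    wrong_by_expected = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # skip separators and unrelated lines
        total += 1
        # Compare the labels directly instead of trusting the logged flag.
        if m.group("predicted") != m.group("expected"):
            wrong_by_expected[m.group("expected")] += 1
    wrong = sum(wrong_by_expected.values())
    return total, wrong, wrong_by_expected


# Three lines copied from the log above, used here only as sample input.
sample = [
    "Prediction for file [ ../Audio_Speech_Actors_01-24/Actor_01/03-01-01-01-01-02-01.wav ] is   calm          .Should be:  neutral Match?:  False",
    "Prediction for file [   ../Audio_Speech_Actors_01-24/Actor_01/03-01-06-02-02-01-01.wav ] is   fearful       .Should be:  fearfulMatch?:  True",
    "Prediction for file [   ../Audio_Speech_Actors_01-24/Actor_01/03-01-07-01-01-02-01.wav ] is   surprised     .Should be:  disgustMatch?:  False",
]

total, wrong, by_label = tally_log(sample)
print(f"{wrong}/{total} wrong ({100 * wrong / total:.0f}%)")
print(dict(by_label))
```

Fed the full 60-line log instead of the sample, the overall figure should reproduce the 21 of 60 (35%) reported in this issue, and the per-label counter shows whether the errors are spread evenly or concentrated in a few emotions.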
marcogdepinto commented 5 years ago

Please post the code you are using that returned that output: my livePredictions.py does not have that kind of output.

Also, in the other issue you opened, it seems that you have retrained the model: are you using the model downloaded from the repo or the one you retrained?

ghost commented 5 years ago

Hi,
I have not done any training. I only tweaked the prediction code. Here is my code:

#import keras
import numpy as np
import librosa

class livePredictions:

    def __init__(self, path, file):
        #import keras

        self.path = path
        self.file = file

    def load_model(self):
        '''
        I am here to load your model.

        :param path: path to your h5 model.
        :return: summary of the model with the .summary() function.

        '''
        import keras
        self.loaded_model = keras.models.load_model(self.path)
        return self.loaded_model.summary()

    def makepredictions(self, verify=True):
        '''
        I am here to process the files and create your features.
        '''
        data, sampling_rate = librosa.load(self.file)
        mfccs = np.mean(librosa.feature.mfcc(y=data, sr=sampling_rate, n_mfcc=40).T, axis=0)
        x = np.expand_dims(mfccs, axis=2)
        x = np.expand_dims(x, axis=0)
        predictions = self.loaded_model.predict_classes(x)
        predicted_emotion = self.convertclasstoemotion(predictions)
        if verify:
            actual_emotion = self.get_emotion_str(self.file[44:46])
            print( "Prediction for file [", self.file, "] is", " ", predicted_emotion + "\t.Should be: ", actual_emotion, " Match?: ", predicted_emotion==actual_emotion)
        else:
            print( "Prediction for file [", self.file, "] is", " ", predicted_emotion)

    def makepredictions1(self, file, verify=True):
        '''
        I am here to process the files and create your features.
        '''
        self.file = file
        self.makepredictions(verify)

    def get_emotion_str(self, code):
        pred = int(code)
        if pred == 1:
            pred = "neutral"
            return pred
        elif pred == 2:
            pred = "calm"
            return pred
        elif pred == 3:
            pred = "happy"
            return pred
        elif pred == 4:
            pred = "sad"
            return pred
        elif pred == 5:
            pred = "angry"
            return pred
        elif pred == 6:
            pred = "fearful"
            return pred
        elif pred == 7:
            pred = "disgust"
            return pred
        elif pred == 8:
            pred = "surprised"
            return pred

    def convertclasstoemotion(self, pred):
        '''
        I am here to convert the predictions (int) into human readable strings.
        '''
        self.pred  = pred

        if pred == 0:
            pred = "neutral"
            return pred
        elif pred == 1:
            pred = "calm"
            return pred
        elif pred == 2:
            pred = "happy"
            return pred
        elif pred == 3:
            pred = "sad"
            return pred
        elif pred == 4:
            pred = "angry"
            return pred
        elif pred == 5:
            pred = "fearful"
            return pred
        elif pred == 6:
            pred = "disgust"
            return pred
        elif pred == 7:
            pred = "surprised"
            return pred

# Here you can replace path and file with the path of your model and of the file from the RAVDESS dataset you want to use for the prediction,
# Below, I have used a neutral file: the prediction made is neutral.

pred = livePredictions(path='Emotion_Voice_Detection_Model.h5',
                       file='01-01-01-01-01-01-01.wav')

pred.load_model()
#pred.makepredictions()
import os
# files = os.listdir('../Audio_Speech_Actors_01-24/Actor_01')
# for f in files:
#     #print(f)
#     pred.makepredictions1('../Audio_Speech_Actors_01-24/Actor_01/' + f)

files = os.listdir('../MyAudio')
for f in files:
    #print(f)
    pred.makepredictions1('../MyAudio/' + f, False)
marcogdepinto commented 5 years ago

The version of the file you are using is old. Read the updated one, in particular the convertclasstoemotion function. Also, your get_emotion_str function is an addition to my code (not written by me), and its encoding is wrong (1 to 8 instead of 0 to 7).

ghost commented 5 years ago

Hi @marcogdepinto: I did a git clone and got that file. Can you tell me how to check whether my file is current, and provide a direct link to the latest version?
Also, my numbering starts from 1 because that is the actual numbering used by RAVDESS. Your numbering is 0-7 because that is your prediction output (zero-based).
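
The two numbering schemes can be reconciled by treating the RAVDESS emotion code as one-based and the model's class index as zero-based. Below is a minimal sketch, assuming the standard RAVDESS filename layout in which the third hyphen-separated field is the emotion code, and assuming the model's classes follow the same order as in convertclasstoemotion above. All helper names are hypothetical and not part of the repository.

```python
import os

# Labels in the order used by convertclasstoemotion (class index 0..7).
# RAVDESS codes 1..8 are assumed to follow the same order, shifted by one.
EMOTIONS = [
    "neutral", "calm", "happy", "sad",
    "angry", "fearful", "disgust", "surprised",
]


def ravdess_code_from_path(path):
    """Return the one-based emotion code (1..8) encoded in a RAVDESS filename."""
    name = os.path.basename(path)                  # "03-01-06-01-02-01-01.wav"
    fields = os.path.splitext(name)[0].split("-")  # ["03", "01", "06", ...]
    return int(fields[2])


def label_from_ravdess_code(code):
    """Map a one-based RAVDESS emotion code to its label."""
    return EMOTIONS[code - 1]


def label_from_class_index(index):
    """Map a zero-based model class index to its label."""
    return EMOTIONS[index]


path = "../Audio_Speech_Actors_01-24/Actor_01/03-01-06-01-02-01-01.wav"
code = ravdess_code_from_path(path)
print(code, label_from_ravdess_code(code))  # prints "6 fearful"
```

Parsing the basename instead of slicing with self.file[44:46] keeps the lookup independent of how long the directory prefix is, so the same check would also work for files stored under a different folder.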

ghost commented 5 years ago

Hi @marcogdepinto: can you please look into my comment?

marcogdepinto commented 5 years ago

Hi Sandeep. I'm afraid I will look at your comment when I have time: I do not have any SLA or urgency to answer questions about an open source project that I did for fun months ago and am not working on at the moment.

That said, you made a git clone but you have also added code: in my file there is no get_emotion_str function. Please double-check your code: you have added more functions, and you should make them work yourself if you need them.