vishaal27 / Multimodal-Video-Emotion-Recognition-Pytorch

A Pytorch implementation of emotion recognition from videos

Dataset and feature extraction model #1

Closed AnilRahate closed 3 years ago

AnilRahate commented 3 years ago

Hi Vishaal,

Your explanation and the experiments showing the benefits of multimodality are easy to follow and understand. Would it be possible for you to share the details of the raw dataset you used? Could you also share the programs used to extract the features, especially the audio MFCCs? I am finding it a bit difficult to extract MFCCs from audio since it involves multiple samples and windows.

Thanks a lot for sharing your work on GitHub.

vishaal27 commented 3 years ago

Hi, thanks for your interest. The raw dataset can be found here: https://zenodo.org/record/1188976#.X287GmgzY2w. For the MFCC features, you can use the librosa library. I'm attaching a gist of the code for extracting features from a sample audio file:

import librosa

# sr=None keeps the file's native sampling rate
# (decoding an .mp4 requires an ffmpeg/audioread backend)
file_series, sampling_rate = librosa.load('./test.mp4', sr=None)
# truncate every audio file to a max of 3 seconds
file_series = file_series[:sampling_rate * 3]
# returns an (n_mfcc, n_frames) matrix, one column per analysis window
mfcc_feats = librosa.feature.mfcc(y=file_series, sr=sampling_rate)
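Note that librosa handles the framing and windowing internally: librosa.feature.mfcc returns an (n_mfcc, n_frames) matrix with one column per window, so you don't need to slice the windows yourself. If you want a fixed-size feature matrix for every file in the dataset, something along these lines should work (the sampling rate, clip length, and directory layout below are placeholders for your own setup):

import glob
import numpy as np
import librosa

SR = 22050       # fixed sampling rate so every clip yields the same shape (placeholder)
MAX_SECONDS = 3  # fixed clip length, as in the snippet above
N_MFCC = 20      # librosa's default number of coefficients

def extract_mfcc(path):
    # librosa resamples to SR on load
    y, _ = librosa.load(path, sr=SR)
    target_len = SR * MAX_SECONDS
    # truncate long clips and zero-pad short ones to a fixed length
    if len(y) > target_len:
        y = y[:target_len]
    else:
        y = np.pad(y, (0, target_len - len(y)))
    # (N_MFCC, n_frames) matrix, one column per analysis window
    return librosa.feature.mfcc(y=y, sr=SR, n_mfcc=N_MFCC)

# placeholder directory layout: one clip per sample in ./data
features = np.stack([extract_mfcc(f) for f in sorted(glob.glob('./data/*.mp4'))])

Resampling everything to one rate and padding/truncating to a fixed length guarantees that every clip produces an MFCC matrix of the same shape, which makes stacking them into a batch straightforward.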