about audio from dataset MOSEI

thuiar / MMSA

MMSA is a unified framework for Multimodal Sentiment Analysis.

MIT License


about audio from dataset MOSEI #28

Closed: Tongyuang closed this issue 2 years ago

Tongyuang commented 2 years ago

Hi, I would like to know how the audio features in the MOSEI dataset are calculated.

I loaded one of the datasets:

import pickle

filename = './aligned_50.pkl'
with open(filename, 'rb') as f:
    RawData = pickle.load(f, encoding='utf-8')
print(RawData['train']['audio'].shape)

The result is:

(16326, 50, 74)

So each audio segment has a feature matrix of shape (50, 74). How are these features calculated from raw audio files (e.g. .mp3 or .wav)?
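The (16326, 50, 74) array presumably reads as samples × time steps × feature dimensions. A minimal sketch of how variable-length frame-level features could be collapsed into a fixed (50, 74) matrix, assuming (as the `aligned_50` filename suggests) that frames are averaged within each word interval and the word sequence is then truncated or zero-padded to 50 steps. The function `align_to_words`, the 10 ms frame hop, the word intervals, and the end-padding choice are all hypothetical; the actual MMSA/CMU procedure may differ.

```python
import numpy as np

def align_to_words(frame_feats, frame_times, word_intervals, max_len=50):
    """Hypothetical helper: average frame features per word, then pad/truncate.

    frame_feats: (num_frames, feat_dim) array, e.g. 74 acoustic features per frame.
    frame_times: (num_frames,) frame timestamps in seconds.
    word_intervals: list of (start, end) times in seconds, e.g. from a forced aligner.
    """
    feat_dim = frame_feats.shape[1]
    words = []
    for start, end in word_intervals:
        # Select the frames whose timestamps fall inside this word.
        mask = (frame_times >= start) & (frame_times < end)
        if mask.any():
            words.append(frame_feats[mask].mean(axis=0))
        else:
            words.append(np.zeros(feat_dim))  # no frame landed inside this word
    words = np.asarray(words, dtype=np.float32).reshape(-1, feat_dim)
    out = np.zeros((max_len, feat_dim), dtype=np.float32)
    n = min(len(words), max_len)
    out[:n] = words[:n]  # truncate long utterances, zero-pad short ones at the end
    return out

# Toy usage with random data standing in for real frame-level features.
rng = np.random.default_rng(0)
frame_times = np.arange(300) / 100.0          # assumed 10 ms hop, 3 s of audio
frame_feats = rng.normal(size=(300, 74))
word_intervals = [(0.0, 0.4), (0.4, 1.1), (1.1, 2.5)]
aligned = align_to_words(frame_feats, frame_times, word_intervals)
print(aligned.shape)  # (50, 74)
```

Averaging keeps one vector per word so audio lines up step by step with the text tokens; an unaligned variant would instead keep the raw frame sequence.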

FlameSky-S commented 2 years ago

The audio features are consistent with those provided by the CMU team. As mentioned in their paper: "We use the COVAREP software to extract acoustic features including 12 Mel-frequency cepstral coefficients, pitch, voiced/unvoiced segmenting features, glottal source parameters, peak slope parameters and maxima dispersion quotients."
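COVAREP itself is a MATLAB toolbox, so reproducing the dataset features exactly means running COVAREP. To illustrate just one of the listed components, here is a plain NumPy sketch of frame-level MFCC extraction. This is not COVAREP's implementation: the 16 kHz sample rate, 25 ms window, 10 ms hop, 26 mel bands, 0.97 pre-emphasis, and dropping c0 are common defaults assumed for illustration, and pitch, voicing, glottal, peak slope, and MDQ features are not covered.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_mfcc=12, n_fft=512, win_len=400, hop=160, n_mels=26):
    """Frame-level MFCCs, shape (n_frames, n_mfcc), from a mono float signal."""
    # Pre-emphasis boosts high frequencies.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Pad very short inputs so at least one full window exists.
    emphasized = np.pad(emphasized, (0, max(0, win_len - len(emphasized))))
    # Slice into overlapping Hamming-windowed frames (25 ms / 10 ms at 16 kHz).
    n_frames = 1 + (len(emphasized) - win_len) // hop
    idx = np.arange(win_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(win_len)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # Triangular mel filterbank spanning 0 Hz to Nyquist.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / (right - center)
    log_mel = np.log(power @ fbank.T + 1e-10)
    # Orthonormal DCT-II decorrelates the log filterbank energies.
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n[None, :] + 0.5) * n[:, None])
    dct *= np.sqrt(2.0 / n_mels)
    dct[0] /= np.sqrt(2.0)
    cepstra = log_mel @ dct.T
    # Keep coefficients 1..n_mfcc (c0 dropped here, an assumption).
    return cepstra[:, 1:n_mfcc + 1]

# Toy usage: one second of a 440 Hz tone at 16 kHz.
signal = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
feats = mfcc(signal)
print(feats.shape)  # (98, 12)
```

The resulting per-frame vectors are the kind of input that word-level averaging would then reduce to one row per word.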

Tongyuang commented 2 years ago

Thank you, that helps.