pliang279 / MultiBench

[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning
MIT License

Extracting info from the H5 files #32

Open mirix opened 1 year ago

mirix commented 1 year ago

Hello,

I would be interested to train an audio-only model (or, perhaps, a bimodal audio-text one) using CMU-MOSEI data.

I would be recomputing the audio embeddings.

So I would need only the links to the videos plus the timestamps and the annotated emotions per timestamp range.

How would I go about extracting this information?

Thanks,

Ed

mirix commented 1 year ago

OK, perhaps I am getting somewhere:

import h5py
import numpy as np
import pandas as pd

filename = '/home/emoman/Downloads/mosei/CMU_MOSEI_Labels.csd'
video_id = 'zv0Jl4TIQDc'

# The .csd file can be read with h5py; open it read-only and close it when done.
with h5py.File(filename, 'r') as hf:
    # One row of label values per annotated segment.
    feat = np.array(hf[f'All Labels/data/{video_id}/features'])
    # One (start, end) pair per annotated segment.
    intval = np.array(hf[f'All Labels/data/{video_id}/intervals'])

df_feat = pd.DataFrame(feat)
print(df_feat)

df_intval = pd.DataFrame(intval)
print(df_intval)

This gives:

          0         1    2         3    4    5    6
0  0.333333  0.666667  0.0  0.666667  0.0  0.0  0.0
1  1.000000  2.000000  0.0  0.000000  0.0  0.0  0.0
2  2.333333  2.666667  0.0  0.000000  0.0  0.0  0.0
        0       1
0  56.852  60.845
1  29.764  35.633
2  42.146  49.242

My interpretation is that video zv0Jl4TIQDc has three intervals annotated with the relative weights of Ekman's basic emotions.

Is that correct?

If that is the case, what would be the mapping of the emotions?

What is the highest possible value for a given emotion?

mirix commented 1 year ago
Each sentence is annotated for sentiment on a [-3,3] Likert scale of: [−3: highly negative, −2 negative, −1 weakly negative, 0 neutral, +1 weakly positive, +2 positive, +3 highly positive]. Ekman emotions (Ekman et al., 1980) of {happiness, sadness, anger, fear, disgust, surprise} are annotated on a [0,3] Likert scale for presence of emotion x: [0: no evidence of x, 1: weakly x, 2: x, 3: highly x].

So column 0 is the sentiment score, and the other columns would be, in this order, {happiness, sadness, anger, fear, disgust, surprise}?
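If that reading is right, attaching names to the columns could look roughly like the sketch below. It is only a sketch under two assumptions that are not confirmed anywhere in this thread: that the emotion columns follow the order quoted above, and that row i of `features` belongs to row i of `intervals`. The helper name `label_segments` is made up for illustration; the sample values are the ones printed earlier.

```python
# ASSUMPTION: column 0 is sentiment and columns 1-6 follow this order.
EMOTIONS = ["happiness", "sadness", "anger", "fear", "disgust", "surprise"]


def label_segments(video_id, features, intervals):
    """Pair each (start, end) interval with its named label values.

    ASSUMPTION: features[i] and intervals[i] describe the same segment.
    Works with nested lists or 2-D NumPy arrays.
    """
    records = []
    for feat, (start, end) in zip(features, intervals):
        record = {
            "video_id": video_id,
            "start": float(start),
            "end": float(end),
            "sentiment": float(feat[0]),
        }
        # Name the six emotion scores according to the assumed order.
        for name, value in zip(EMOTIONS, feat[1:7]):
            record[name] = float(value)
        records.append(record)
    # The intervals are not stored in chronological order, so sort by start.
    return sorted(records, key=lambda r: r["start"])


# Values copied from the output for video zv0Jl4TIQDc.
features = [
    [0.333333, 0.666667, 0.0, 0.666667, 0.0, 0.0, 0.0],
    [1.000000, 2.000000, 0.0, 0.000000, 0.0, 0.0, 0.0],
    [2.333333, 2.666667, 0.0, 0.000000, 0.0, 0.0, 0.0],
]
intervals = [[56.852, 60.845], [29.764, 35.633], [42.146, 49.242]]

records = label_segments("zv0Jl4TIQDc", features, intervals)
for record in records:
    print(record)
```

The resulting list of dicts can be passed straight to `pd.DataFrame(records)` to get one row per segment.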

mirix commented 1 year ago

The issue with this interpretation is that segment 0 above would have been labelled with happiness and anger in similar amounts...

mirix commented 1 year ago

Or is it (Anger Disgust Fear Happy Sad Surprise) as in Table 3?

Then it would be Anger and Fear, which is more consistent, but the sentiment would be slightly positive...

mirix commented 1 year ago

Checking the entries with the most negative and positive sentiment, it seems to be {happiness, sadness, anger, fear, disgust, surprise}
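The check can be sketched as follows: take the segment with the most extreme sentiment and see which emotion each candidate column order would assign to it. The two order lists and the helper names are my own labels for the two readings discussed above, not anything defined by the dataset, and the sample rows are just the three segments printed earlier rather than the whole file.

```python
# The two candidate orders for columns 1-6 discussed in this thread.
ORDER_PAPER_TEXT = ["happiness", "sadness", "anger", "fear", "disgust", "surprise"]
ORDER_TABLE_3 = ["anger", "disgust", "fear", "happy", "sad", "surprise"]


def dominant_emotion(row, order):
    """Return the name of the highest-scoring emotion in a 7-value label row."""
    scores = list(row[1:7])
    return order[scores.index(max(scores))]


def extreme_segments(rows):
    """Return (most negative row, most positive row) by sentiment in column 0."""
    return min(rows, key=lambda r: r[0]), max(rows, key=lambda r: r[0])


# Values copied from the output for video zv0Jl4TIQDc.
rows = [
    [0.333333, 0.666667, 0.0, 0.666667, 0.0, 0.0, 0.0],
    [1.000000, 2.000000, 0.0, 0.000000, 0.0, 0.0, 0.0],
    [2.333333, 2.666667, 0.0, 0.000000, 0.0, 0.0, 0.0],
]

_, most_positive = extreme_segments(rows)
print(dominant_emotion(most_positive, ORDER_PAPER_TEXT))  # happiness
print(dominant_emotion(most_positive, ORDER_TABLE_3))     # anger
```

A strongly positive segment dominated by anger would be odd, so the first order is the more plausible one for this sample. Running the same check over all videos gives a broader picture.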

mirix commented 1 year ago

I have forked MOSEI to build a unimodal SER dataset:

https://github.com/mirix/messaih/tree/main