pliang279 / MultiBench

[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning
MIT License

MOSEI dataset: why are some labels floats rather than integers in [-3, 3]? #8

Closed TianhaoFu closed 2 years ago

TianhaoFu commented 2 years ago

For example, why is a label 1.333?

Thanks:)

Vanvan2017 commented 2 years ago

Thanks for the question. MOSEI is a regression dataset labeled by sentiment intensity, so the label of each data instance is a float in the range -3.0 to 3.0 (the average of the annotators' scores). To better evaluate models, we can also cast the float labels to integers to perform 2-class, 5-class, or 7-class classification. Beyond that, the dataset contains not only sentiment intensities but also labels for the presence of six emotions (happy, angry, etc.). More about the dataset: https://www.ml.cmu.edu/research/dap-papers/S18/dap-liang-paul-pu.pdf
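
A minimal sketch of how float labels such as 1.333 could be cast to class labels. The function names are hypothetical. The specific rules (rounding to the nearest integer, clipping to [-2, 2] for the 5-class case, and counting 0.0 as non-positive in the 2-class case) are assumptions for illustration, not necessarily what MultiBench's evaluation scripts do.

```python
def to_seven_class(label: float) -> int:
    """Round to the nearest integer and clip to [-3, 3] (7 classes).

    Note: Python's round() rounds exact halves to the nearest even integer.
    """
    return max(-3, min(3, round(label)))


def to_five_class(label: float) -> int:
    """Round to the nearest integer and clip to [-2, 2] (5 classes, assumed rule)."""
    return max(-2, min(2, round(label)))


def to_two_class(label: float) -> int:
    """Return 1 for positive sentiment, 0 otherwise.

    Treating exactly 0.0 as non-positive is an assumption.
    """
    return 1 if label > 0 else 0


# Example float labels of the kind discussed in this thread.
labels = [1.333, -2.3, 2.8, 0.0]

seven = [to_seven_class(x) for x in labels]
five = [to_five_class(x) for x in labels]
two = [to_two_class(x) for x in labels]

print(seven, five, two)
```

The regression labels themselves stay untouched; the cast is only applied when a classification metric is computed.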

TianhaoFu commented 2 years ago

Thanks for your reply!

Do you mean that a MOSEI label may be a value such as -2.3 or 1.3?

By the way, casting the float labels to integers is the evaluation method for the test data. How do you evaluate performance on the validation data? With torch.nn.L1Loss?

Thanks:)

Vanvan2017 commented 2 years ago

Yes, that's right. L1Loss returns the mean absolute error (MAE) of the predictions, which is a standard metric for regression datasets. As for casting the labels for classification, take the 2-class case as an example: we can split the labels into positive and negative.
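
To make the two metrics concrete, here is a dependency-free sketch. The first function computes the same average that torch.nn.L1Loss returns with its default reduction="mean". The second is a 2-class accuracy obtained by comparing signs. The helper names and the choice to count 0.0 as non-positive are assumptions for illustration, not the repository's actual evaluation code.

```python
def mean_absolute_error(preds, targets):
    """Average of |prediction - target| over all instances (MAE)."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(targets)


def binary_accuracy(preds, targets):
    """Fraction of instances where prediction and label agree on being positive.

    Counting 0.0 as non-positive is an assumption.
    """
    hits = sum((p > 0) == (t > 0) for p, t in zip(preds, targets))
    return hits / len(targets)


# Made-up predictions and float labels in [-3, 3], for illustration only.
preds = [1.0, -2.0, 0.5, -1.0]
targets = [1.5, -2.5, -0.5, -1.0]

mae = mean_absolute_error(preds, targets)
acc = binary_accuracy(preds, targets)

print(f"MAE: {mae:.3f}, 2-class accuracy: {acc:.2f}")
```

Both numbers come from the same float predictions, so a regression model can be tracked with MAE on the validation set and additionally reported with classification accuracy.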