thuiar / MMSA

MMSA is a unified framework for Multimodal Sentiment Analysis.

About the released Processed Data #48

Closed RH-Lin closed 2 years ago

RH-Lin commented 2 years ago

Thank you for your wonderful work! I have some questions about the released dataset on the BaiduYun Disk: the Processed Data for MOSEI has 74 and 35 feature dimensions for the audio and vision modalities, which suggests these two modalities were extracted with COVAREP and Facet. However, the Processed Data for MOSI has 5 and 20 feature dimensions for the audio and vision modalities. What feature extractor did you use for the MOSI data? As far as I know, MOSI audio and vision features extracted by COVAREP and Facet have 74 and 47 dimensions, respectively. Thank you for your answer!
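(For anyone who wants to double-check these numbers themselves, a minimal sketch along the following lines should work, assuming the downloaded Processed Data follows MMSA's usual pickle layout with split keys and per-modality arrays; the file path here is illustrative.)

```python
import pickle

import numpy as np

# Hypothetical path: point this at the downloaded Processed Data file.
DATA_PATH = "MOSI/Processed/unaligned_50.pkl"  # assumed file name

# Assumption: the pickle is keyed by split ("train"/"valid"/"test"),
# and each split holds "audio" and "vision" feature arrays.
with open(DATA_PATH, "rb") as f:
    data = pickle.load(f)

for split in ("train", "valid", "test"):
    audio = np.asarray(data[split]["audio"])    # (num_samples, seq_len, audio_dim)
    vision = np.asarray(data[split]["vision"])  # (num_samples, seq_len, vision_dim)
    print(f"{split}: audio dim = {audio.shape[-1]}, vision dim = {vision.shape[-1]}")
```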

Columbine21 commented 2 years ago

Hi @GHBigHandsome, the provided audio and visual features are the original ones extracted by the CMU MultiComp team (http://immortal.multicomp.cs.cmu.edu/raw_datasets/processed_data/).

As stated in the introduction of our ACL 2022 demo paper (https://arxiv.org/pdf/2203.12441.pdf), the features cannot be easily reproduced because of the manual feature selection involved (see http://immortal.multicomp.cs.cmu.edu/raw_datasets/processed_data/cmu-mosi/readme.MD).