Closed: sailist closed this issue 2 years ago.
I got it, it should be 1582, not 100.
@sailist In IEMOCAP, is only the scripted version of the dataset used? Is that correct? Please help.
What do you mean by 'scripted version' and 'correct'?
There are two versions of this dataset, improvised and scripted. Did they use only the scripted version, as stated in the paper? Their features indicate that they used all 5531 utterances. Can we talk over email if possible?
Where can you find the improvised and scripted feature files? I still can't understand your intention. My suggestion is to use COGMEN's IEMOCAP feature file to train an IEMOCAP model and MMGCN's MELD feature file to train a MELD model. You can easily find them in each repository.
You're welcome to email me if you want, but I'd prefer to discuss it on GitHub.
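In case it helps, here is a minimal sketch of how one could inspect such a readymade feature file. The file name `IEMOCAP_features.pkl` and the DialogueRNN-style tuple layout are assumptions about what these repositories ship, so adjust them to match your copy.

```python
import pickle

import numpy as np

# Assumed DialogueRNN-style layout for IEMOCAP_features.pkl:
# (videoIDs, videoSpeakers, videoLabels, videoText,
#  videoAudio, videoVisual, videoSentence, trainVid, testVid)
with open("IEMOCAP_features.pkl", "rb") as f:
    data = pickle.load(f, encoding="latin1")

videoIDs, videoAudio = data[0], data[4]

n_dialogues = len(videoIDs)
n_utterances = sum(len(ids) for ids in videoIDs.values())
first_dialogue = next(iter(videoAudio))
audio_dim = np.asarray(videoAudio[first_dialogue]).shape[-1]

print(f"dialogues:  {n_dialogues}")
print(f"utterances: {n_utterances}")
print(f"audio dim:  {audio_dim}")  # 1582 would match the raw openSMILE IS10 set
```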
Thanks. All I am saying is that the full IEMOCAP dataset, using only four emotions, has 5531 utterances in total, including both improvised and scripted ones. Some people use only the improvised part, some only the scripted part, and of course some use the combination of both, i.e. all 5531 files. I think you did not process this dataset from scratch but only used these readymade features.
Also, over email I can show you my code snippets where I am getting better results on IEMOCAP. But I am not able to execute this MMGCN code; it gets stuck at a NotImplementedError.
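A quick way to check whether a given feature file is improvised-only, scripted-only, or both is to look at the utterance IDs, since the raw IEMOCAP release names them like `Ses01F_impro01_F000` versus `Ses01F_script01_1_F000`. A rough sketch, again assuming the DialogueRNN-style pickle layout and that the original utterance IDs are kept:

```python
import pickle

# Same assumed pickle as in the sketch above; the first tuple entry is videoIDs.
with open("IEMOCAP_features.pkl", "rb") as f:
    videoIDs = pickle.load(f, encoding="latin1")[0]

# Count improvised vs. scripted utterances from the IEMOCAP naming
# convention (..._improXX_... vs ..._scriptXX_...).
impro = script = other = 0
for dialogue_ids in videoIDs.values():
    for utt_id in dialogue_ids:
        if "impro" in utt_id:
            impro += 1
        elif "script" in utt_id:
            script += 1
        else:
            other += 1

print(f"improvised: {impro}, scripted: {script}, other: {other}")
# If both counts are non-zero, the file was not restricted to the scripted part.
```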
Regarding the audio features: you said the acoustic raw features are extracted using the openSMILE toolkit with the IS10 configuration, which should be 100-dimensional. This configuration was also used in the paper "COGMEN: COntextualized GNN based Multimodal Emotion recognitioN".
Your code runs well, but when I print the audio feature shape, I get 1582 dimensions instead of 100.
May I ask how you obtained the acoustic features?
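For reference, the IS10 configuration (INTERSPEECH 2010 Paralinguistic Challenge) produces a 1582-dimensional functional vector per utterance, which matches what gets printed here. Below is a minimal sketch of extracting it with the `SMILExtract` command-line tool; the config path and the `-csvoutput` option follow the openSMILE 3.x standard configs (older 2.x releases use different option names), and the file paths are assumptions about a local setup, not something taken from this repository.

```python
import subprocess

import pandas as pd

# Assumed local paths; adjust for your openSMILE install and audio file.
smilextract = "SMILExtract"
config = "opensmile/config/is09-13/IS10_paraling.conf"  # IS10 paralinguistic set
wav_in = "Ses01F_impro01_F000.wav"
csv_out = "is10_features.csv"

# Extract one functional vector for the whole utterance into a CSV row.
subprocess.run(
    [smilextract, "-C", config, "-I", wav_in, "-csvoutput", csv_out],
    check=True,
)

feats = pd.read_csv(csv_out, sep=";")
# Drop bookkeeping columns to count only the acoustic features.
n_features = feats.drop(columns=["name", "frameTime"], errors="ignore").shape[1]
print(f"IS10 feature dimension: {n_features}")  # expected: 1582
```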