tae898 / erc

The official implementation of "EmoBERTa: Speaker-Aware Emotion Recognition in Conversation with RoBERTa"
MIT License

About datasets #20

Closed XinyeDu1204 closed 2 years ago

XinyeDu1204 commented 3 years ago

Hey, I saw that in 'multimodal-datasets' there are only the face-feature datasets, but this project uses only the text part. Is there any way you could share the text part of the IEMOCAP dataset? Thanks!

tae898 commented 3 years ago

Hey there,

The text features are extracted by RoBERTa. Since it's relatively easy to extract them by running the huggingface model, I didn't upload them. Do you still think you need them?

XinyeDu1204 commented 3 years ago

Thanks for your reply! Are the files in the “multimodal-datasets-main\IEMOCAP\raw-texts” directory processed data? Could you share the code that processes the IEMOCAP dataset? Thank you very much!

tae898 commented 3 years ago

Welcome!

The files in that directory are very close to the raw data. I just split them into utterance-level files.

Below is an example from multimodal-datasets/IEMOCAP/raw-texts/train/Ses01F_impro01_F000.json

{
    "Utterance": "Excuse me.",
    "StartTime": 6.2901,
    "EndTime": 8.2357,
    "Speaker": "Female",
    "Emotion": "neutral",
    "SessionID": "Ses01"
}

Processing the IEMOCAP raw data happens here: multimodal-datasets/utils/extract_raw_data_iemocap.py
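Since each file is plain JSON with the keys shown above, it can be read with the standard library alone. A small sketch (the sample string below simply mirrors the example file's content):

```python
# Sketch: parse one per-utterance JSON file from
# multimodal-datasets/IEMOCAP/raw-texts/. The sample string mirrors the
# Ses01F_impro01_F000.json example; real files share the same keys.
import json

sample = """{
    "Utterance": "Excuse me.",
    "StartTime": 6.2901,
    "EndTime": 8.2357,
    "Speaker": "Female",
    "Emotion": "neutral",
    "SessionID": "Ses01"
}"""

utt = json.loads(sample)
print(utt["Speaker"], utt["Emotion"])        # Female neutral
duration = utt["EndTime"] - utt["StartTime"] # utterance length in seconds
print(round(duration, 4))                    # 1.9456
```

In practice you would iterate over the train/val/test directories with `pathlib.Path.glob("*.json")` and collect the `Utterance` and `Emotion` fields.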

Let me know if you have more questions!

XinyeDu1204 commented 2 years ago

Thank you very much!