Sorry for raising this issue; I have since found the answer in the paper on your site.
As the COSMIC paper describes, I understand that the features were previously fine-tuned on different datasets.
But the IEMOCAP dataset seems to have fewer valid labels than the article describes: it says there are 5810 utterances for train and validation and 1623 for test, but actually there are only 5758 for train and validation (4778 for train and 980 for validation) and 1622 utterances for test. Do you know why this is happening?
PS: the counted utterances are those that satisfy `if 'label' in u.keys():`, and 4778 matches the number printed in the training log's classification report for the IEMOCAP training set.
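For reference, I counted with something like this (a rough sketch; the file name and the list-of-dicts layout are my assumptions):

```python
import pickle

# Hypothetical layout: a list of dialogues, each a list of utterance
# dicts; only utterances that carry a 'label' key are counted.
with open("iemocap_train.pkl", "rb") as f:
    dialogues = pickle.load(f)

n_labeled = sum(
    1
    for dialogue in dialogues
    for u in dialogue
    if "label" in u.keys()
)
print(n_labeled)  # 4778 on the train split in my run
```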
I just started researching this area, but it seems to be a common occurrence that different papers use different feature extraction methods and different sample counts (e.g., "Missing Modality Imagination Network for Emotion Recognition with Uncertain Missing Modalities" uses 5531), and then every paper demonstrates that it is SOTA.
My solution to handle these differences is to use the IEMOCAP features provided by COGMEN and the MELD features provided by MMGCN, then align all the code input forms for a fair comparison.
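Concretely, the alignment can be one dataset interface that every model consumes, along these lines (a hypothetical sketch; the class and field names are mine, not from either repo):

```python
import pickle
from torch.utils.data import Dataset

class UnifiedERCDataset(Dataset):
    """Wraps source-specific feature pickles behind one common schema."""

    def __init__(self, feature_path, split="train"):
        with open(feature_path, "rb") as f:
            raw = pickle.load(f)            # layout differs per source
        self.samples = self._to_common(raw, split)

    def _to_common(self, raw, split):
        # One converter per feature source (COGMEN's IEMOCAP pickle,
        # MMGCN's MELD pickle, ...) producing a flat list of dicts:
        # {"text": ..., "audio": ..., "visual": ..., "label": ...}
        raise NotImplementedError

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        s = self.samples[idx]
        return s["text"], s["audio"], s["visual"], s["label"]
```

Each feature source then only needs its own `_to_common` converter; the training code stays untouched.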
@sailist Where did you get the feature file? The download link in the README is not working. Thanks.
@sailist Okay, so you are saying those features are compatible with this code? I tried using those same features here, in the directory ........../data/IEMOCAP/speaker_vocab.pkl, of course after renaming the file from IEMOCAP_features_4.pkl.
But after executing run.py, it says:
FileNotFoundError: [Errno 2] No such file or directory: '../data/IEMOCAP/speaker_vocab.pkl'
Can you guide me, please?
You should adapt the features and the code on your own. You may refer to my template: https://github.com/pytorch-lumo/MMERC
@sailist I met the same problem as you. As described in the paper, I used the result of pooling the last hidden layer of pre-trained RoBERTa as the feature, but the effect was not good, so I used sentence-transformers to extract the features instead. I suspect that a fine-tuned RoBERTa is used in this paper, but my GPU cannot support fine-tuning a RoBERTa model, so I am looking for a fine-tuned RoBERTa model that can be used directly.
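For reference, the sentence-transformers route looks roughly like this (the model name is just an example, not necessarily what the paper used):

```python
from sentence_transformers import SentenceTransformer

# Example model; any RoBERTa-based sentence encoder is used the same way.
model = SentenceTransformer("all-roberta-large-v1")

utterances = ["I'm fine.", "No, you are not!"]
features = model.encode(utterances)  # numpy array, one row per utterance
print(features.shape)                # (2, 1024) for this model
```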
I can easily reproduce your results by using your extracted features. The results improve very quickly and significantly. Here are the statistics after the first epoch of training.
You explained how to extract the text feature in your paper:
But when I tried to extract this feature myself, a problem happened.
I simply use Hugging Face's transformers library to extract the RoBERTa text features, so the code is simple and easy to adapt:
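A minimal sketch of the extraction (roberta-large is my assumption; both pooling variants I tried are shown):

```python
import torch
from transformers import RobertaModel, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-large")
model = RobertaModel.from_pretrained("roberta-large").eval()

@torch.no_grad()
def extract_feature(utterance: str) -> torch.Tensor:
    inputs = tokenizer(utterance, return_tensors="pt", truncation=True)
    outputs = model(**inputs)
    # v2: average the last hidden state over all tokens
    v2 = outputs.last_hidden_state.mean(dim=1).squeeze(0)
    # v3: take the pooler output
    v3 = outputs.pooler_output.squeeze(0)
    return v2  # swap for v3 to compare the two variants

feature = extract_feature("I'm fine.")
print(feature.shape)  # torch.Size([1024])
```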
Then I changed dataset.py to adapt it to the new features and executed run.py the same way. But this time, I couldn't reproduce the result. After the first epoch of training, I got this:
And there is no significant difference between v2 (last_hidden_state) and v3 (pooler_output).
Obviously, there is a huge gap between these two features. I could have continued training to get the final result, but I think that's enough.
My question is: would you mind sharing your code for feature extraction? Have you fine-tuned RoBERTa on other datasets?