shenwzh3 / DAG-ERC

PyTorch code for the ACL-IJCNLP 2021 paper "Directed Acyclic Graph Network for Conversational Emotion Recognition"
Apache License 2.0

How do you extract roberta feature? #8

Closed · sailist closed this 2 years ago

sailist commented 2 years ago

I can easily reproduce your results by using your extracted features. The results improve very quickly and significantly. Here is the classification report after the first training epoch.

              precision    recall  f1-score   support

           0     0.7976    0.7371    0.7661       620
           1     0.7296    0.5364    0.6183      1167
           2     0.5188    0.7685    0.6194      1149
           3     0.4415    0.6996    0.5414       739
           4     0.7868    0.2730    0.4053       392
           5     0.6775    0.3221    0.4366       711

    accuracy                         0.5900      4778
   macro avg     0.6586    0.5561    0.5645      4778
weighted avg     0.6401    0.5900    0.5813      4778

You explained how to extract the text features in your paper:

> Following Ghosal et al. (2020), we employ RoBERTa-Large (Liu et al., 2019), which has the same architecture as BERT-Large (Devlin et al., 2018), as our feature extractor. More specifically, for each utterance u_i, we prepend a special token [CLS] to its tokens, making the input a form of {[CLS], w_i1, w_i2, ..., w_in_i}. Then, we use the [CLS]'s pooled embedding at the last layer as the feature representation of u_i.
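One detail worth noting when reproducing this (a reading of the Hugging Face tokenizer behavior, not something stated in the paper): RoBERTa has no literal [CLS] token. The tokenizer already prepends `<s>`, which plays the [CLS] role, so no manual prepending is needed, and "pooled embedding at the last layer" can be read either as `last_hidden_state[:, 0]` or as `pooler_output`:

```python
from transformers import RobertaTokenizer

# RoBERTa's tokenizer wraps the text in <s> ... </s>; <s> plays the [CLS] role.
tok = RobertaTokenizer.from_pretrained('roberta-large')
ids = tok("hello world")["input_ids"]
print(tok.convert_ids_to_tokens(ids))
# typically: ['<s>', 'hello', 'Ġworld', '</s>']
```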

But when I tried to extract this feature myself, a problem appeared.

I simply use Hugging Face's transformers library to extract the RoBERTa text features, so the code is short and easy to adapt:

import json

import torch
from tqdm import tqdm
from transformers import RobertaModel, RobertaTokenizer

# Load the plain pre-trained RoBERTa-Large as tokenizer + feature extractor.
tokenizer = RobertaTokenizer.from_pretrained('roberta-large')
model = RobertaModel.from_pretrained('roberta-large')  # type: RobertaModel
model.eval()
model.to(0)

with torch.no_grad():
    for fn in ['dev_data_roberta.json.feature',
               'train_data_roberta.json.feature',
               'test_data_roberta.json.feature']:
        with open(fn) as r:
            data = json.load(r)

        for sample in tqdm(data):      # one sample = one dialogue
            for utt in sample:         # one utt = one utterance dict
                ipt = tokenizer(utt['text'], return_tensors='pt').to(0)
                opt = model(**ipt)
                # v2: last-layer hidden state at the <s> ([CLS]) position
                utt['cls_v2'] = opt.last_hidden_state[0, 0].tolist()
                # v3: pooler output (dense + tanh applied to the <s> embedding)
                utt['cls_v3'] = opt.pooler_output[0].tolist()

        with open(f"{fn}.v2", 'w') as w:
            json.dump(data, w)
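For reference, a minimal sanity check, assuming the provided .feature files store the authors' vector under a 'cls' key in each utterance dict, is to compare the re-extracted vectors against the shipped ones:

```python
import json

import torch
import torch.nn.functional as F

# Sanity check (assumes the provided feature key is 'cls'; cls_v2 / cls_v3
# are the keys written by the script above).
with open('dev_data_roberta.json.feature.v2') as r:
    data = json.load(r)

utt = data[0][0]                      # first utterance of the first dialogue
ref = torch.tensor(utt['cls'])        # vector shipped with the repo
v2 = torch.tensor(utt['cls_v2'])      # last_hidden_state[0, 0]
v3 = torch.tensor(utt['cls_v3'])      # pooler_output[0]

print(F.cosine_similarity(ref, v2, dim=0).item())
print(F.cosine_similarity(ref, v3, dim=0).item())
# A low similarity suggests the authors' extractor is not vanilla roberta-large.
```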

Then I changed dataset.py to use the new feature and ran run.py the same way as before.

But this time I can't reproduce the result. After the first training epoch, I got this:

## v2
              precision    recall  f1-score   support

           0     0.1071    0.0194    0.0328       620
           1     0.1950    0.0934    0.1263      1167
           2     0.2303    0.7589    0.3534      1149
           3     0.2345    0.0460    0.0769       739
           4     0.0909    0.0077    0.0141       392
           5     0.1399    0.0281    0.0468       711

    accuracy                         0.2198      4778
   macro avg     0.1663    0.1589    0.1084      4778
weighted avg     0.1815    0.2198    0.1401      4778

## v3
              precision    recall  f1-score   support

           0     0.0943    0.0081    0.0149       620
           1     0.2352    0.2614    0.2476      1167
           2     0.2169    0.5013    0.3028      1149
           3     0.2251    0.2355    0.2302       739
           4     0.0000    0.0000    0.0000       392
           5     0.0000    0.0000    0.0000       711

    accuracy                         0.2219      4778
   macro avg     0.1286    0.1677    0.1326      4778
weighted avg     0.1567    0.2219    0.1708      4778

And there is no significant difference between v2 (last_hidden_state) and v3 (pooler_output).

Obviously, there is a huge gap between these two feature sets. I could have continued training to get the final result, but I think this is already enough to show the difference.

My question is: would you mind sharing your code for feature extraction? Have you fine-tuned RoBERTa on other datasets?

sailist commented 2 years ago

Sorry for raising this issue; I have found the answer in the paper you cite.

qftie commented 2 years ago

As the COSMIC paper describes, I understand that the feature extractor was fine-tuned beforehand on the different datasets.

But the IEMOCAP dataset seems to have fewer effective labels than described in the article: the article says it has 5810 utterances for train and validation and 1623 for test, but actually there are only 5758 for train and validation (4778 for train and 980 for validation) and 1622 utterances for test. Do you know why this is happening? P.S. The utterances I counted are those with `if 'label' in u.keys():`, and 4778 is the same as the number printed in the training-log classification report for the IEMOCAP training set.
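For reference, this is roughly how such a count can be obtained, assuming the .feature json layout used above (a list of dialogues, each a list of utterance dicts):

```python
import json

# Count utterances that actually carry an emotion label, per split.
def count_labelled(fn):
    with open(fn) as r:
        data = json.load(r)
    return sum(1 for dialog in data for u in dialog if 'label' in u.keys())

for fn in ['train_data_roberta.json.feature',
           'dev_data_roberta.json.feature',
           'test_data_roberta.json.feature']:
    print(fn, count_labelled(fn))
```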

sailist commented 2 years ago

I just started researching this area, but it seems to be a common occurrence that different papers use different feature extraction methods and different sample counts (e.g., "Missing Modality Imagination Network for Emotion Recognition with Uncertain Missing Modalities" uses 5531), and then every paper demonstrates that it is SOTA.

My solution to handle these differences is to use the IEMOCAP features provided by COGMEN and the MELD features provided by MMGCN, and then align the input format of all the code to get a fair comparison.
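One way to do that alignment is to load every source into a single per-dialogue record before it touches any model code; the sketch below is just one possible format, not taken from COGMEN or MMGCN:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Dialogue:
    """Unified per-dialogue record so features from different sources look the same."""
    utterance_feats: List[List[float]]  # one feature vector per utterance
    speakers: List[str]                 # speaker id per utterance
    labels: List[int]                   # emotion label per utterance
    texts: List[str]                    # raw utterance text, kept for debugging

def check(d: Dialogue) -> None:
    # Every per-utterance list must have the same length.
    n = len(d.utterance_feats)
    assert len(d.speakers) == len(d.labels) == len(d.texts) == n
```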

Coding511 commented 2 years ago

@sailist Where did you get the feature file? The download link in the README is not working. Thanks.

sailist commented 2 years ago

> @sailist Where did you get the feature file? The download link in the README is not working. Thanks.

I just started researching this area, but it seems to be a common occurrence that different papers use different feature extraction methods and different sample counts (e.g., "Missing Modality Imagination Network for Emotion Recognition with Uncertain Missing Modalities" uses 5531), and then every paper demonstrates that it is SOTA.

My solution to handle these differences is to use the IEMOCAP features provided by COGMEN and the MELD features provided by MMGCN, and then align the input format of all the code to get a fair comparison.

Coding511 commented 2 years ago

@sailist Okay, so you are saying those features are compatible with this code? I tried using the same features here as well, in the directory ........../data/IEMOCAP/speaker_vocab.pkl, of course after renaming the file from IEMOCAP_features_4.pkl.

But after executing run.py it says

FileNotFoundError: [Errno 2] No such file or directory: '../data/IEMOCAP/speaker_vocab.pkl'

Can you please guide me?

sailist commented 2 years ago

You should adapt the features and code on your own. You may refer to my template: https://github.com/pytorch-lumo/MMERC

wep000 commented 1 year ago

@sailist I met the same problem as you. As described in the paper, I used the pooled last hidden layer of pre-trained RoBERTa as the feature, but the results were not good, so I used sentence-transformers to extract the features instead. I suspect that a fine-tuned RoBERTa is used in this paper, but my GPU cannot handle fine-tuning RoBERTa, so I am looking for a fine-tuned RoBERTa model that can be used directly.
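For anyone with enough GPU memory, a minimal sketch of the COSMIC-style recipe (fine-tune roberta-large on utterance-level emotion labels, then reuse the fine-tuned encoder's `<s>`/[CLS] hidden state as the utterance feature) could look like the following. This is only my reading of the papers, not the authors' released extraction code, and the dataset/helper names are made up for illustration:

```python
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import RobertaTokenizer, RobertaForSequenceClassification

device = 'cuda' if torch.cuda.is_available() else 'cpu'
tokenizer = RobertaTokenizer.from_pretrained('roberta-large')
model = RobertaForSequenceClassification.from_pretrained(
    'roberta-large', num_labels=6).to(device)  # 6 = IEMOCAP emotion classes

class UttDataset(Dataset):
    """Flat utterance-level dataset: (text, label) pairs, dialogue context ignored."""
    def __init__(self, pairs):
        self.pairs = pairs  # list of (text, label) tuples, e.g. built from the .feature json
    def __len__(self):
        return len(self.pairs)
    def __getitem__(self, i):
        return self.pairs[i]

def collate(batch):
    texts, labels = zip(*batch)
    enc = tokenizer(list(texts), padding=True, truncation=True,
                    max_length=128, return_tensors='pt')
    return enc, torch.tensor(labels)

def finetune(train_pairs, epochs=3, lr=1e-5):
    # Plain utterance-level classification fine-tuning.
    loader = DataLoader(UttDataset(train_pairs), batch_size=8,
                        shuffle=True, collate_fn=collate)
    optim = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for enc, labels in loader:
            enc = {k: v.to(device) for k, v in enc.items()}
            out = model(**enc, labels=labels.to(device))
            out.loss.backward()
            optim.step()
            optim.zero_grad()

@torch.no_grad()
def extract_feature(text):
    # Reuse the fine-tuned encoder and take the <s> ([CLS]) hidden state.
    model.eval()
    enc = tokenizer(text, return_tensors='pt').to(device)
    hidden = model.roberta(**enc).last_hidden_state
    return hidden[0, 0]  # shape (1024,) for roberta-large
```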