
rungjoo / CoMPM

Context Modeling with Speaker's Pre-trained Memory Tracking for Emotion Recognition in Conversation (NAACL 2022)


A question about the comparison experiments: why didn't you compare with CESTa on the DailyDialog dataset? #4

Closed by qftie 1 year ago

qftie commented 1 year ago

"Contextualized Emotion Recognition in Conversation as Sequence Tagging" (CESTa) seems to be an influential paper in the ERC area. I wonder why you didn't compare with CESTa on the DailyDialog dataset. Is there a specific reason?

rungjoo commented 1 year ago

No, there is no particular reason. When I wrote the paper, I mainly cited frameworks that were evaluated with the same metrics on all four benchmarks. CESTa is also a good paper, but it was published in 2020, and we focused on citing more recent work.

qftie commented 1 year ago

No offense, but I find it hard to reach CESTa's reported micro-F1 of 63.12 on DailyDialog, which is 3 points higher than CoMPM, one of the most advanced models today. So I would like to ask whether the results are not comparable for some reason, such as different experimental settings or different metric settings. For example, perhaps CESTa only considered the micro score as its performance criterion, while your experiments take both micro and macro scores into account. Could this be the reason for not using the numbers reported in their paper for comparison?

rungjoo commented 1 year ago

CESTa was evaluated only with the micro metric on DailyDialog and was not tested on EmoryNLP. However, since CESTa shows good results on DailyDialog, it would be reasonable to include it as a paper for comparison.
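To make the micro/macro distinction concrete, here is a minimal sketch of how the two F1 averages can diverge on the same predictions. It assumes the common DailyDialog convention of leaving the majority `neutral` class out of scoring (an assumption; each paper's evaluation script should be checked). `f1_scores` and the toy labels are hypothetical and are not code from CoMPM or CESTa.

```python
from collections import Counter


def f1_scores(gold, pred, labels=None, exclude=()):
    """Return (micro_f1, macro_f1) over `labels`, skipping any label in `exclude`.

    Hypothetical helper for illustration. Excluded labels are not scored,
    but a wrong prediction of a scored label (e.g. gold=neutral, pred=joy)
    still counts as a false positive for that scored label.
    """
    if labels is None:
        labels = sorted(set(gold) | set(pred))
    labels = [lab for lab in labels if lab not in exclude]
    if not labels:
        return 0.0, 0.0

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[g] += 1  # gold class g was missed

    def f1(t, false_pos, false_neg):
        denom = 2 * t + false_pos + false_neg
        return 2 * t / denom if denom else 0.0

    # Micro: pool the counts over all scored labels, then compute one F1.
    micro = f1(
        sum(tp[lab] for lab in labels),
        sum(fp[lab] for lab in labels),
        sum(fn[lab] for lab in labels),
    )
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1(tp[lab], fp[lab], fn[lab]) for lab in labels) / len(labels)
    return micro, macro


gold = ["neutral", "neutral", "joy", "joy", "joy", "anger"]
pred = ["neutral", "joy", "joy", "joy", "joy", "joy"]
micro, macro = f1_scores(gold, pred, exclude={"neutral"})
print(f"micro-F1={micro:.4f}, macro-F1={macro:.4f}")  # micro-F1=0.6667, macro-F1=0.3750
```

Because the micro score is dominated by the frequent `joy` class while the macro score gives the missed `anger` class equal weight, a paper that reports only one of the two is hard to place in a table that lists both.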

---caution--- I'm not very familiar with CESTa, but from a quick glance it seems to take future utterances into account as input (in the Introduction, Figure 2 appears to use future features). If it does use future utterances as input, it is not a suitable paper for comparison. Some studies improve emotion recognition performance by using future utterances, but typical ERC studies use only the utterances before the current turn as context.
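The difference between the two input settings can be sketched as follows. In the usual ERC setting, the input for turn `t` contains only the utterances up to and including `t`; a setup that also uses future utterances sees later turns as well. `context_for_turn` and the toy dialogue are hypothetical, for illustration only, and do not reflect the actual preprocessing of CoMPM or CESTa.

```python
from typing import List, Tuple

# Hypothetical toy representation: each utterance is (speaker, text).
Utterance = Tuple[str, str]


def context_for_turn(
    dialogue: List[Utterance], t: int, use_future: bool = False
) -> List[Utterance]:
    """Return the utterances visible when predicting the emotion of turn t.

    use_future=False: only turns 0..t (past context plus the current utterance).
    use_future=True: the whole dialogue, including turns after t.
    """
    if not 0 <= t < len(dialogue):
        raise IndexError("turn index out of range")
    if use_future:
        return list(dialogue)
    return dialogue[: t + 1]


dialogue = [
    ("A", "I got the job!"),
    ("B", "Really? That's great news."),
    ("A", "I start next Monday."),
    ("B", "We should celebrate."),
]

past_only = context_for_turn(dialogue, 1)
with_future = context_for_turn(dialogue, 1, use_future=True)
print(len(past_only), len(with_future))  # 2 4
```

A model given `with_future` for turn 1 can exploit turns 2 and 3, so scores obtained in the two settings are not directly comparable.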