jerbarnes / semeval22_structured_sentiment

SemEval-2022 Shared Task 10: Structured Sentiment Analysis

python evaluate_single_dataset.py dev.json dev.json does not result in 100% F1 #17

Closed: janpf closed this issue 2 years ago

janpf commented 2 years ago

Hi, I'm a bit unsure whether I'm misunderstanding the tool, but I believe that comparing a file to itself with this script should always give a 100% F1 score: https://github.com/jerbarnes/semeval22_structured_sentiment/blob/master/evaluation/evaluate_single_dataset.py. However, on both Darmstadt splits this is not the case:

$ python evaluate_single_dataset.py data/darmstadt_unis/dev.json data/darmstadt_unis/dev.json
Sentiment Tuple F1: 0.942
$ python evaluate_single_dataset.py data/darmstadt_unis/train.json data/darmstadt_unis/train.json
Sentiment Tuple F1: 0.964

Am I using the tool wrong? Thanks!

jerbarnes commented 2 years ago

Hi,

No, you're correct. After looking into it, it turns out not to be a problem with the evaluation script, but once again with the data. Some examples had a polar expression incorrectly annotated as starting in the middle of a token. Parsing these in the evaluation script produced empty polar expressions, and the script's main assumption is that there are no empty polar expressions, so this was a problem.
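
To make the failure mode concrete, here is a minimal sketch (the sentence, offsets, and helper function are made up for illustration; this is not the evaluation script's actual parsing code). A span that starts in the middle of a token aligns with no complete token, so the parsed polar expression comes back empty:

```python
# Hypothetical example: whitespace tokens with (start, end) character offsets.
text = "The staff were unfriendly"
tokens = [(0, 3), (4, 9), (10, 14), (15, 25)]

def tokens_in_span(start, end):
    """Return the tokens fully contained in an annotated character span."""
    return [(s, e) for (s, e) in tokens if s >= start and e <= end]

# Well-formed annotation: the span covers the whole token "unfriendly".
print(tokens_in_span(15, 25))  # [(15, 25)]

# Faulty annotation: the span starts inside "unfriendly", so no token is
# fully covered and the polar expression ends up empty.
print(tokens_in_span(17, 25))  # []
```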

I've updated the processing scripts for both MPQA and Darmstadt to remove these problems, so it will be necessary to reprocess them.
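
After reprocessing, a quick sanity check is to scan the regenerated files for empty polar expressions before evaluating again. A hedged sketch, assuming the shared task's JSON layout (a list of sentences, each with "sent_id" and an "opinions" list whose "Polar_expression" entry is a [strings, offsets] pair):

```python
import json
import sys

# Usage: python check_empty_expressions.py data/darmstadt_unis/dev.json
with open(sys.argv[1]) as f:
    sentences = json.load(f)

for sent in sentences:
    for opinion in sent.get("opinions", []):
        strings, offsets = opinion["Polar_expression"]
        # Flag opinions whose polar expression is missing or whitespace-only.
        if not strings or any(not s.strip() for s in strings):
            print(f"Empty polar expression in sentence {sent['sent_id']}")
```

If this prints nothing on the reprocessed splits, comparing a file to itself with evaluate_single_dataset.py should give an F1 of 1.0 again.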

janpf commented 2 years ago

Hi Jeremy, thank you for your quick reply and quick fix!