jerbarnes / semeval22_structured_sentiment

SemEval-2022 Shared Task 10: Structured Sentiment Analysis

python evaluate_single_dataset.py dev.json dev.json does not result in 100% F1 #17

Closed: janpf closed this issue 2 years ago

janpf commented 2 years ago

Hi, I'm a bit unsure whether I'm misunderstanding the tool, but I believe that comparing a file to itself with this script should always give a 100% F1 score: https://github.com/jerbarnes/semeval22_structured_sentiment/blob/master/evaluation/evaluate_single_dataset.py. However, on both Darmstadt splits this is not the case:

$ python evaluate_single_dataset.py data/darmstadt_unis/dev.json data/darmstadt_unis/dev.json
Sentiment Tuple F1: 0.942
$ python evaluate_single_dataset.py data/darmstadt_unis/train.json data/darmstadt_unis/train.json
Sentiment Tuple F1: 0.964

Am I using the tool wrong? Thanks!

jerbarnes commented 2 years ago

Hi,

No, you're correct. After looking into it, it turns out not to be a problem with the evaluation script, but once again with the data. Some examples had a polar expression incorrectly annotated as starting in the middle of a token. Parsing these in the evaluation script produced empty polar expressions, and the script's main assumption is that there are no empty polar expressions, so this was a problem.
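
To make the failure mode concrete, here is a minimal sketch (the sentence, offsets, and helper function are made up for illustration; this is not the evaluation script's actual parsing code). A span that starts in the middle of a token aligns with no complete token, so the parsed polar expression comes back empty:

```python
# Hypothetical example: whitespace tokens with (start, end) character offsets.
text = "The staff were unfriendly"
tokens = [(0, 3), (4, 9), (10, 14), (15, 25)]

def tokens_in_span(start, end):
    """Return the tokens fully contained in an annotated character span."""
    return [(s, e) for (s, e) in tokens if s >= start and e <= end]

# Well-formed annotation: the span covers the whole token "unfriendly".
print(tokens_in_span(15, 25))  # [(15, 25)]

# Faulty annotation: the span starts inside "unfriendly", so no token is
# fully covered and the polar expression ends up empty.
print(tokens_in_span(17, 25))  # []
```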

I've updated the processing scripts for both MPQA and Darmstadt to remove these problems, so it will be necessary to reprocess them.
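
After reprocessing, a quick sanity check is to scan the regenerated files for empty polar expressions before evaluating again. A hedged sketch, assuming the shared task's JSON layout (a list of sentences, each with "sent_id" and an "opinions" list whose "Polar_expression" entry is a [strings, offsets] pair):

```python
import json
import sys

# Usage: python check_empty_expressions.py data/darmstadt_unis/dev.json
with open(sys.argv[1]) as f:
    sentences = json.load(f)

for sent in sentences:
    for opinion in sent.get("opinions", []):
        strings, offsets = opinion["Polar_expression"]
        # Flag opinions whose polar expression is missing or whitespace-only.
        if not strings or any(not s.strip() for s in strings):
            print(f"Empty polar expression in sentence {sent['sent_id']}")
```

If this prints nothing on the reprocessed splits, comparing a file to itself with evaluate_single_dataset.py should give an F1 of 1.0 again.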

janpf commented 2 years ago

Hi Jeremy, thank you for your quick reply and quick fix!