Closed: janpf closed this issue 3 years ago
Hi,
No, you're correct. After looking into it, it's not actually a problem with the evaluation script, but again with the data. Some examples had a polar expression whose annotation incorrectly started inside a token. When the evaluation script parsed these, it ended up with empty polar expressions, and the script's main assumption is that polar expressions are never empty, so this caused the errors you saw.
I've updated the processing scripts for both MPQA and Darmstadt to remove these problems, so you will need to reprocess the data.
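For reference, here is a minimal sketch of the kind of check involved. It assumes the shared task's JSON layout (each sentence holds `"text"` and a list of `"opinions"`, whose `"Polar_expression"` field pairs a list of strings with a list of `"start:end"` character offsets); the field names and structure here are assumptions for illustration, not code taken from the repo:

```python
import json
import sys

def find_bad_polar_expressions(path):
    """Flag empty or token-misaligned polar expression spans in a dataset file."""
    with open(path, encoding="utf-8") as f:
        sentences = json.load(f)
    for sent in sentences:
        text = sent["text"]
        for opinion in sent.get("opinions", []):
            _, offsets = opinion["Polar_expression"]  # [strings], ["start:end", ...]
            for offset in offsets:
                start, end = (int(i) for i in offset.split(":"))
                span = text[start:end]
                # An empty span breaks the evaluation's assumption that
                # every polar expression is non-empty.
                if not span.strip():
                    print(f"{sent['sent_id']}: empty polar expression at {offset}")
                # A span starting mid-token hints at the misaligned
                # annotations described above.
                elif start > 0 and text[start - 1].isalnum() and text[start].isalnum():
                    print(f"{sent['sent_id']}: offset {offset} starts inside a token: {span!r}")

if __name__ == "__main__":
    find_bad_polar_expressions(sys.argv[1])
```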
Hi Jeremy, thank you for your quick reply and quick fix!
Hi,
I'm a bit unsure whether I'm misunderstanding the tool, but I believe that comparing a file to itself with this script should always yield a 100% F1 score: https://github.com/jerbarnes/semeval22_structured_sentiment/blob/master/evaluation/evaluate_single_dataset.py. However, on both Darmstadt splits this is not the case.
Am I using the tool wrong? Thanks!
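For illustration, a minimal sketch of that identity sanity check, using a simplified exact-match F1 over opinion tuples rather than the script's actual Sentiment Graph F1; the file path and JSON field names are assumptions:

```python
import json

def opinion_tuples(path):
    """Collect each opinion as a hashable tuple of its annotated spans."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    tuples = set()
    for sent in data:
        for op in sent.get("opinions", []):
            tuples.add((
                sent["sent_id"],
                tuple(op["Source"][1]),            # offset lists, e.g. ["5:12"]
                tuple(op["Target"][1]),
                tuple(op["Polar_expression"][1]),
                op["Polarity"],
            ))
    return tuples

def f1(gold_path, pred_path):
    gold, pred = opinion_tuples(gold_path), opinion_tuples(pred_path)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Comparing a file to itself should always give 1.0; anything lower
# points at malformed entries (e.g. empty polar expressions) in the data.
print(f1("darmstadt_unis/train.json", "darmstadt_unis/train.json"))
```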