Closed MinionAttack closed 3 years ago
Hi Iago,
The values look pretty normal, except for Darmstadt with the Graph Parsing approach. The OpeNER/Multibooked results are right where I would expect; Norec is always a bit lower because it's a more diverse dataset, and MPQA is quite hard because of the ambiguity of many of the polar expressions and the size of the holders and targets. But Darmstadt using the graph parser should be higher than MPQA.
Thanks for the clarification, the issue could be because of this change? https://github.com/jerbarnes/semeval22_structured_sentiment/issues/9
I doubt it. These issues aren't common enough to cause a large drop in performance, and the original paper we take the baseline from (https://aclanthology.org/2021.acl-long.263/) used the same data. Perhaps it's the effect of a particularly poor random seed, since there is a bit of variance (±2.0 in the paper)?
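To rule out seed variance, one common approach is to train with several seeds and report the mean and standard deviation of the dev F1. A minimal sketch, assuming a hypothetical `train_and_eval(seed)` function standing in for a full training + evaluation run (the placeholder below just simulates scores):

```python
import random
import statistics

def train_and_eval(seed):
    # Placeholder: stands in for a full baseline training run followed by
    # Sentiment Tuple F1 evaluation on dev.json. Here it only simulates
    # a score near 0.55 with a small seed-dependent perturbation.
    random.seed(seed)
    return 0.55 + random.uniform(-0.02, 0.02)

seeds = [0, 1, 2, 3, 4]
scores = [train_and_eval(s) for s in seeds]
print(f"F1 mean={statistics.mean(scores):.3f} std={statistics.stdev(scores):.3f}")
```

If the standard deviation across seeds is on the order of the reported ±2.0 points, a single low run is not surprising.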
Ah, ok, I'll try to train it again. Thank you.
Hi,
I am using the latest code available in the repo and have trained both baselines (graph parser and sequence labeling). Then I measured the Sentiment Tuple F1 on the dev.json file and I am getting these values: for Graph Parsing, the values are around 0.5xx except for Darmstadt Unis, MPQA and Norec, which are lower, especially Darmstadt Unis. For Sequence Labeling, the values are around 0.3xx except for Darmstadt Unis, MPQA and Norec, which are lower, especially MPQA.
Are those values correct to take as a reference, or am I doing something wrong when training the models or when running inference to get the scores?
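For context on what a tuple-level F1 measures, here is a minimal sketch of an exact-match version over (holder, target, expression, polarity) tuples. Note this is a simplified assumption for illustration only: the official Sentiment Graph F1 weights partial span overlap rather than requiring exact matches.

```python
def tuple_f1(gold, pred):
    # Exact-match precision/recall/F1 over sentiment tuples.
    # Each tuple is assumed to be (holder, target, expression, polarity).
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: one of two gold tuples is predicted correctly.
gold = [("holder1", "target1", "expr1", "Positive"),
        ("holder2", "target2", "expr2", "Negative")]
pred = [("holder1", "target1", "expr1", "Positive"),
        ("holder3", "target3", "expr3", "Neutral")]
print(tuple_f1(gold, pred))  # 0.5
```

Because every element of the tuple must match, scores drop quickly on datasets with long or ambiguous holder/target spans, which is consistent with MPQA and Darmstadt being the hardest here.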
Regards.