snowblink14 / smatch

Smatch tool: evaluation of AMR semantic structures
MIT License

TOP problem #36

Closed rafaelanchieta closed 4 years ago

rafaelanchieta commented 4 years ago

Considering TOP as an attribute as suggested by @oepen in #25, Smatch is returning high scores for completely different nodes.

$ cat x
(x / x)

$ cat y
(y / y)

$ python smatch.py --pr -f x y
Precision: 0.5
Recall: 0.5
F-score: 0.5

oepen commented 4 years ago

well, one might quarrel whether the two nodes are 'completely different'. i find it hard to have linguistic intuitions about this example, but formally i see two graphs that are somewhat similar: they are both comprised of exactly one node and have no edges. the only difference between the two graphs is that the label for the node differs, so i am on board with SMATCH here: of the observable information in the two graphs, half of it is shared :-).
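
To make the counting concrete, here is a minimal sketch of how two single-node graphs can share half of their triples. It assumes TOP is stored as an attribute triple with a constant value, as discussed in #25; the helpers `triples` and `score` and the shared variable name `n0` are hypothetical and hand-written for illustration, not Smatch's own parser or matcher.

```python
# Illustration only: hand-written triples, not smatch's parser or its
# hill-climbing alignment search.

def triples(var, concept):
    # Assumption: TOP is an attribute triple with a constant value, so it
    # looks the same for any top node once variables are aligned.
    return {("instance", var, concept), ("TOP", var, "top")}

def score(test, gold):
    # Precision, recall and F-score over matched triples.
    matched = len(test & gold)
    precision = matched / len(test)
    recall = matched / len(gold)
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score

# Each graph has one node, so there is only one possible alignment;
# both variables are renamed to the same placeholder "n0".
x = triples("n0", "x")
y = triples("n0", "y")

print(score(x, y))  # (0.5, 0.5, 0.5)
```

Only the TOP triple matches; the instance triples differ in their concept, which gives 1 matched triple out of 2 on each side.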

rafaelanchieta commented 4 years ago

I agree that they are structurally similar. However, the minimal Smatch score will be 0.5.

$ cat x
(d / die-01)

$ cat y
(r / run-01)

$ python smatch.py --pr -f x y
Precision: 0.5
Recall: 0.5
F-score: 0.5

goodmami commented 4 years ago

> the minimal Smatch score will be 0.5.

Only if both sides have only 2 triples (TOP and instance) as in this example. Increase the number of triples and the score can go lower.

$ cat a
(d / die-01
   :ARG0 (p / process))
$ cat b
(r / run-01)
$ python3 smatch.py --pr -f a b 
Precision: 0.25
Recall: 0.50
F-score: 0.33
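
As a worked check of these numbers: graph a has 4 triples (TOP, two instance triples, and one :ARG0 edge), graph b has 2 (TOP and instance), and the reported precision implies a single matched triple. The match count of 1 is inferred from the scores above, not read from Smatch's output.

```latex
\[
P = \frac{1}{4} = 0.25, \qquad
R = \frac{1}{2} = 0.50, \qquad
F = \frac{2PR}{P + R}
  = \frac{2 \cdot \frac{1}{4} \cdot \frac{1}{2}}{\frac{1}{4} + \frac{1}{2}}
  = \frac{1}{3} \approx 0.33
\]
```

Every extra unmatched triple on either side enlarges a denominator without adding a match, which is why the score drops below 0.5 for larger graphs.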

rafaelanchieta commented 4 years ago

You are right. The score may decrease. Is that the expected behaviour of Smatch?

goodmami commented 4 years ago

> Is that the expected behaviour of Smatch?

Yes. The more different the graphs, the lower the score.

rafaelanchieta commented 4 years ago

Ok. Thanks.