snowblink14 / smatch

Smatch tool: evaluation of AMR semantic structures
MIT License
63 stars 27 forks source link

smatch ignores triples whose source is a constant #26

Open goodmami opened 4 years ago

goodmami commented 4 years ago

This issue is spun off of #10 as it's really a different issue despite also being about inverted roles.

Smatch compares triples which are deinverted, so (a ... :ARG0-of (b ... becomes the triple (ARG0, b, a) (in smatch's (role, source, target) order)). If this deinversion results in a constant becoming the source, then the triple is not considered by smatch, resulting in inflated scores. While these triples may not be considered valid AMR, they are nevertheless observed in the outputs of automatic systems and should be considered during evaluation.

Consider the following hypothetical AMRs as gold and as the outputs of three different systems. The triples that smatch computes (including the top triple) are in comments.

$ cat gold  # original:     (TOP, a, alpha) (instance, a, alpha) (ARG0, a, 1)
(a / alpha
   :ARG0 1)
$ cat a  # wrong value:     (TOP, a, alpha) (instance, a, alpha) (ARG0, a, 2)
(a / alpha
   :ARG0 2)
$ cat b  # missing edge:    (TOP, a, alpha) (instance, a, alpha)
(a / alpha)
$ cat c  # wrong direction: (TOP, a, alpha) (instance, a, alpha) (ARG0, 1, a)
(a / alpha
   :ARG0-of 1)

Now we compare these to the gold. The raw counts of matching triples used to compute precision and recall are in comments.

$ python smatch.py --pr -f a gold  # P=2/3  R=2/3
Precision: 0.67
Recall: 0.67
F-score: 0.67
$ python smatch.py --pr -f b gold  # P=2/2  R=2/3
Precision: 1.00
Recall: 0.67
F-score: 0.80
$ python smatch.py --pr -f c gold  # P=2/2  R=2/3  (Note: P != 2/3)
Precision: 1.00
Recall: 0.67
F-score: 0.80

Smatch considers three types of triples: instances (e.g., (instance, s, see-01)), attributes (e.g., (polarity, a, -), and edges (e.g., (ARG0, s, b)). In all cases, the source is the variable of some node. When the source is a constant, it doesn't fit into these three categories.

The straightforward fix is to ensure that inverted triples whose sources are constants (such as the (ARG0, 1, a) triple of c) are counted in the denominators for P and R, perhaps grouped in some "extra triples" category if necessary. When the role is :domain-of or :mod-of, however, there may be different behavior (see #10), but these can perhaps be resolved when decoding the AMR and not during the smatch computation.

oepen commented 4 years ago

the scores for your examples A and B seem correct, while C should have a precision of 2/3, right?

if de-inversion generally was restricted to genuine edges (involving two nodes, not constants), the :ARG0-of in C would yield an ‘attribute’ triple on node a, making it scores just the same as A.

problem solved, without inventing a new type of triples?

goodmami commented 4 years ago

the scores for your examples A and B seem correct, while C should have a precision of 2/3, right?

Yes, sorry if that was not clear. The scores I show are the current state of smatch, and not what it should be.

if de-inversion generally was restricted to genuine edges (involving two nodes, not constants), the :ARG0-of in C would yield an ‘attribute’ triple on node a, making it scores just the same as A.

problem solved, without inventing a new type of triples?

Ok, so you're proposing to extend the conditional deinversion as we've discussed for :mod to all roles? That also works, but then (as with #10) we need to delay deinversion until after reading the full graph so we know what are true nodes, and I don't know amr.py well enough to implement it. However I've already implemented it (yesterday) in my Penman library, if we can agree to use that for parsing (which would also solve #10, #11, and #17).

nschneid commented 4 years ago
(a / alpha
   :ARG0-of 1)

In annotation this would be considered illegal. Does Smatch reject AMRs that are valid Penman but with parts that don't fit together according to the rules of AMR? If so I vote for rejecting this input. Otherwise I guess the rule should be "don't invert any relation whose target is a constant".

goodmami commented 4 years ago

In annotation this would be considered illegal.

Sorry I tried to make it clear in the issue text: this is not valid AMR but it (or something like it) is output by AMR parsers. For instance, I see :part-of "Sahara" in the output of AMR-EAGER and :location-of 4 in the output of JAMR, among others.

Does Smatch reject AMRs that are valid Penman but with parts that don't fit together according to the rules of AMR?

It depends on what you consider valid PENMAN. For instance, amr.py rejects missing concepts (a :ARG0 b) but not if the / is there ((a / :ARG0 b)), and it rejects distributed node definitions ((a / alpha :ARG0 (b / beta) :ARG1 (b / :ARG0 (g / gamma)))). But in general amr.py and smatch.py are naive concerning AMR's rules: no concept inventory or propbank frames, no role inventory, no reification, no canonical role deinversions, etc. I do not think all of these features should be added, however, as it would complicate the metric.

If so I vote for rejecting this input. Otherwise I guess the rule should be "don't invert any relation whose target is a constant".

The "inverted attributes" are bad triples but smatch does not count them against the final score, so I proposed including them in the score. @nschneid, are you proposing to discard these AMRs entirely but continue on with the next pair, or to abort the smatch process? I worry that doing the former will also lead to inflated scores over a corpus as the bad graphs are simply ignored. The latter can be frustrating for users who don't have control over the well-formedness of the inputs. But certainly we can log a warning or error so users will be aware of the issue.

nschneid commented 4 years ago

But in general amr.py and smatch.py are naive concerning AMR's rules

Given this, then rejecting the input seems like it could cause problems. So let's just not invert these relations involving constants.