snowblink14 / smatch

Smatch tool: evaluation of AMR semantic structures
MIT License
63 stars 25 forks source link

Bug(?) in string parsing #45

Open flipz357 opened 5 months ago

flipz357 commented 5 months ago

I don't need this fixed, but maybe it's of interest:

I noticed that this smatch version returns a score of 1.00 for the different graphs:

(s / see-01
   :ARG0 (p / person
         :name (n / name
              :op1 "Hans")))

and

(s / see-01
   :ARG0 (p / person
         :name (n / name
              :op1 "Hans_")))

Wondering if there's a bug, or if there is some reason for this? It might be some sort of preprocessing that happens here which is not obvious. Since there are also https link etc. in AMR that may contain stuff like "_" I think it may not be sensible to remove characters.

edit

It can be even more severe. The score is also 1.00 for the very different graphs:

(s / see-01
   :ARG0 (p / person
         :name (n / name
              :op1 "Hans Meier")))

and

(s / see-01
   :ARG0 (p / person
         :name (n / name
              :op1 "Hans")))

While the first two graphs could be some pre-processing quirk, the second two ones clearly seem like bug.

snowblink14 commented 5 months ago

Thanks for reporting this, @flipz357. I think this old piece of code has bug handling blank space and underscores when processing the values.

Should be an easy fix though -- I'll fix this soon.

jzw2 commented 2 weeks ago

Is this intended behaviour? I noticed the given test_input1 and test_input2 have a similar example, with "Wiilliam" differeing in capitalization and having underscores. It currently counts it as a match.

flipz357 commented 2 weeks ago

Hi @jzw2 I guess from the view of AMR evaluation/similarity it can seem okay to not differ between upper- and lower-case and be considered a user choice (for very strict evaluation, we might want to differ though). The observed treatment of underscores (the bug you experience, or the example shown above), however, seems indeed like wrong/unintended behavior and should not count as a match.

Maybe you can check out my SMATCH++ that has a best-practice evaluation protocol with AMR normalization (and ILP solver).