jgroschwitz / GrAPES

GNU General Public License v3.0
1 stars 0 forks source link

Ensure safety of Smatch results #2

Open flipz357 opened 11 months ago

flipz357 commented 11 months ago

Great work, very happy to see this!

I don't know the extent that Grapes relies on Smatch, but it seems pretty significant. So to ensure safety of results, smatch should be replaced by smatchpp with ILP solver.

The Smatch that you seem to use employs a heuristic that has no upper-bound. This means that Smatch scores cannot be verified and are in some cases even false.

jgroschwitz commented 11 months ago

Hi Juri, Thanks for the kind words, and for the advice!

Actually, the metrics we report in the paper don't rely on smatch, they are all either exact match or custom metrics. But the code in this repo also reports smatch as a supplementary metric; we'll make sure to update that.

flipz357 commented 11 months ago

Neat, just a follow up Q: without Smatch, how would you calculate and verify exact match in the structural generalization? I'd say that exact match is simply exactly where (optimal) Smatch F1 is 100 (Precision=100 for subgraph isomorphism).

jgroschwitz commented 11 months ago

We do a graph traversal and simply keep track of all possible alignments that exactly match. In theory, this has exponential runtime, but because most node names don't occur many times in the graph, in practice the number of possible alignments to track is very small. The code is here if you're interested: https://github.com/jgroschwitz/GrAPES/blob/main/evaluation/graph_matcher.py (we technically check whether G1 contains G2, and G2 contains G1, to check if G1 exactly matches G2).

Since one of the graphs we compare is parser output, it might be possible that an unfortunate graph prediction could lead to unfeasibly long runtimes. We avoided doing the 100 Smatch solution exactly because Smatch doesn't guarantee that score even when the graphs are identical. Does smatchpp? Does it have runtime guarantees too?

flipz357 commented 11 months ago

Interesting, I have to think about it if your approach is verifiably the correct solution.

When Smatchpp checks the problem, you have (i.e., answering your questions):

Optimal solution: yes (so yes it guarantee score) Runtime guarantee: no (since isomorphism is NP complete problem)

However, the cool thing is that if you're just interested in exact match, you can use lossless graph compression with Smatchpp (see here for python example) this means that Smatchpp will give you optimal exact match very fast.

flipz357 commented 11 months ago

I think your approach can yield the correct exact match solution in theory and it should be equivalent to Smatchpp. Maybe it can be tested against Smatchpp precision to verify this. Smatchpp with lossless graph compression might be still faster though. If hacking deep into the code, and only interested in exact match, one could even abort as soon as the upper-bound gets below 100.

jgroschwitz commented 11 months ago

Neat, I'll check it out!

flipz357 commented 11 months ago

Cool! Btw., since the subgraph isomorphism is an interesting example case for SMATCH++, I added a code example to the readme, and upgraded the version to 1.5.2.

I also tested the time on your example in graph_matcher.py. Both are quite fast, but it takes longer than your code. I think this may be due to the additional pre-/post-processing/standardizing but also because the example does some additional computing since it calculates the isomorphism test based on the completed result of the structural matching, which is theoretically not necessary.

Note also that according to AMR-guidelines (a / b :mod (c / d)) is a subgraph of (x / y :rel (c / d :domain (a / b))), so if this should be accounted for then GenericStandardizer would need to be replaced with AMRStandardizer in the example.

jgroschwitz commented 11 months ago

Oh cool, thanks for checking this! Yes it's probably because our code is much more specialized and simpler. Good to hear though that it is efficient too :)

Thanks for linking the example and clarifying the standardization too! Will be useful :)