Status: Open. Opened by flipz357 10 months ago.
Thank you, @flipz357, for reporting this. The randomness of Smatch implementations has been documented on our forum for 4 years, and you have finally brought the community a solid solution. Your paper is quite dense, so I'll spend some time reading it and then integrate your implementation soon.
Thanks @hankcs, and apologies for any density in the paper; there are a few issues with the current state of AMR evaluation. I think using a hill-climber for evaluation may well be the biggest current issue: any score from a hill-climber is only a lower bound on the true score, and since no upper bound is available, the score is not verifiable. We can never know whether an output of the hill-climber is wrong or correct (except, of course, when it returns 100, since then the upper bound trivially equals the lower bound).
Describe the bug
As also noted in the original Smatch repo issues, the Smatch score gives wrong and unverifiable results. This is also the case for HanLP.
Code to reproduce the issue
Describe the current behavior
Totally wrong and random Smatch scores.
Expected behavior
A deterministic Smatch score of 100.
System information
Other info / logs
Not necessary. The problem is simply that using a hill-climber for graph matching is unsafe and opaque, and provides no upper bound on the solution. It gets worse as graphs grow larger, but it can also occur on smaller graphs. A more detailed empirical study of the problem can be found here.
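To make the lower-bound argument concrete, here is a minimal toy sketch. It is not the Smatch or HanLP code, and it is not the original reproduction script. All names (`count_matches`, `hill_climb`, `exact`, `chain_graph`) are hypothetical. It assumes both graphs have the same number of variables and uses only swap moves, which is simpler than what real implementations do. It compares a graph with a renamed copy of itself, so the true score is 100, and contrasts hill-climber runs from random starts with an exhaustive search over all variable mappings.

```python
import itertools
import random


def count_matches(triples_a, triples_b, mapping):
    """Count triples of A that occur in B after renaming A's variables."""
    renamed = {(rel, mapping.get(s, s), mapping.get(t, t))
               for rel, s, t in triples_a}
    return len(renamed & set(triples_b))


def hill_climb(triples_a, triples_b, vars_a, vars_b, rng):
    """Greedy local search from one random start, using swap moves only."""
    perm = list(vars_b)
    rng.shuffle(perm)
    best = count_matches(triples_a, triples_b, dict(zip(vars_a, perm)))
    improved = True
    while improved:
        improved = False
        for i, j in itertools.combinations(range(len(perm)), 2):
            perm[i], perm[j] = perm[j], perm[i]
            score = count_matches(triples_a, triples_b, dict(zip(vars_a, perm)))
            if score > best:
                best, improved = score, True
            else:
                perm[i], perm[j] = perm[j], perm[i]  # undo a non-improving swap
    return best


def exact(triples_a, triples_b, vars_a, vars_b):
    """Exhaustive search: feasible only for tiny graphs, but provably optimal."""
    return max(count_matches(triples_a, triples_b, dict(zip(vars_a, p)))
               for p in itertools.permutations(vars_b))


def chain_graph(prefix, n):
    """Toy graph: n nodes with the same concept, linked in a directed chain."""
    names = [f"{prefix}{i}" for i in range(n)]
    triples = [("instance", v, "person") for v in names]
    triples += [(":ARG0", names[i], names[i + 1]) for i in range(n - 1)]
    return names, triples


# Two structurally identical graphs, so the true score is 100.
vars_a, triples_a = chain_graph("a", 6)
vars_b, triples_b = chain_graph("b", 6)

rng = random.Random(0)
scores = [hill_climb(triples_a, triples_b, vars_a, vars_b, rng) for _ in range(20)]
optimum = exact(triples_a, triples_b, vars_a, vars_b)

print("exact optimum:", optimum, "of", len(triples_a), "triples")
print("distinct hill-climber results:", sorted(set(scores)))
```

Every hill-climber result is at most the exact optimum, by construction. Whether a given run reaches it depends on the random start, and without an exact solver or an upper bound there is no way to tell from the returned score alone.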