smatch++ scores #1 - xiulinyang/compositional_drs_parsing - GitHub Issues

flipz357 commented 4 months ago

Hi @xiulinyang

If I remember right, there was a table in your README where you could see some interesting score differences between Smatch and Smatch++ that I think are due to the optimality of Smatch++. Is there any write-up where this is reported? I want to mention in a thesis summary some projects that have used Smatch++.

xiulinyang commented 4 months ago

Hi! Yes, I used SMATCH++ in my thesis, and I'm now turning the thesis into a conference paper (it should be ready within 3-4 weeks). If that's too long to wait, you can check out my thesis, Thesis_XiulinYang.pdf (the SMATCH++ results are reported on page 51).

Thanks for proposing this nice metric!

flipz357 commented 4 months ago

Thanks @xiulinyang, this will do just fine!

Some details, in case they are of interest: From my view, if Smatch++ gives a higher score than Smatch, this is due to Smatch being wrong, i.e., its hill-climber not finding the optimal solution. If Smatch++ gives a lower score than Smatch, this can be due to two details.

1. Smatch gives the two graphs (d / dog) and (c / cat) a score of 0.5, since it aligns the root anonymously. Smatch++ gives them a score of 0.0, since the root is only matched if the concept label is the same.
2. Another reason why Smatch can be higher than Smatch++ is that it does not handle duplicate triples properly. I wrote a blog post on this issue; it is also a bit funny, because it means that you can hack an evaluation and achieve any score you want. Smatch++ does, as you very correctly write, remove duplicate triples if the input is an AMR graph. But it can also handle duplicate triples properly, without score inflation, if that is what is wanted.
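To make the two points concrete, here is a toy sketch in Python. It is not the actual Smatch or Smatch++ code: the triple encoding, the `"TOP"` root triple, and the names `f1`, `count_matches`, and `best_f1` are assumptions made up for illustration. It brute-forces all variable alignments (fine for tiny graphs only) and compares an anonymous root triple with a concept-labeled one, then a naive match counter with a multiset-based one on duplicated triples.

```python
from collections import Counter
from itertools import permutations

def f1(matched, n_pred, n_gold):
    """F1 from a match count and the two triple counts."""
    if matched == 0:
        return 0.0
    p, r = matched / n_pred, matched / n_gold
    return 2 * p * r / (p + r)

def count_matches(pred, gold, naive=False):
    """naive=True counts every predicted triple found in gold, so repeated
    triples are counted repeatedly; otherwise use multiset intersection."""
    if naive:
        gold_set = set(gold)
        return sum(1 for t in pred if t in gold_set)
    return sum((Counter(pred) & Counter(gold)).values())

def best_f1(pred, gold, pred_vars, gold_vars, naive=False):
    """Exhaustive search over one-to-one variable alignments.
    Assumes len(pred_vars) <= len(gold_vars)."""
    best = 0.0
    for perm in permutations(gold_vars, len(pred_vars)):
        mapping = dict(zip(pred_vars, perm))
        renamed = [(rel, mapping.get(s, s), mapping.get(t, t))
                   for rel, s, t in pred]
        best = max(best, f1(count_matches(renamed, gold, naive),
                            len(pred), len(gold)))
    return best

# Point 1: (d / dog) vs (c / cat).
# Anonymous root (hypothetical encoding): the root triple ignores the concept.
anon_a = [("instance", "d", "dog"), ("TOP", "d", "top")]
anon_b = [("instance", "c", "cat"), ("TOP", "c", "top")]
# Concept-labeled root (hypothetical encoding): the root triple carries the concept.
lab_a = [("instance", "d", "dog"), ("TOP", "d", "dog")]
lab_b = [("instance", "c", "cat"), ("TOP", "c", "cat")]
print(best_f1(anon_a, anon_b, ["d"], ["c"]))  # 0.5
print(best_f1(lab_a, lab_b, ["d"], ["c"]))    # 0.0

# Point 2: one correct triple, repeated three times in the prediction.
gold = [("instance", "a", "dog"), ("instance", "b", "cat")]
dup = [("instance", "x", "dog")] * 3
dedup = list(dict.fromkeys(dup))  # drop duplicates, keep order
print(round(best_f1(dup, gold, ["x"], ["a", "b"], naive=True), 3))  # 1.2
print(round(best_f1(dup, gold, ["x"], ["a", "b"]), 3))              # 0.4
print(round(best_f1(dedup, gold, ["x"], ["a", "b"]), 3))            # 0.667
```

With the naive counter the duplicated prediction ends up with an F1 above 1, which is the kind of inflation meant above; the multiset count and the deduplicated input both stay in a sensible range. How the real tools count matches internally may differ from this sketch.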

PS: Your thesis looks very nice, congratulations!
