jgroschwitz / GrAPES

GNU General Public License v3.0

Standardization of AMRs #3

Status: Open. flipz357 opened this issue 6 months ago.

flipz357 commented 6 months ago

For parser evaluation, it helps to standardize AMRs before scoring them.

For example, some parsers use single quotes (') instead of double quotes ("), and others handle lower/uppercasing differently. Such differences should have no effect on the score.
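To make the quote and casing point concrete, here is a minimal sketch of that kind of surface normalization. It is not GrAPES or Smatch++ code: it assumes a graph is a plain set of (source, relation, target) string triples, and the helper names `normalize_constant`, `normalize_triple` and `normalize_graph` are hypothetical.

```python
# Hypothetical sketch, not GrAPES or Smatch++ code: normalize surface
# variation in AMR triples so that it cannot affect the score.
# Assumption: a graph is a set of (source, relation, target) string triples.

QUOTES = ("'", '"')


def normalize_constant(value: str) -> str:
    """Rewrite a 'single-quoted' constant as a "double-quoted" one."""
    if len(value) >= 2 and value[0] in QUOTES and value[-1] == value[0]:
        return '"' + value[1:-1] + '"'
    return value


def normalize_triple(triple):
    """Unify quote style, then lowercase every element (relations, concepts, constants)."""
    return tuple(normalize_constant(part).lower() for part in triple)


def normalize_graph(triples):
    """Apply normalize_triple to every triple of a graph."""
    return {normalize_triple(t) for t in triples}


if __name__ == "__main__":
    parser_a = {("c", ":instance", "City"), ("n", ":op1", "'Paris'")}
    parser_b = {("c", ":instance", "city"), ("n", ":op1", '"paris"')}
    print(normalize_graph(parser_a) == normalize_graph(parser_b))  # True
```

Lowercasing everything, quoted names included, is a deliberate simplification here; a stricter variant could keep the case inside quoted constants if name casing should count.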

I'm not sure where, or whether, GrAPES already does some of this standardization; at least for the structural matching, I couldn't find any. There are also more advanced methods, such as reification/dereification standardization, or treating :domain as equivalent to :mod-of. For instance, Smatchpp performs a list of AMR-guideline-informed standardizations on AMRs. You could copy them as needed, or simply apply them "out of the box":
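As an illustration of the ":domain = :mod-of" idea, here is a minimal sketch of inverse-role standardization. It is a hypothetical helper, not Smatchpp's implementation, and it does not cover reification. It assumes graphs are sets of (source, relation, target) triples and relies on the AMR convention that (a :X-of b) means (b :X a). The `NON_INVERSE_OF` set is an illustrative, possibly incomplete list of roles that end in "-of" without being inverses.

```python
# Hypothetical sketch of inverse-role standardization, following the AMR
# convention that (a :X-of b) means (b :X a) and that :domain is :mod-of.
# Assumption: a graph is a set of (source, relation, target) string triples.

# Roles that end in "-of" but are not inverses. Illustrative, possibly incomplete.
NON_INVERSE_OF = {":consist-of", ":prep-out-of", ":prep-on-behalf-of"}


def canonicalize_triple(triple):
    source, relation, target = triple
    # Step 1: turn an inverse role (a :X-of b) into its base form (b :X a).
    if relation.endswith("-of") and relation not in NON_INVERSE_OF:
        source, relation, target = target, relation[: -len("-of")], source
    # Step 2: :domain is the inverse of :mod, so (a :domain b) becomes (b :mod a).
    if relation == ":domain":
        source, relation, target = target, ":mod", source
    return (source, relation, target)


def canonicalize_graph(triples):
    """Apply canonicalize_triple to every triple of a graph."""
    return {canonicalize_triple(t) for t in triples}


if __name__ == "__main__":
    # "The marble is white", written two equivalent ways.
    g1 = {("w", ":instance", "white"), ("m", ":instance", "marble"), ("w", ":domain", "m")}
    g2 = {("w", ":instance", "white"), ("m", ":instance", "marble"), ("m", ":mod", "w")}
    print(canonicalize_graph(g1) == canonicalize_graph(g2))  # True
```

Because step 2 runs after step 1, a :domain-of edge is first flipped to :domain and then to :mod, so all three spellings end up in one canonical form.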

```python
from smatchpp import preprocess

# AMRStandardizer bundles Smatchpp's AMR-guideline-informed standardizations
standardizer = preprocess.AMRStandardizer()
```

This way, the standardization can be applied as a single line of preprocessing.

jgroschwitz commented 6 months ago

Good tip, thanks! We tried to avoid graphs with such ambiguous aspects in our structural generalization tasks (though probably not with full success: the "multiple adjectives" category certainly contains some :mod edges), and we didn't notice any issues with this when looking at example errors. But it's really cool that you have an out-of-the-box standardization function; it's super useful and definitely better than what we've been doing so far! We will include it in the next revision of the code.