umr4nlp / umr-guidelines

Confusion, Questions, and Requests related to UMR Notations #6

Open ablodge opened 2 years ago

ablodge commented 2 years ago

Good morning. I realize the UMR notation encodes a lot of information and draws on many areas of research. I found some parts of the notation confusing, so I thought it might be useful to point out potential sources of confusion, ask some questions, and offer some feedback. Thank you in advance for your replies.

Capitalization and Naming Conventions

I noticed some inconsistencies in the guidelines related to capitalization and naming of concepts, relations, and attribute values:

With that in mind I would request:

Abbreviations and Acronyms

Some of UMR’s notations rely on acronyms and hard-to-read abbreviations such as DCT, AUTH, PrtAff, and modstr. AMR is designed to be human-readable, which is important for its use as an explanatory tool, and I believe this also reduces the learning curve for reading and annotating AMR. I would also stress that these guidelines won’t just be used by linguists, but also by computer scientists who want to be able to read or parse UMR.

With that in mind I would request:

Transliteration

You have added a nice notation for transliterating words, for example when annotating low-resource languages:

(e / enhleama-00  'travel' 
     …

I like this notation for transliteration, and I think it will be very useful for annotating and using multilingual UMR data. However, please be aware that it changes the AMR data structure, so many code libraries for reading, writing, and representing AMR data will need to be updated before they can even run on UMR inputs that use it. For example, Smatch and penman will fail if you run their current code on a UMR with transliteration. A possible workaround would be to represent transliteration as an attribute, e.g. e / enhleama-00 :transl "travel", at least until this notation is supported in the libraries that currently work on AMR (I think you could do this in a post-processing script rather than by changing the notation).
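To make the post-processing idea concrete, here is a minimal sketch of such a script. It assumes the gloss always appears as a single-quoted string directly after the concept, as in the example above; the function name `gloss_to_attribute` and the pattern are my own illustration, not anything defined in the guidelines, and the role name :transl is just the one suggested above.

```python
import re

# Assumed pattern: "/ concept" followed by whitespace and a single-quoted gloss,
# e.g. "(e / enhleama-00  'travel'". This is a guess at the surface form,
# not an official grammar for the UMR notation.
GLOSS = re.compile(r"(/\s*[^\s()']+)\s+'([^']*)'")


def gloss_to_attribute(umr_text: str, role: str = ":transl") -> str:
    """Rewrite each quoted gloss as a string attribute on the same node."""
    return GLOSS.sub(lambda m: f'{m.group(1)} {role} "{m.group(2)}"', umr_text)


converted = gloss_to_attribute("(e / enhleama-00  'travel')")
print(converted)  # (e / enhleama-00 :transl "travel")
```

The output uses an ordinary role with a quoted string value, which is the form existing AMR tools already handle as an attribute. A script like this would not cope with glosses that themselves contain a single quote.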

Questions/Requests:

Document-Level Representation

As with transliteration, the notation for document-level representations that is shown in the guidelines will not be supported by current code for AMRs, because a notation like :temporal((DCT :depends-on s1t2) (s1t2 :contained s1t)) isn't valid AMR notation. If these representations are always connected graphs, it might be good to make them conform to AMR notation.
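As a rough illustration of what a reader for this notation has to do today, the sketch below pulls the (source, relation, target) triples out of the example string and checks whether they form a single connected graph. It assumes every inner group is a flat three-token triple like the ones above; `read_triples` and `is_connected` are hypothetical helper names of mine, not part of any existing library.

```python
import re
from collections import defaultdict

# Assumed shape of one inner group: "(source :relation target)".
TRIPLE = re.compile(r"\(\s*([^\s()]+)\s+(:[^\s()]+)\s+([^\s()]+)\s*\)")


def read_triples(text):
    """Return every flat (source, relation, target) group found in the text."""
    return TRIPLE.findall(text)


def is_connected(triples):
    """Check that the triples form one connected graph, ignoring direction."""
    neighbours = defaultdict(set)
    for source, _, target in triples:
        neighbours[source].add(target)
        neighbours[target].add(source)
    if not neighbours:
        return True
    start = next(iter(neighbours))
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for other in neighbours[node] - seen:
            seen.add(other)
            stack.append(other)
    return len(seen) == len(neighbours)


doc_temporal = ":temporal((DCT :depends-on s1t2) (s1t2 :contained s1t))"
triples = read_triples(doc_temporal)
connected = is_connected(triples)
print(triples)
print(connected)
```

If the check holds for all document-level annotations, the same triples could in principle be written out as one rooted graph in ordinary AMR notation, which is what would let existing tools read them.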

Questions/Requests:

Temporal Relations

I found the :after and :before notation confusing at first. I read the relation A :after B as “A happens after B happens”, but according to the guidelines it is the other way around. The first reading seems more natural in English, so other people might be confused by this as well.

Questions/Requests: