Closed ianporada closed 1 year ago
I cannot find a way. It seems like this would need to be added as an additional root comment (e.g. $SPEAKER). Or more generally maybe it makes sense to store all sentence-level comments with the root as a string in case there are others of interest too.
Some CoNLL-U comments are standardized and exposed in the Udapi API, e.g. root.sent_id, root.text, root.newpar or root.newdoc. The remaining comments are stored in root.comment, which is a (possibly multi-line) string corresponding to all the comment lines, but excluding the # characters. So to extract the speaker you need to use something like:
import re

speaker = None
# tree.comment keeps the space that followed each stripped "#"
match = re.search(r"^ speaker = (.+)", tree.comment, re.M)
if match:
    speaker = match.group(1)
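If other sentence-level comments turn out to be of interest too, the same idea can be wrapped in a small helper. This is only a sketch: comment_value is a hypothetical name (not part of Udapi), and it assumes the comments follow the usual "key = value" layout. It works on a plain string, so it can be tried without Udapi installed.

```python
import re
from typing import Optional


def comment_value(comment: str, key: str) -> Optional[str]:
    """Return the value of a `key = value` line in a comment string, or None.

    `comment` is assumed to look like Udapi's root.comment: one comment per
    line, with the leading '#' already stripped (hypothetical helper).
    """
    # Tolerate optional spaces/tabs around the key, the '=' and the value.
    pattern = rf"^[ \t]*{re.escape(key)}[ \t]*=[ \t]*(.+?)[ \t]*$"
    match = re.search(pattern, comment, re.M)
    return match.group(1) if match else None


# Made-up sample, standing in for tree.comment.
sample = " speaker = Alice\n foo = bar"
print(comment_value(sample, "speaker"))
print(comment_value(sample, "missing"))
```

With a real tree this would be called as comment_value(tree.comment, "speaker"), returning None for sentences that carry no speaker comment.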
I see, thanks! I was confused by the fact that standardized comments are replaced by tags in the comment attribute, but understand now. https://github.com/udapi/udapi-python/blob/a9050283fe1530e9f14dcbe5ffc10e64b2f85eae/udapi/block/read/conllu.py#LL42C29-L42C41
Sometimes a CorefUD 1.1 document has speaker information as a sentence-level comment below the sent_id, example below. Is there a way to recover this information from a Document?