Is it expected that the first word of a mention contains an identical CorefMention twice in node.coref_mentions? To show what I mean, for the first two words from English-GUM, the code

import udapi.block.read.conllu

# files=... is left elided; point it at the English-GUM CoNLL-U file(s)
data = udapi.block.read.conllu.Conllu(files=..., split_docs=True).read_documents()
for doc in data:
    for node in doc.nodes_and_empty:
        print(node.coref_mentions)

returns, for example:
[<udapi.core.coref.CorefMention object at 0x14dea7b8a590>, <udapi.core.coref.CorefMention object at 0x14dea7b8a590>]
[<udapi.core.coref.CorefMention object at 0x14dea7b8a590>]
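The repeated address in the first line suggests that this is the same object listed twice, not two equal mentions. A minimal sketch of how this could be checked by identity follows. The helper `find_identity_duplicates` and the class `FakeMention` are hypothetical names made up for this illustration and are not part of udapi.

```python
def find_identity_duplicates(items):
    """Return the objects that occur more than once in items, compared with `is`."""
    seen_ids = set()
    duplicates = []
    for item in items:
        if id(item) in seen_ids:
            # Report each repeated object only once.
            if not any(item is dup for dup in duplicates):
                duplicates.append(item)
        else:
            seen_ids.add(id(item))
    return duplicates


class FakeMention:
    """Hypothetical stand-in for a mention object; not the udapi class."""


mention = FakeMention()
first_word_mentions = [mention, mention]   # shape of the first printed list
second_word_mentions = [mention]           # shape of the second printed list

first_duplicates = find_identity_duplicates(first_word_mentions)
second_duplicates = find_identity_duplicates(second_word_mentions)
```

On real data, the same helper could presumably be called on node.coref_mentions for each node to list the affected words.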
It is caused by the CorefMention being added both during construction and via an explicit node._mentions.append in these lines: https://github.com/udapi/udapi-python/blob/f3b8689bffdccd0cf608423b8f50deaee0419207/udapi/core/coref.py#L645-L650

Maybe this is expected, but I was surprised by it and could not find any mention of it in the docs.
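To make the suspected mechanism concrete, here is a toy model of the double registration. It is only a sketch under my own assumptions, not the udapi code: `ToyNode`, `ToyMention` and `toy_load` are hypothetical names, and the only things taken from the real library are the attribute names `_mentions` and `coref_mentions` and the constructor flag `add_word_backlinks`.

```python
class ToyNode:
    """Hypothetical stand-in for a node; it only keeps a list of mention backlinks."""

    def __init__(self, form):
        self.form = form
        self._mentions = []

    @property
    def coref_mentions(self):
        return self._mentions


class ToyMention:
    """Hypothetical stand-in for a mention that can register itself on its words."""

    def __init__(self, words, add_word_backlinks=True):
        self.words = list(words)
        if add_word_backlinks:
            for word in self.words:
                word._mentions.append(self)  # first registration


def toy_load(node, add_word_backlinks):
    """Create a one-word mention, then append it explicitly, as a loader might."""
    mention = ToyMention([node], add_word_backlinks=add_word_backlinks)
    node._mentions.append(mention)  # second, explicit registration
    return mention


duplicated = ToyNode("first")
toy_load(duplicated, add_word_backlinks=True)

single = ToyNode("first")
toy_load(single, add_word_backlinks=False)

print(len(duplicated.coref_mentions), len(single.coref_mentions))  # prints "2 1"
```

In this toy model the backlink is stored twice when the constructor registers it and the loader appends it again, and only once when the constructor is told to skip the backlinks.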
If it is not expected, an obvious fix is to pass add_word_backlinks=False to the mentioned CorefMention constructor call.

Thanks & cheers!