khannan-livefront closed this issue 3 months ago
If I do this, I get all three Words, so I think it is behaving as expected. The UD annotation standard is to mark the start and end points (inclusive). Is there something else you observed that needs to be fixed?
pipe("I dunno where it went").sentences[0].tokens[1]
[
{
"id": [
2,
4
],
"text": "dunno",
"start_char": 2,
"end_char": 7,
"ner": "O",
"multi_ner": [
"O"
]
},
{
"id": 2,
"text": "du",
"lemma": "do",
"upos": "AUX",
"xpos": "VBP",
"feats": "Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin",
"head": 0,
"deprel": "root",
"start_char": 2,
"end_char": 4
},
{
"id": 3,
"text": "n",
"lemma": "not",
"upos": "PART",
"xpos": "RB",
"head": 2,
"deprel": "advmod",
"start_char": 4,
"end_char": 5
},
{
"id": 4,
"text": "no",
"lemma": "no",
"upos": "INTJ",
"xpos": "UH",
"head": 2,
"deprel": "discourse",
"start_char": 5,
"end_char": 7
}
]
Thanks for clarifying @AngledLuffa. It's fine if it works that way. I had coded my implementation to assume every id would be linked to by the multi-word token, but it appears that this assumption is wrong. I've updated my implementation to treat the ids of the multi-word token as an inclusive min/max range.
Thanks for the prompt response!
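For anyone who hits the same assumption, here is a minimal sketch of the range handling described above. It works on plain dicts shaped like the output earlier in this thread and does not call Stanza itself. The helper names `expand_id_range` and `covered_words` are hypothetical, and the sketch assumes the multi-word token entry comes first in the list, followed by its words.

```python
from typing import Any, Dict, List, Sequence, Union


def expand_id_range(token_id: Union[int, Sequence[int]]) -> List[int]:
    """Expand a token `id` field into every word id it covers.

    Assumption: an int id refers to a single word, while a list such as
    [2, 4] gives inclusive start and end points, as in the UD convention.
    """
    if isinstance(token_id, int):
        return [token_id]
    first, last = min(token_id), max(token_id)
    return list(range(first, last + 1))


def covered_words(entries: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the word entries covered by the leading multi-word token entry."""
    wanted = set(expand_id_range(entries[0]["id"]))
    return [
        entry
        for entry in entries[1:]
        if isinstance(entry["id"], int) and entry["id"] in wanted
    ]


# Trimmed copy of the "dunno" output shown in this thread.
entries = [
    {"id": [2, 4], "text": "dunno"},
    {"id": 2, "text": "du"},
    {"id": 3, "text": "n"},
    {"id": 4, "text": "no"},
]

print(expand_id_range(entries[0]["id"]))
print([word["text"] for word in covered_words(entries)])
```

Reading the pair as a range rather than a membership list means that a middle word such as id 3 is still picked up, even though it is never named in the token's `id`.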
Describe the bug
With the introduction of multi-word tokens (MWT) for English, we came across a test case where a multi-word token is not linked correctly to the ids of its associated tokens.
To Reproduce
Steps to reproduce the behavior:
dunno
reveal that one of the tokens for that word is not linked to by its multi-word token:
Expected behavior
The MWT token links to all of the child tokens it encompasses. id: [2, 3, 4]
Environment (please complete the following information):
dev branch up to commit b62c1e7f8e0e17e
Additional context
I'm not sure if this behaviour is intended or not. Are the IDs of the MWT token intended to act as a tuple, i.e. a range, or should they include every token that's a member of the multi-word token? If it's the latter then I believe this is a bug.