jonorthwash / ud-annotatrix

GNU General Public License v3.0

Multiword token range sometimes being saved as HEAD #475

Open nschneid opened 2 years ago

nschneid commented 2 years ago

For words in a multiword token, when I export (download) a .conllu file, sometimes their dependents have the entire MWT in the HEAD column, e.g. 1-2 instead of 1. This breaks the viewer when I reopen the sentence.

kmurphy4 commented 2 years ago

Could you post an example sentence or screenshot that breaks like this?

nschneid commented 2 years ago

Now my setup is borked and I can't get the viewer to work for any sentence even with a new upload. :( But here is the sentence that was giving me grief:

1-2 Here's  _   _   _   _   _   _   _   _
1   Here    here    ADV RB  PronType=Dem    0   root    _   start_char=89|end_char=93
2   's  be  AUX VBZ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin   1   cop _   start_char=93|end_char=95
3   the the DET DT  Definite=Def|PronType=Art   4   det _   start_char=96|end_char=99
4   paper   paper   NOUN    NN  Number=Sing 1   nsubj   _   start_char=100|end_char=105
5   that    that    SCONJ   WDT PronType=Rel    11  mark    _   start_char=106|end_char=110
6   people  people  NOUN    NNS Number=Plur 11  nsubj   _   start_char=111|end_char=117
7   who who PRON    WP  PronType=Rel    8   nsubj   _   start_char=118|end_char=121
8   read    read    VERB    VBP Mood=Ind|Tense=Pres|VerbForm=Fin    6   acl:relcl   _   start_char=122|end_char=126
9   it  it  PRON    PRP Case=Acc|Gender=Neut|Number=Sing|Person=3|PronType=Prs  8   obj _   start_char=127|end_char=129
10  will    will    AUX MD  VerbForm=Fin    11  aux _   start_char=130|end_char=134
11  find    find    VERB    VB  VerbForm=Inf    4   acl:relcl   _   start_char=135|end_char=139
12  out out ADP RP  _   11  compound:prt    _   start_char=140|end_char=143
13  about   about   SCONJ   IN  _   17  mark    _   start_char=144|end_char=149
14  how how ADV WRB PronType=Int    17  advmod  _   start_char=150|end_char=153
15  resumptive  resumptive  ADJ JJ  Degree=Pos  16  amod    _   start_char=154|end_char=164
16  pronouns    pronoun NOUN    NNS Number=Plur 17  nsubj   _   start_char=165|end_char=173
17  help    help    VERB    VBP Mood=Ind|Tense=Pres|VerbForm=Fin    11  advcl   _   start_char=174|end_char=178
18  islands island  NOUN    NNS Number=Plur 17  obj _   start_char=179|end_char=186
19  go  go  VERB    VB  VerbForm=Inf    17  xcomp   _   start_char=187|end_char=189
20  down    down    ADP IN  _   19  compound:prt    _   start_char=190|end_char=194
21  a   a   DET DT  Definite=Ind|PronType=Art   22  det _   start_char=195|end_char=196
22  little  little  ADJ JJ  Degree=Pos  23  obl:npmod   _   start_char=197|end_char=203
23  easier  easier  ADJ JJR Degree=Cmp  19  advmod  _   start_char=204|end_char=210
24  .   .   PUNCT   .   _   1   punct   _   start_char=210|end_char=211

Somehow token 2 was becoming unattached and token 24 was showing up with 1-2 as its head.
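
A quick way to spot this symptom in an exported file is to scan the HEAD column for range IDs. The following is a minimal sketch in Python, not part of ud-annotatrix (which is JavaScript); `find_range_heads` is a hypothetical helper name, it assumes tab-separated 10-column CoNLL-U lines, and the `sample` rows are a shortened reconstruction of the broken export described above, not the actual file.

```python
import re

# A multiword-token ID looks like "1-2"; a valid HEAD is a single integer.
RANGE_ID = re.compile(r"^\d+-\d+$")


def find_range_heads(conllu_text):
    """Return (token_id, head) pairs whose HEAD column holds a range like '1-2'."""
    problems = []
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line; ignored in this sketch
        token_id, head = cols[0], cols[6]
        if RANGE_ID.match(token_id):
            continue  # the multiword-token line itself carries "_" as HEAD
        if RANGE_ID.match(head):
            problems.append((token_id, head))
    return problems


# Shortened, hand-built reconstruction of the symptom (assumption, not the real export).
sample = "\n".join([
    "1-2\tHere's\t_\t_\t_\t_\t_\t_\t_\t_",
    "1\tHere\there\tADV\tRB\tPronType=Dem\t0\troot\t_\t_",
    "2\t's\tbe\tAUX\tVBZ\t_\t1\tcop\t_\t_",
    "24\t.\t.\tPUNCT\t.\t_\t1-2\tpunct\t_\t_",
])

print(find_range_heads(sample))  # [('24', '1-2')]
```

Running a check like this on each download would at least show which tokens picked up the range as their head before the file is reopened in the viewer.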

nschneid commented 2 years ago

The viewer is not working anymore in Firefox but I'm able to view and download it in Chrome just fine. And it downloads the correct parse. Do I need to clear my Firefox cache or something?

nschneid commented 2 years ago

> Do I need to clear my Firefox cache or something?

And it works in Firefox Private Browsing mode, so something got messed up in my browser session.

kmurphy4 commented 2 years ago

> Now my setup is borked and I can't get the viewer to work for any sentence even with a new upload. :( But here is the sentence that was giving me grief:
>
> 1-2   Here's  _   _   _   _   _   _   _   _
> 1 Here    here    ADV RB  PronType=Dem    0   root    _   start_char=89|end_char=93
> 2 's  be  AUX VBZ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin   1   cop _   start_char=93|end_char=95
> 3 the the DET DT  Definite=Def|PronType=Art   4   det _   start_char=96|end_char=99
> 4 paper   paper   NOUN    NN  Number=Sing 1   nsubj   _   start_char=100|end_char=105
> 5 that    that    SCONJ   WDT PronType=Rel    11  mark    _   start_char=106|end_char=110
> 6 people  people  NOUN    NNS Number=Plur 11  nsubj   _   start_char=111|end_char=117
> 7 who who PRON    WP  PronType=Rel    8   nsubj   _   start_char=118|end_char=121
> 8 read    read    VERB    VBP Mood=Ind|Tense=Pres|VerbForm=Fin    6   acl:relcl   _   start_char=122|end_char=126
> 9 it  it  PRON    PRP Case=Acc|Gender=Neut|Number=Sing|Person=3|PronType=Prs  8   obj _   start_char=127|end_char=129
> 10    will    will    AUX MD  VerbForm=Fin    11  aux _   start_char=130|end_char=134
> 11    find    find    VERB    VB  VerbForm=Inf    4   acl:relcl   _   start_char=135|end_char=139
> 12    out out ADP RP  _   11  compound:prt    _   start_char=140|end_char=143
> 13    about   about   SCONJ   IN  _   17  mark    _   start_char=144|end_char=149
> 14    how how ADV WRB PronType=Int    17  advmod  _   start_char=150|end_char=153
> 15    resumptive  resumptive  ADJ JJ  Degree=Pos  16  amod    _   start_char=154|end_char=164
> 16    pronouns    pronoun NOUN    NNS Number=Plur 17  nsubj   _   start_char=165|end_char=173
> 17    help    help    VERB    VBP Mood=Ind|Tense=Pres|VerbForm=Fin    11  advcl   _   start_char=174|end_char=178
> 18    islands island  NOUN    NNS Number=Plur 17  obj _   start_char=179|end_char=186
> 19    go  go  VERB    VB  VerbForm=Inf    17  xcomp   _   start_char=187|end_char=189
> 20    down    down    ADP IN  _   19  compound:prt    _   start_char=190|end_char=194
> 21    a   a   DET DT  Definite=Ind|PronType=Art   22  det _   start_char=195|end_char=196
> 22    little  little  ADJ JJ  Degree=Pos  23  obl:npmod   _   start_char=197|end_char=203
> 23    easier  easier  ADJ JJR Degree=Cmp  19  advmod  _   start_char=204|end_char=210
> 24    .   .   PUNCT   .   _   1   punct   _   start_char=210|end_char=211
>
> Somehow token 2 was becoming unattached and token 24 was showing up with 1-2 as its head.

Hm, if I copy-paste that sentence into the textbox, it seems to work? [screenshot]

What else do I need to do to repro your issue?

nschneid commented 2 years ago

Not sure. In a new browser session I can't reproduce. Must have something to do with corrupted local storage or whatever in my original session.

nschneid commented 2 years ago

Oh but this is interesting: if I click the third word and enter "c" to merge left, it deletes the first word. So the merging functionality must be buggy.

nschneid commented 2 years ago

Another bug: if you initialize the tree without the multiword token, click the 2nd word and merge left, it concatenates the words in the wrong order:

1-2 'sHere  _   _   _   _   _   _   _   _

nschneid commented 2 years ago

This should check the indices to see which token is first (combining two tokens into a multiword token/supertoken):

https://github.com/jonorthwash/ud-annotatrix/blob/e951f7255e5b02b8c545af13a4a4e1e53f67054f/notatrix/src/nx/sentence.js#L381

also here (merging two tokens into one regular token):

https://github.com/jonorthwash/ud-annotatrix/blob/e951f7255e5b02b8c545af13a4a4e1e53f67054f/notatrix/src/nx/sentence.js#L324
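
The index check suggested above can be sketched as follows. This is an illustration in Python rather than the project's JavaScript, and it does not reproduce the linked `sentence.js` code; `Token` and `multiword_form` are hypothetical names, and the sketch assumes each token knows its 1-based position in the sentence.

```python
from dataclasses import dataclass


@dataclass
class Token:
    index: int  # 1-based position in the sentence (assumed field)
    form: str   # surface form


def multiword_form(a, b):
    """Return (range_id, form) for a multiword token built from two adjacent tokens.

    The tokens are ordered by index first, so the result does not depend on
    which token was clicked or in which direction the merge was requested.
    """
    first, second = sorted((a, b), key=lambda t: t.index)
    if second.index - first.index != 1:
        raise ValueError("tokens must be adjacent")
    return f"{first.index}-{second.index}", first.form + second.form


here = Token(1, "Here")
clitic = Token(2, "'s")

# Merging left from the 2nd word passes the tokens in "wrong" order on purpose.
print(multiword_form(clitic, here))  # ('1-2', "Here's")
```

Sorting by index before concatenating gives the same `Here's` form for merge-left and merge-right, which is the behaviour the `'sHere` output above is missing.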

keggsmurph21 commented 2 years ago

Oops, I didn't mean to close the whole issue ... but https://github.com/jonorthwash/ud-annotatrix/commit/a3828a4796f3f67e0517e5b8ccdc41b0b901832c should fix this part:

> Another bug: if you initialize the tree without the multiword token, click the 2nd word and merge left, it concatenates the words in the wrong order:
>
> 1-2   'sHere  _   _   _   _   _   _   _   _

Thanks for the hint 😁

keggsmurph21 commented 2 years ago

> Oh but this is interesting: if I click the third word and enter "c" to merge left, it deletes the first word. So the merging functionality must be buggy.

f6865f1

nschneid commented 2 years ago

Thanks, pulled the update. Now I find that if I create several multiword tokens and then select one of them to split ("s"), it may split the wrong one.