jonorthwash opened this issue 7 years ago
Hm, not sure I understand. Can you give an example?
I guess one example is "wanna". After splitting this into two tokens, there should be a way to make them subtokens of a single token.
Another example is here:
```
1 Бұлардың бұл _ prn dem|pl|gen 6 nmod:poss _ _
2 бір бір _ num _ 3 nummod _ _
3 ауыз ауыз _ n nom 6 acl:relcl _ _
4 бола бол _ v iv|prc_impf 3 cop _ _
5 алмаған ал _ vaux neg|gpr_past 3 aux _ _
6 себебі себеп _ n px3sp|nom 7 nsubj _ _
7-8 не _ _ _ _ _ _ _ _
7 не не _ prn itg|nom 0 root _ _
8 _ е _ cop aor|p3|sg 7 cop _ _
9 ? ? _ sent _ 7 punct _ _
```
Let's say you want to make 7 and 8 separate tokens instead of subtokens. There should be some way to do that in the interface.
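To make the intended operation concrete, here is a minimal Python sketch (not the tool's actual code) of turning subtokens 7 and 8 back into separate tokens. In CoNLL-U this amounts to deleting the `7-8` range line. The helpers `parse`, `is_range` and `remove_supertoken` are hypothetical names, and only the tail of the example sentence is used.

```python
# Tail of the Kazakh example above (earlier lines omitted for brevity).
SENTENCE = """\
6 себебі себеп _ n px3sp|nom 7 nsubj _ _
7-8 не _ _ _ _ _ _ _ _
7 не не _ prn itg|nom 0 root _ _
8 _ е _ cop aor|p3|sg 7 cop _ _
9 ? ? _ sent _ 7 punct _ _"""


def parse(text):
    """Split each line into its 10 CoNLL-U fields.

    Real CoNLL-U files are tab-separated; splitting on any whitespace
    is a simplification that only works for this example.
    """
    return [line.split() for line in text.splitlines()]


def is_range(row):
    """A multiword-token (supertoken) line has an ID like '7-8'."""
    return "-" in row[0]


def remove_supertoken(rows, token_id):
    """Drop the range line whose ID is token_id, e.g. '7-8'.

    The word lines it covered become ordinary tokens. Word IDs and
    HEADs are left as they are, because range lines do not take part
    in the word numbering.
    """
    return [row for row in rows if not (is_range(row) and row[0] == token_id)]


rows = remove_supertoken(parse(SENTENCE), "7-8")
for row in rows:
    print(" ".join(row))
```

Because range lines do not consume word IDs, no renumbering is needed here. Word 8 still has `_` as its FORM, though, so as a standalone token it would presumably need a surface form filled in.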
Ah, I see, this is closely related to #8 and #36.
Yes, both are probably prerequisites for working on this. Those are the display- and format-level implementations; this one is the editing-interface-level implementation, I guess.
If I got it right, it works now: left-click on a token, press "s", then use an arrow key to select which neighbor you want to merge it with.
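The merge operation described above can be sketched the same way: merging a word with its neighbor means inserting a range line such as `7-8` before the first of the two words. This is a minimal Python sketch under my own assumptions (hypothetical helper names, the supertoken's FORM supplied by the caller, only adjacent words allowed); it is not how the interface is actually implemented.

```python
SENTENCE = """\
7 не не _ prn itg|nom 0 root _ _
8 _ е _ cop aor|p3|sg 7 cop _ _
9 ? ? _ sent _ 7 punct _ _"""


def parse(text):
    # Simplification: real CoNLL-U files are tab-separated.
    return [line.split() for line in text.splitlines()]


def is_range(row):
    return "-" in row[0]


def merge_into_supertoken(rows, first, last, form):
    """Return a copy of rows with a range line 'first-last' inserted
    directly before word `first`.

    `form` is the supertoken's surface form. It is passed explicitly
    because it cannot always be rebuilt from the subtokens (word 8
    above has FORM '_').
    """
    if last != first + 1:
        raise ValueError("only adjacent words can be merged")
    for row in rows:
        if is_range(row):
            start, end = map(int, row[0].split("-"))
            if start <= last and first <= end:
                raise ValueError(f"words already inside supertoken {row[0]}")
    word_ids = [row[0] for row in rows if not is_range(row)]
    if str(first) not in word_ids or str(last) not in word_ids:
        raise ValueError("both words must exist")
    # The new range line gets only an ID and a FORM; every other field is '_'.
    range_row = [f"{first}-{last}", form] + ["_"] * 8
    out = []
    for row in rows:
        if not is_range(row) and row[0] == str(first):
            out.append(range_row)
        out.append(row)
    return out


rows = merge_into_supertoken(parse(SENTENCE), 7, 8, "не")
for row in rows:
    print(" ".join(row))
```

As with removal, the word lines themselves are not renumbered; only one line is added.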
Cool, what about splitting?
You mean, removing the supertoken?
Yes, that's a good way to think about it.
Done.
Select the token to delete, then press Delete.
Looking good!
I'm having trouble splitting. It says the feature isn't supported yet?
It is unsupported only for sentences with spans, because of the index-shifting issue. As I wrote in #63:
The only thing affected by the index shift issue now is merging and splitting tokens, which turned out to be a bit trickier.
I'm working on it.
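A rough illustration of why spans make splitting harder: splitting one word into two inserts a new word, so every later ID and HEAD has to be renumbered, and range lines need a slightly different rule for their start and their end. The Python sketch below is hypothetical (placeholder forms `X`/`Y`, a placeholder `dep` relation for the new word, enhanced DEPS ignored) and is not the actual implementation.

```python
SENTENCE = """\
1 Бұлардың бұл _ prn dem|pl|gen 6 nmod:poss _ _
2 бір бір _ num _ 3 nummod _ _
3 ауыз ауыз _ n nom 6 acl:relcl _ _
4 бола бол _ v iv|prc_impf 3 cop _ _
5 алмаған ал _ vaux neg|gpr_past 3 aux _ _
6 себебі себеп _ n px3sp|nom 7 nsubj _ _
7-8 не _ _ _ _ _ _ _ _
7 не не _ prn itg|nom 0 root _ _
8 _ е _ cop aor|p3|sg 7 cop _ _
9 ? ? _ sent _ 7 punct _ _"""


def parse(text):
    # Simplification: real CoNLL-U files are tab-separated.
    return [line.split() for line in text.splitlines()]


def split_word(rows, k, form1, form2):
    """Split word k into word k (form1) and a new word k+1 (form2).

    Every word ID > k and every HEAD > k moves up by one. For a range
    line 's-e', the start moves if s > k but the end moves if e >= k,
    so a span that contained word k ends up containing both halves.
    Simplifications: the new word gets placeholder annotation
    (HEAD = k, DEPREL = 'dep'), the other fields of word k are kept,
    and the DEPS column is not renumbered.
    """
    def shift(i):
        return i + 1 if i > k else i

    out = []
    for row in rows:
        row = list(row)  # copy so the input is not mutated
        if "-" in row[0]:
            s, e = map(int, row[0].split("-"))
            row[0] = f"{shift(s)}-{e + 1 if e >= k else e}"
            out.append(row)
            continue
        word_id = int(row[0])
        row[0] = str(shift(word_id))
        if row[6] != "_":
            row[6] = str(shift(int(row[6])))
        if word_id == k:
            row[1] = form1
            out.append(row)
            out.append([str(k + 1), form2, "_", "_", "_", "_", str(k), "dep", "_", "_"])
        else:
            out.append(row)
    return out


# Split word 6, which sits before the 7-8 span: the span becomes 8-9.
rows = split_word(parse(SENTENCE), 6, "X", "Y")
for row in rows:
    print(" ".join(row))
```

Using `shift` for the range end as well would be an easy mistake: splitting word 7 would then leave the span as `7-8` and push the second half of the supertoken outside it.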
Ah, I think I misunderstood you at the time. Okay, cool.
@maryszmary is this fixed now? I see that #63 is fixed.
Add functionality to the existing tokenisation routines (#2) so that tokens can be split into subtokens and adjacent subtokens can be merged.