natalink closed this issue 6 years ago.
This makes it a problem to encode, for example, the following:
['_', '1:LVC', '_', '1;2:LVC', '_', '_', '_', '_', '2', '_']
form lemma .... LVC head
.........................................
form lemma .... LVC;LVC child;head .....
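A minimal Python sketch of the problem. It assumes the PARSEME convention visible in the list above: the first token of an MWE carries `id:CATEGORY`, later tokens carry only `id`, and a token belonging to several MWEs has `;`-separated codes. It groups token positions by MWE id; the function name `group_mwes` is hypothetical. Token 4 ends up in both MWEs, which a single `LVC;LVC` value cannot express on its own:

```python
from collections import defaultdict


def group_mwes(codes):
    """Group token positions by MWE id from a PARSEME-style MWE column.

    codes: one entry per token, e.g. '_', '1:LVC', '1;2:LVC', '2'.
    Returns {mwe_id: (category, [1-based token positions])}.
    """
    categories = {}              # MWE id -> category (only on the MWE's first token)
    members = defaultdict(list)  # MWE id -> positions of its tokens
    for pos, field in enumerate(codes, start=1):
        if field == "_":
            continue
        for code in field.split(";"):
            mwe_id, _, category = code.partition(":")
            if category:
                categories[mwe_id] = category
            members[mwe_id].append(pos)
    return {i: (categories.get(i), members[i]) for i in members}


codes = ['_', '1:LVC', '_', '1;2:LVC', '_', '_', '_', '_', '2', '_']
print(group_mwes(codes))  # {'1': ('LVC', [2, 4]), '2': ('LVC', [4, 9])}
```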
This solution makes sense from the user's point of view (there is no need to write a regex such as vmwe="LVC.*" each time), but it works badly for cases like the one above.
I'd go for type and head/child plus an additional attribute called lvc_id (taken from the PARSEME data). It would make no sense to search for a particular value of lvc_id, but it could be compared to the lvc_id of other words in the same sentence:
(meet -5 5 1:[lvc_dependency="head"] 2:[lvc_id=1.lvc_id]) within <s/>
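To make the intended semantics of this query concrete, here is a Python sketch (not the manatee/CQL implementation) of what it is meant to find: for every token marked as head of some MWE, the tokens within ±5 positions in the same sentence that share that MWE id. It uses the Czech rows of the encoding proposed below and assumes that the head/child and id columns are `;`-joined and aligned position by position. The function `head_child_pairs` and the tuple layout are hypothetical, and the sketch compares individual ids rather than whole `;`-joined strings; how the query engine itself compares such multi-valued attributes is not assumed here.

```python
# Czech rows in the proposed encoding: (form, head/child column, id column).
# Both columns are ";"-joined and aligned: the k-th dependency belongs to the k-th id.
rows = [
    ("nešlo", "head", "2"),
    ("tedy", "_", "_"),
    ("o", "child", "2"),
    ("žádné", "_", "_"),
    ("vpadnutí", "head;child", "1;2"),
    ("do", "child;child", "1;2"),
    ("zad", "child;child", "1;2"),
]


def head_child_pairs(rows, window=5):
    """Return (mwe_id, head_form, other_form) for every token that shares an MWE id
    with a head token at most `window` positions away (one sentence assumed)."""
    parsed = []
    for form, deps, ids in rows:
        pairs = [] if ids == "_" else list(zip(ids.split(";"), deps.split(";")))
        parsed.append((form, pairs))
    matches = []
    for i, (head_form, pairs) in enumerate(parsed):
        for mwe_id, dep in pairs:
            if dep != "head":
                continue
            # Scan the window around the head token, excluding the head itself.
            for j in range(max(0, i - window), min(len(parsed), i + window + 1)):
                if j != i and any(other_id == mwe_id for other_id, _ in parsed[j][1]):
                    matches.append((mwe_id, head_form, parsed[j][0]))
    return matches


for match in head_child_pairs(rows):
    print(match)
```

With the default window of 5, zad (six positions after nešlo) is not paired with nešlo, so a fixed ±5 window can miss parts of longer discontinuous MWEs.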
The left side is the original; the right side is my suggested encoding:
1 They they _ _ _ _ _
2 were be _ _ _ _ _
3 letting let 1:VPC;2:VPC VPC;VPC head;head 1;2 let in;let out
4 us we _ _ _ _ _
5 in in 1 VPC child 1 let in;let out
6 and and _ _ _ _ _
7 out out 2 VPC child 2 let in;let out
nešlo jít 2:LVC LVC head 2 jít_o_vpadnutí_do_zad
tedy tedy _ _ _ _ _
o o 2 LVC child 2 jít_o_vpadnutí_do_zad
žádné žádný _ _ _ _ _
vpadnutí vpadnutí 1:ID;2 ID;LVC head;child 1;2 vpadnutí_do_zad;jít_o_vpadnutí_do_zad
do do 1;2 ID;LVC child;child 1;2 vpadnutí_do_zad;jít_o_vpadnutí_do_zad
zad záda 1;2 ID;LVC child;child 1;2 vpadnutí_do_zad;jít_o_vpadnutí_do_zad
I think this would solve the problem of nested or overlapping queries, wouldn't it?
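The right-hand columns could be derived mechanically from the PARSEME column. A minimal Python sketch, assuming that "head" simply marks the first token of each MWE (the one carrying the category, as in the examples above) and that the MWE lemma string joins member lemmas with spaces (the thread uses both spaces and underscores). The function name `encode` is hypothetical:

```python
from collections import defaultdict


def encode(sentence):
    """Convert (form, lemma, PARSEME MWE field) tuples into rows of
    form, lemma, type, head/child, id, MWE lemma string."""
    categories = {}              # MWE id -> category, taken from its first token
    members = defaultdict(list)  # MWE id -> token positions, in sentence order
    token_ids = []               # MWE ids of each token, in the order they appear
    for pos, (_, _, field) in enumerate(sentence):
        ids = []
        if field != "_":
            for code in field.split(";"):
                mwe_id, _, category = code.partition(":")
                if category:
                    categories[mwe_id] = category
                members[mwe_id].append(pos)
                ids.append(mwe_id)
        token_ids.append(ids)

    # Assumption: the MWE lemma string joins the member lemmas with spaces.
    mwe_lemma = {i: " ".join(sentence[p][1] for p in ps) for i, ps in members.items()}

    rows = []
    for pos, ((form, lemma, _), ids) in enumerate(zip(sentence, token_ids)):
        if not ids:
            rows.append([form, lemma, "_", "_", "_", "_"])
            continue
        rows.append([
            form,
            lemma,
            ";".join(categories.get(i, "?") for i in ids),
            # Assumption: "head" = first token of the MWE, every other member = "child".
            ";".join("head" if members[i][0] == pos else "child" for i in ids),
            ";".join(ids),
            ";".join(mwe_lemma[i] for i in ids),
        ])
    return rows


sentence = [
    ("They", "they", "_"),
    ("were", "be", "_"),
    ("letting", "let", "1:VPC;2:VPC"),
    ("us", "we", "_"),
    ("in", "in", "1"),
    ("and", "and", "_"),
    ("out", "out", "2"),
]
for row in encode(sentence):
    print("\t".join(row))
```

Each token receives only the lemma strings of the MWEs it belongs to, so "letting" gets "let in;let out" while "in" gets only "let in".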
I think it is worth thinking about API-based usage, where manatee is called to return results to software that uses the annotations. There is no best solution; what matters most is to have a use case and an example of how the annotations will be used. Lastly, I prefer a simpler structure: if the schema is too complicated, it will lose its applicability and user-friendliness. We don't want to write two full lines of CQL just to make a concordance for a VMWE?! :)
(In reply to Anša Vernerová's comment of Friday, December 8, 2017.)
I started to change the code and added a Czech use case. Is it a mistake in CS/train.parsemetsv that the verb nešlo is not marked, or am I missing something?
16 nešlo _ _
17 tedy _ _
18 o _ 1:ID
19 žádné _ _
20 vpadnutí _ 1;2:LVC
21 do _ 1;2
22 zad _ 1;2
@e-bej @Ansa211
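One way to check the excerpt: a Python sketch that groups the quoted rows by MWE id, assuming the PARSEME convention that the first token of an MWE carries `id:CATEGORY` and later tokens carry only `id`. The function name `mwes_and_unmarked` is hypothetical. It confirms that, as annotated, nešlo belongs to no MWE, while "o vpadnutí do zad" is ID 1 and "vpadnutí do zad" is LVC 2:

```python
from collections import defaultdict

# Rows 16-22 of CS/train.parsemetsv as quoted above: (token number, form, MWE column).
rows = [
    (16, "nešlo", "_"),
    (17, "tedy", "_"),
    (18, "o", "1:ID"),
    (19, "žádné", "_"),
    (20, "vpadnutí", "1;2:LVC"),
    (21, "do", "1;2"),
    (22, "zad", "1;2"),
]


def mwes_and_unmarked(rows):
    """Return ({mwe_id: (category, surface string)}, [forms not in any MWE])."""
    categories, members, unmarked = {}, defaultdict(list), []
    for _, form, field in rows:
        if field == "_":
            unmarked.append(form)
            continue
        for code in field.split(";"):
            mwe_id, _, category = code.partition(":")
            if category:  # only the MWE's first token carries the category
                categories[mwe_id] = category
            members[mwe_id].append(form)
    mwes = {i: (categories.get(i, "?"), " ".join(forms)) for i, forms in members.items()}
    return mwes, unmarked


mwes, unmarked = mwes_and_unmarked(rows)
print(mwes)      # {'1': ('ID', 'o vpadnutí do zad'), '2': ('LVC', 'vpadnutí do zad')}
print(unmarked)  # ['nešlo', 'tedy', 'žádné']
```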
Ok, I did it like this:
nešlo jít 1 ... LVC head 1 jít o vpadnutí do záda
tedy tedy 2 ... _ _ _ _
o o 3 ... child 1 jít o vpadnutí do záda
žádné žádný ... _ _ _ _
vpadnutí vpadnutí ... LVC;LVC child;head 1;2 jít o vpadnutí do záda;vpadnutí do záda
do do ... LVC;LVC child;child 1;2 jít o vpadnutí do záda;vpadnutí do záda
zad záda ... LVC;LVC child;child 1;2 jít o vpadnutí do záda;vpadnutí do záda
I think it is a mistake; that's why I marked it differently in my example above.
Well, I don't think there is anything like "jít o vpadnutí do zad". I would say that there is only one MWE: "vpadnutí do zad". The other part, "jít o X", is a pure valency issue, not an MWE, and it can be combined with really anything (i.e., "X" can be a word, an MWE, a sentence, or whatever).
That's my linguistic intuition. However, the data you've cited even has the word "o" marked as part of something (something strange). Do you want me to find out how that happened?
Well, the Czech data as a whole seem a bit strange to me. Apart from "nešlo tedy o žádné vpadnutí do zad", where there is an LVC "vpadnutí do zad" (OK?) and an ID "o vpadnutí do zad" (?!), I am puzzled by, e.g.:
Anša
(In reply to e-bej's comment of 13 Dec 2017, 15:48.)
From a reviewer: