natalink / mwe_noske

vmwe_dependency is wrong #7

Closed Ansa211 closed 6 years ago

Ansa211 commented 6 years ago

In the parsemetsv format, the MWE type is always marked on the first element of the MWE; in other words, it does not tell us anything about where the head is. This is clearly stated in the description paper of the Parseme data: "Each token is represented by 4 tab-separated columns featuring ... (iv) an optional VMWE code composed of the VMWE’s consecutive number in the sentence and – for the initial token in a VMWE – its category (e.g., 2:ID if a token starts an idiom which is the second VMWE in the current sentence)."
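
To make the quoted description concrete, here is a minimal Python sketch of how such a VMWE code could be split into its number and optional category. The function name `parse_vmwe_codes` is hypothetical, and two details are assumptions that the quote does not state: that `_` marks a token outside any VMWE, and that several codes on one token are joined with `;`.

```python
from typing import List, Optional, Tuple


def parse_vmwe_codes(field: str) -> List[Tuple[int, Optional[str]]]:
    """Split the fourth parsemetsv column into (number, category) pairs.

    The category is None for every token except the initial one of a VMWE.
    """
    # Assumption: "_" or an empty field means the token is in no VMWE.
    if field in ("", "_"):
        return []
    codes = []
    # Assumption: several codes on one token are joined with ";".
    for code in field.split(";"):
        number, _, category = code.partition(":")
        codes.append((int(number), category or None))
    return codes


# "2:ID" starts the second VMWE of the sentence, an idiom.
print(parse_vmwe_codes("2:ID"))
# A bare "2" only says the token belongs to that VMWE.
print(parse_vmwe_codes("2"))
```

Note that the category says which token comes first in the VMWE, not which token is its syntactic head.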

Look at this query to see it in action: heads of MWEs with "vzdát" in the lemma

It would be nice to have the heads marked, but our current vmwe_dependency attribute is meaningless, isn't it?

natalink commented 6 years ago

That's true, their "head" is only a technical term for the first element in an MWE. I think the attribute is still useful for easily retrieving all occurrences of VMWEs. E.g. you can retrieve all instances of idioms with the query [vmwe_type="ID" & vmwe_dependency="head"]. But I acknowledge that the name "head" is confusing in this case. People from PARSEME will have more tolerance than those outside PARSEME :) Maybe choose other names for the attributes, e.g. "first", and "cont" for the continuation?

Ansa211 commented 6 years ago

vmwe_dependency was renamed to vmwe_order and the attribute values were renamed as suggested (head->first, child->cont) in f8ddbaa1e7cefd4712993530c415cded810621b2 and 02a9f1baddc58b9d95bcce69e8378cf26669af17
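
For illustration only, here is a sketch of how the renamed values could be derived from a single VMWE code. It is not the code from the commits above; the function name `vmwe_order` and the sample codes are hypothetical, and the sketch assumes one code per token.

```python
def vmwe_order(code: str) -> str:
    """Return "first" for a code that carries a category, else "cont"."""
    # A code such as "2:ID" opens a VMWE; a bare "2" continues it.
    return "first" if ":" in code else "cont"


# Hypothetical codes for three tokens of the second VMWE in a sentence.
codes = ["2:ID", "2", "2"]
orders = [vmwe_order(c) for c in codes]
print(orders)
```

With this naming, the idiom query from the earlier comment would presumably read [vmwe_type="ID" & vmwe_order="first"].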