oasis-tcs / lexidma

OASIS Lexicographic Infrastructure Data Model and API (LEXIDMA) TC: A repository designed for use in development of TC chartered work products and test suites. https://github.com/oasis-tcs/lexidma

No sense-specific parts of speech #85

Closed: michmech closed 5 months ago

michmech commented 5 months ago

(Submitted by David Lindemann)

There are dictionaries that don’t attach a POS value to the headword but have POS sections inside the entry (the headword is presented as POS neutral). For example, this entry: https://www.euskaltzaindia.eus/index.php?option=com_oehberria&task=bilaketa&Itemid=413&lang=eu-ES&query=aditu

In such a case, would you want to force a re-modeling of the inner entry hierarchy (here, making three entries out of one so that no entry is POS-ambiguous)? There are reasons for modeling a dictionary the way you see in the example. In Basque, for example, there are many nominals that can be interpreted as either nouns or adjectives, and the border is not clear. The above example, aditu, means “expert”, and in English, too, it is not always clear where it is an ADJ and where a NOUN (“I am a Basque expert / I am an expert Basque”). Another reason is that if you describe inflected forms (and there are a lot of forms for each lemma in Basque), the data becomes very redundant if you have to list all possible forms in each entry. Related to that: we have frequency data for Basque word forms, but we cannot say in each case whether a form is ADJ or NOUN. And if you have an inflected past participle, is it a verb form or a nominal (its inflection behaves like that of nouns and adjectives)?

In Ontolex-on-Wikibase, I model this as follows: I introduce a POS-disambiguating property at sense level (“this sense applies to this lemma as noun”), and I do the same for inflected forms where it is clear which POS a given form may have (there can be more than one). Examples: “aditu” with POS on senses, “aditu” with POS on forms (different sources / tools give different values here, which is what I want to record in that case).
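
To make the sense-level approach a bit more concrete, here is a minimal Python sketch of a POS-neutral lemma whose senses and forms each carry their own POS values. The class and field names (`Lexeme`, `Sense`, `Form`, `pos`) are hypothetical and are not the actual Ontolex-on-Wikibase properties; the POS assignments for “aditu” are illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Sense:
    gloss: str
    pos: str  # hypothetical field: the POS this sense applies to


@dataclass
class Form:
    text: str
    # A form may be compatible with more than one POS.
    pos: set[str] = field(default_factory=set)


@dataclass
class Lexeme:
    lemma: str  # the headword itself stays POS-neutral
    senses: list[Sense] = field(default_factory=list)
    forms: list[Form] = field(default_factory=list)


def senses_by_pos(lexeme: Lexeme) -> dict[str, list[str]]:
    """Group the glosses of a lexeme by the POS recorded on each sense."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for sense in lexeme.senses:
        grouped[sense.pos].append(sense.gloss)
    return dict(grouped)


# Illustrative data only: the POS values below are assumptions.
aditu = Lexeme(
    lemma="aditu",
    senses=[Sense("expert", "noun"), Sense("expert", "adj")],
    forms=[Form("aditu", {"noun", "adj"})],
)

print(senses_by_pos(aditu))
```

Grouping by the POS stored on each sense recovers the POS sections of the printed entry without assigning a POS to the headword.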

Also in German, there are dictionaries that have such POS-like sections (not across POS, but refining POS). Some dictionaries group verb senses inside an element describing a syntactic entity (“verb transitive” vs. “verb intransitive”, “verb reflexive”, etc.) - example.

michmech commented 5 months ago

Suggestion: Explain in the text why we insist that it is a good idea to disallow sense-specific morphosyntactic properties, even if it comes with trade-offs. Explain that DMLex’s relation objects can be used to prompt software to render multiple entries (with one part of speech each) as one entry (with multiple part-of-speech blocks).
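
A rough sketch of that idea in Python, under stated assumptions: each entry has exactly one part of speech, and a relation object lists the entries that should be shown together. The names used here (`Entry`, `Relation`, `member_ids`, the relation type `"same-headword"`) are hypothetical and are not taken from the DMLex specification.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    id: str
    headword: str
    part_of_speech: str  # exactly one POS per entry
    senses: list[str]


@dataclass
class Relation:
    type: str              # hypothetical relation type name
    member_ids: list[str]  # ids of the entries to be rendered together


def render_merged(entries: list[Entry], relation: Relation) -> str:
    """Render the related entries as one entry with one block per POS."""
    by_id = {e.id: e for e in entries}
    members = [by_id[i] for i in relation.member_ids]
    if len({m.headword for m in members}) != 1:
        raise ValueError("members must share a single headword")
    lines = [members[0].headword]
    for m in members:
        lines.append(f"  {m.part_of_speech}")
        for n, sense in enumerate(m.senses, start=1):
            lines.append(f"    {n}. {sense}")
    return "\n".join(lines)


# Illustrative data only.
entries = [
    Entry("aditu-noun", "aditu", "noun", ["expert"]),
    Entry("aditu-adj", "aditu", "adj", ["expert"]),
]
rel = Relation("same-headword", ["aditu-noun", "aditu-adj"])

print(render_merged(entries, rel))
```

The stored data stays unambiguous (one part of speech per entry), while the human-facing view can still look like a single POS-neutral entry.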

michmech commented 5 months ago

I thought about it a little more and I now think that there is no need for us to explain in the text why we insist that it is a good idea to disallow sense-specific morphosyntactic properties. That sort of argumentation belongs in a different type of document, not in the spec itself. The spec should say how the data model is organized, not argue why it is organized that way.

As for explaining that relations can be used to prompt software to render the multiple entries as one entry for humans, we already do that, in effect, in the "suggested rendering for human users" that follows some examples in the Examples appendix.

No further action is required, in my opinion. I suggest closing this issue.