test for polysynthetic languages

chiarcos commented 11 months ago

cf. Inuktitut example in data/gdrive

chiarcos commented 11 months ago

this is being worked on in https://github.com/acoli-repo/morph-addenda/tree/master/data/polysynthetic

chiarcos commented 7 months ago

Observation: If allomorphs are represented as ontolex:otherForm of a morph:Morph, it is not possible to provide a morphological segmentation at the level of specific form variants, for example:

atausiulugu
atausi=u=lugu (segmentation, using allomorphic variants)
atausiq=uq=lugu (morphological interpretation, using canonical morpheme representations)

The current modelling of ontolex:Forms as an rdf:Seq of morph:Morphs does not allow us to provide the counterpart of the second line (the segmentation using allomorphic variants) unless we create one morph:Morph for each specific form.
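
The observation can be illustrated with a minimal sketch in plain Python. The class `Morph` and its fields are hypothetical stand-ins for morph:Morph, ontolex:canonicalForm and ontolex:otherForm, not part of any vocabulary, and the pairing of "atausi" with "atausiq" and of "u" with "uq" is read off the example above.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import List


@dataclass
class Morph:
    """Hypothetical stand-in for a morph:Morph with canonical and other forms."""
    canonical_form: str                                    # cf. ontolex:canonicalForm
    other_forms: List[str] = field(default_factory=list)   # cf. ontolex:otherForm

    def all_forms(self) -> List[str]:
        return [self.canonical_form] + self.other_forms


# Assumed reading of the example: "atausi" and "u" are the allomorphic variants.
atausiq = Morph("atausiq", ["atausi"])
uq = Morph("uq", ["u"])
lugu = Morph("lugu")

# Current modelling: the word form is an ordered sequence of morphs.
word = [atausiq, uq, lugu]

# The line with canonical representations can be read off directly.
canonical_line = "=".join(m.canonical_form for m in word)

# The surface segmentation cannot: every combination of forms is compatible
# with the same sequence of morphs.
candidates = ["=".join(forms) for forms in product(*(m.all_forms() for m in word))]

print(canonical_line)
print(candidates)
```

The sequence of morphs is compatible with several segmentations, of which "atausi=u=lugu" is only one, so the data alone does not say which variant occurs in the word.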

Solutions (mutually exclusive)

1. Require that we create one morph:Morph for each individual allomorphic variant of a morpheme (i.e., one morph must have exactly one form).
2. Redefine ontolex:Forms to be an rdf:Seq of ontolex:Forms (of morph:Morphs) rather than an rdf:Seq of ontolex:LexicalEntries (= morph:Morph).

Remarks (per solution):

1. We then need additional vocabulary to link allomorphic variants of the same morpheme. We also need to change the wording of "ontolex:Affix" so that it denotes a morphological unit rather than a "morpheme".
2. This does not require additional vocabulary, but the definition of morph:Morph then becomes a technical one, not a linguistic one. (We might consider renaming it to morph:MorphologicalUnit to avoid future confusion.)

max-ionov commented 7 months ago

First of all, this problem is not unique to polysynthetic languages: allomorphy and suppletion exist in all the datasets we have looked at so far. Following our established practice, this has been modelled with one morph per form, which aligns with a popular idea or definition of a morph in morphology; see, e.g., the Wiktionary definition:

(grammar, linguistics) A recurrent distinctive sound or sequence of sounds representing an indivisible morphological form; especially as representing a morpheme.
(linguistics) An allomorph: one of a set of realizations that a morpheme can have in different contexts.

This view is shared by Hockett 1947, Kroeger 2005, Mel’čuk 2006, Haspelmath 2020; also to a degree by Halle and Marantz 1993 and Embick 2015 as part of the distributed morphology tradition.

According to this view, a morpheme is an abstract element, whereas a morph is its realisation. From this point of view, a morpheme cannot have a written representation (apart from maybe an "average" representation to print in a dictionary), while a morph can (and should).

Given the more abstract nature of morphemes, it is more fitting for them to sit at the level of lexical concepts than at the level of lexical entries.

Note that according to Haspelmath's proposal (2020), this does not prevent morphs from having phonological variability, which means that (a) morphs could theoretically have more than one form, and (b) this definition would not prevent us from having variability created by morphophonological rules. As long as we have enough information to distinguish allomorphs from phonological variants, we can model that.

If potential users need to model morphemes, they can always use mmoon:Morpheme and link morph:Morph to them as specific realisations.

Problems with the proposal:

Problems with solution 1: we do not want to explicitly limit a morph to exactly one form, because of phonological variability. Cf. the Russian palatalised and non-palatalised versions of case/number affixes, [-a] vs. [-'a]: these are not different morphs, just different forms of one morph.

Problems with solution 2:

Undoing a crucial decision made previously, agreed upon and reported in most of the papers describing Morph
Removing linguistic meaning from the central element of the Morph module
Making the classes OrthMorph and PhonMorph from Paralex ontology incompatible with morph:Morph thus making it difficult to link the ontologies in a meaningful way
Potential ambiguity in modelling allomorphs: some people might still use one morph per form which can lead to non-interoperability between datasets

Revised proposed solution: require that we create one morph:Morph for each individual allomorphic variant of a morpheme (but one morph can still have multiple forms for phonological variability if required by input data). To model morphemes, recommend classes from other vocabularies, e.g. mmoon:Morpheme
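
A minimal sketch of the revised proposal in plain Python follows. The classes `Morpheme` and `Morph` and the field `realises` are hypothetical stand-ins (for something like mmoon:Morpheme, morph:Morph and a linking property), and the morpheme label is only an identifier, not a written representation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Morpheme:
    """Hypothetical stand-in for an abstract morpheme, e.g. an mmoon:Morpheme."""
    label: str  # identifier only, not a written representation


@dataclass
class Morph:
    """One allomorph; it may still carry several forms for phonological variability."""
    forms: List[str]
    realises: Optional[Morpheme] = None


def allomorphs_of(morpheme: Morpheme, morphs: List[Morph]) -> List[Morph]:
    """Collect all morphs that realise the given morpheme."""
    return [m for m in morphs if m.realises is morpheme]


# Inuktitut example from this thread: two allomorphs, hence two morphs.
ATAUSIQ = Morpheme("atausiq")
atausiq = Morph(["atausiq"], ATAUSIQ)
atausi = Morph(["atausi"], ATAUSIQ)

# Russian example from this thread: the palatalised and non-palatalised
# variants are two forms of ONE morph, not two morphs.
russian_a = Morph(["-a", "-'a"])

inventory = [atausiq, atausi, russian_a]
print([m.forms for m in allomorphs_of(ATAUSIQ, inventory)])
```

Allomorphs are kept apart as separate morphs and grouped only through the morpheme they realise, while purely phonological variants stay inside a single morph.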

chiarcos commented 7 months ago

As discussed today, my interpretation of allomorphy was much stricter,[1] but if we agree that "this does not prevent morphs from having phonological variability", that works for me. We should state that in the documentation, though. The remaining requirement is then to redefine ontolex:Forms to be an rdf:Seq of ontolex:Forms (of morph:Morphs) rather than an rdf:Seq of ontolex:LexicalEntries (= morph:Morph).

[1] my stricter definition of allomorphy comes from/is in line with that from OntoLex-Morph minutes from 2019-02-04:

Morph is a term which refers to alternative forms or realizations of a single morpheme.
Example:
the English plural suffix is found in precisely three different pronunciations, /s/, /z/, and /@z/: cats /kats/, dogs /dogz/, and horses /ho:s@z/. Since these three elements all represent a single morpheme, they are called morphs, and we say that /s/, /z/, and /@z/ are allomorphs of the abstract or underspecified plural suffix /-Z/. (citing Glottopedia Entry for ‘Morph’)
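
The remaining requirement named in this comment, a word form as a sequence of forms of morphs rather than of morphs, could look roughly like the following sketch. The classes `Morph` and `Form`, the `name` field and the helper `add_form` are hypothetical and only mirror the idea, not the actual vocabulary.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Morph:
    """Hypothetical stand-in for a morph:Morph (one per allomorph)."""
    name: str
    forms: List["Form"] = field(default_factory=list, repr=False)

    def add_form(self, written_rep: str) -> "Form":
        """Create a form that points back to this morph and register it."""
        form = Form(written_rep, self)
        self.forms.append(form)
        return form


@dataclass(eq=False)
class Form:
    """Hypothetical stand-in for an ontolex:Form that belongs to a morph."""
    written_rep: str
    morph: Morph


# One morph per allomorphic variant, as in the revised proposal.
atausi_form = Morph("atausi").add_form("atausi")
u_form = Morph("u").add_form("u")
lugu_form = Morph("lugu").add_form("lugu")

# The word form is a sequence of forms of morphs, not of morphs.
word_segments = [atausi_form, u_form, lugu_form]

segmentation = "=".join(f.written_rep for f in word_segments)
morphs = [f.morph.name for f in word_segments]
print(segmentation)
print(morphs)
```

Because each segment is a specific form, the surface segmentation is stated directly, and the morph behind each segment is still reachable through the form.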

chiarcos commented 1 month ago

update on issue:

TODO: add to the documentation on allomorphy: "this does not prevent morphs from having phonological variability"
TBC: redefine ontolex:Forms to be an rdf:Seq of ontolex:Forms (of morph:Morphs) rather than an rdf:Seq of ontolex:LexicalEntries (= morph:Morph) [this was left open when we concluded the telcos]

test for polysynthetic languages #18