Closed by fmatter 2 years ago
Thanks for your useful suggestions! What I implemented for now is `&` in clitics described inside `clitics.txt` and a custom separator for affixes (so you can have an "affix" separated by `=`). The details on that latter thing are here.
Regarding class inheritance: you are right, it would not be that simple. I mean, just making `Clitic` a subclass of `Lexeme` would be easy, but the hard part would be to describe that multiple-lexemes-in-one-token relationship in `Wordform` instances. I did something like that earlier with "subwords"; however, this solution looks rather ugly.
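To make the easy and hard parts concrete, here is a minimal Python sketch. All three classes are hypothetical stand-ins written for illustration, not uniparser's actual `Lexeme`, `Clitic`, or `Wordform` implementations, and the `side` field is an assumption. Subclassing is a one-liner; the open design question is how a single token should refer to several lexemes at once.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Lexeme:
    """Hypothetical stand-in for a lexeme entry (not uniparser's real class)."""
    lex: str
    stem: str
    gloss: str
    paradigms: List[str] = field(default_factory=list)


@dataclass
class Clitic(Lexeme):
    """The easy part: a clitic inherits every lexeme field."""
    side: str = "enclitic"  # assumed extra field: "proclitic" or "enclitic"


@dataclass
class Wordform:
    """The hard part: one token has to point to several lexemes."""
    host: Lexeme
    clitics: List[Clitic] = field(default_factory=list)

    @property
    def lemma(self) -> str:
        # Join the host lemma and all clitic lemmas, e.g. "paru+a".
        return "+".join([self.host.lex] + [c.lex for c in self.clitics])


paru = Lexeme(lex="paru", stem="paru.", gloss="P.", paradigms=["n"])
cop = Clitic(lex="a", stem="n&a&i", gloss="3&COP&UNCERT")
wf = Wordform(host=paru, clitics=[cop])
print(wf.lemma)  # paru+a
```

In this sketch the wordform keeps the host and its clitics as separate objects, so each could in principle carry its own paradigm and grammatical tags; whether that fits the existing analysis code is a separate question.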
Thank you! I've tried to implement the above Tiriyó example as follows:
clitics:
-clitic
 lex: a
 stem: n&a&i
 paradigm: cop
 trans_en: s/he is
 type: en
 gloss: 3&COP&UNCERT
lexemes:
-lexeme
 lex: paru
 stem: paru.
 gloss: P.
 trans_en: paru river
 paradigm: n
 gramm: N
paradigms:
-paradigm: cop
 -flex: n.
  gramm: 3
  gloss: 3Sa
-paradigm: n
 -flex: .hk|ao
  gramm: aqu,loc
  gloss: AQU|LOC
  sep: =
  trans_en: in water
 -flex: .
  gramm:
Output:
<Wordform object>
paruhkaonai
paru+a; N,aqu,loc
paru=hk-ao=nai
P.=AQU-LOC=3-COP-UNCERT
trans_en s/he is
`3&COP&UNCERT` is translated to `3-COP-UNCERT` in the gloss, but `n&a&i` becomes `nai`, rather than `n-a-i`.
Thank you, I fixed both issues. Still no paradigms available for clitics though (so your `paradigm: cop` has no effect).
P.S. Sorry, I just realized the link to the `sep: =` description was wrong, but obviously you've already found the right place.
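For reference, the `&` behaviour discussed above can be sketched in a few lines of Python. This is only an illustration inferred from the output in this thread, not uniparser's code; the function name `expand_ampersand` and the default `-` separator are assumptions.

```python
from typing import Tuple


def expand_ampersand(value: str, sep: str = "-") -> Tuple[str, str]:
    """Split a `&`-delimited stem or gloss into its morphemes.

    Returns (surface, segmented): the surface string with the boundaries
    removed, and the segmented string with `sep` between the morphemes.
    """
    parts = value.split("&")
    return "".join(parts), sep.join(parts)


surface, segmented = expand_ampersand("n&a&i")
print(surface)    # nai
print(segmented)  # n-a-i

_, gloss = expand_ampersand("3&COP&UNCERT")
print(gloss)      # 3-COP-UNCERT
```

The surface form feeds the wordform line (`paruhkaonai`), while the segmented form is what should appear in the morpheme breakdown and in the gloss line.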
I just noticed the superfluous paradigm today. Thanks for the fix -- I see that IDs now work, too!
I think my clitic needs are covered, thanks a bunch.
While there is a lot of ongoing discussion about what a "clitic" is, in most approaches it is defined along the lines of "grammatically independent, but phonologically bound". E.g. when using the notion of p-word and g-word, a clitic is its own g-word, but not its own p-word.
This implies that a clitic belongs to a lexeme of its own, in turn meaning that clitics can potentially have inflectional morphology of their own (or have morphologically complex stems). And we do find instances of this in "the wild"; the following Tiriyó example shows two multi-morpheme enclitics in a row: the morphologically complex stem hkao 'in water', followed by the inflected form nai of the stem a(i) 'to be':
In `uniparser`, I can model the postposition and the copula as lexemes like this (ignoring their phonological dependence):
where the copula has a paradigm containing n-, and the postposition is morphologically complex (`&`). This yields the expected hk-ao 'AQU-LOC' and n-ai '3Sa-COP'.
The current implementation of clitics, however, does not allow for either of these functionalities, only accepting single-morpheme clitics like the following:
I would love to be able to (in addition to the simple clitic definition above) also write things like:
Of course, most clitics are morphologically simple; for these cases the current functionality, where no `.` has to be added to the stem and no (zero) paradigm has to be defined, is perfectly suitable.
Judging from the uniparser input format for clitics and lexemes, as well as from the existence of placeholders for `paradigms` in the `Clitic` class (and much shared/duplicate code), clitics are already treated more like lexemes than like inflection. This leads to the question of whether it would make sense to make the `Clitic` class inherit from `Lexeme`. I don't understand enough of the inner workings of `uniparser` to figure this out, just a thought. I strongly suspect it's not that simple.
Maybe implementing only the second use case with `&` would be simpler? Judging from a discussion I started on the lingtyp mailing list, languages with morphologically complex clitics only have a few of them (e.g., Tiriyó does not have countless cliticized forms of the copula, and has only a restricted number of morphologically complex postpositions).