unipv-larl / UD4HL


Validation Error: 'cop' not expected to have children inappropriate for Old Irish #3

Open AdeDoyle opened 1 year ago

AdeDoyle commented 1 year ago

This reflects an issue (#928) that's already open on the main UD issues page. I got the impression during the recent Zoom gathering that I should open a discussion on it here before resuming the discussion in the main issue.

The gist of the problem is that the copula in Old Irish appears to be in something of a transitional stage between a PIE verb and the Modern Irish copula, but (with limited exceptions) UD cannot account for a copula which is the head of a dependency relation. I argued in the open issue that the copula contains the subject in the same way that a verb might inflect for its subject, rather than simply being "in agreement" with the subject. My argument is that the copula must therefore be able to govern relations with a limited number of dependent words, including emphatic particles which emphasise the subject, and verbal particles which augment the copula in the same way that they augment verbs.

There are two formations in which the copula can be used in Old Irish. The first, and more problematic, formation is the one in which the copula retains inflected forms for person and number. These inflected copula forms are only used where no explicit subject is mentioned, e.g. am rí na hÉrenn "I_am king of Ireland" (lit. "is_me the king of Ireland"). The trouble here is that verbal particles like no can be used in combination with the copula, in much the same way as they can with verbs, and emphatic particles (given the PRON POS tag in UD corpora because they inflect) can be used to emphasise the subject, which is represented only by the copula's inflection. See, for example, amal nondafrecṅdirccsa "for that I_am present":

# sent_id = 8
# reference = 9b4
# text = .i. amal nondafrecṅdirccsa
1   .i. .i. ADV _   Abbr=Yes    5   advmod  _   _
2   amal    amal    SCONJ   _   _   5   mark    _   _
3   no  no  PART    _   PartType=Vb 4   compound:prt    _   SpaceAfter=No
4   nda is  AUX _   Mood=Ind|Number=Sing|Person=1|Polarity=Pos|Tense=Pres|VerbType=Cop  5   nsubj   _   SpaceAfter=No
5   frecṅdircc  frecndairc  ADJ _   Case=Dat|Degree=Pos|Number=Sing 0   root    _   SpaceAfter=No
6   sa  sa  PRON    _   PronType=Emp    4   amod    _   _

In this example the 1st sg. emphatic particle sa emphasises the subject, which is represented nowhere other than in the copula's form, nda. The verbal particle no is semantically empty (it does not itself change the meaning of the sentence in any way) and is used only as a syntactic placeholder that allows a nasal n to be prefixed to the copula form da. This prefixed nasal creates the relative form nda "that I_am", which could not be used in that syntactic position without the particle no preceding it (no nda).
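To make the structure of this annotation explicit, here is a minimal Python sketch that reads the sentence above, lists what depends on the copula nda, and checks that the SpaceAfter=No flags rebuild the `# text` line. The helper names (`parse`, `dependents`, `surface`, `text_comment`) are hypothetical and not part of any UD tool; the sentence is copied with single spaces between columns rather than tabs.

```python
# The 9b4 sentence from the example above. Real CoNLL-U is tab-separated and,
# in some treebanks, FORM/LEMMA may contain spaces; this sketch splits on any
# whitespace, which is safe only because no field here contains a space.
SENT = """\
# sent_id = 8
# reference = 9b4
# text = .i. amal nondafrecṅdirccsa
1 .i. .i. ADV _ Abbr=Yes 5 advmod _ _
2 amal amal SCONJ _ _ 5 mark _ _
3 no no PART _ PartType=Vb 4 compound:prt _ SpaceAfter=No
4 nda is AUX _ Mood=Ind|Number=Sing|Person=1|Polarity=Pos|Tense=Pres|VerbType=Cop 5 nsubj _ SpaceAfter=No
5 frecṅdircc frecndairc ADJ _ Case=Dat|Degree=Pos|Number=Sing 0 root _ SpaceAfter=No
6 sa sa PRON _ PronType=Emp 4 amod _ _
"""

FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def parse(block):
    """Return one dict per token line, skipping comments and blank lines."""
    tokens = []
    for line in block.splitlines():
        if not line or line.startswith("#"):
            continue
        tok = dict(zip(FIELDS, line.split()))
        tok["id"], tok["head"] = int(tok["id"]), int(tok["head"])
        tokens.append(tok)
    return tokens


def text_comment(block):
    """Return the value of the '# text = ...' comment line, if any."""
    for line in block.splitlines():
        if line.startswith("# text = "):
            return line[len("# text = "):]
    return None


def dependents(tokens, head_id):
    """(form, deprel) of every token attached to head_id."""
    return [(t["form"], t["deprel"]) for t in tokens if t["head"] == head_id]


def surface(tokens):
    """Rebuild the sentence text, honouring SpaceAfter=No in MISC."""
    parts = []
    for t in tokens:
        parts.append(t["form"])
        if "SpaceAfter=No" not in t["misc"]:
            parts.append(" ")
    return "".join(parts).rstrip()


tokens = parse(SENT)
copula = next(t for t in tokens if "VerbType=Cop" in t["feats"])
print(copula["form"], copula["deprel"], dependents(tokens, copula["id"]))
# nda nsubj [('no', 'compound:prt'), ('sa', 'amod')]
print(surface(tokens) == text_comment(SENT))  # True
```

The output makes the problem concrete: the copula token carries two dependents of its own, no and sa, and the three morphemes no + nda + frecṅdircc + sa are fused into the single orthographic word nondafrecṅdirccsa.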

There is also another usage of the copula during the Old Irish period, in which the subject is explicitly stated, and this formation always uses the 3rd person (sg. or pl.) form of the copula. For example, is meisse rí na hÉrenn "I (emphatic) am the king of Ireland" (lit. "it_is I king of Ireland"), and it insi ériu ocus albu "Ireland and Britain are islands" (lit. "they_are islands Ireland and Britain"). In this formation we see the template for the Modern Irish copula. Compare, for example, Modern Irish is mise rí na hÉireann "I am king of Ireland".

In Modern Irish the copula, is, does not inflect for number and person, so it can only be interpreted as a normal auxiliary linking subject and predicate. In Old Irish, however, the fact that the form is is the 3rd sg. inflection of the copula means that this formation can be interpreted in more than one way. It could be interpreted as a development from the inflectional paradigm towards a simple copula which acts like an auxiliary linking subject and predicate. Alternatively, it could be interpreted as the 3rd person inflection of the copula, representing the subject, with the explicitly stated subject that follows forming some sort of relative construction: is meisse rí na hÉrenn "it_is I, the king of Ireland", where is "it is" is the subject and meisse "I" is the predicate. For the sake of uniformity with the modern Gaelic languages I've decided to use the cop dependency relation for the copula here, and to treat the explicit subject as the nsubj, as is done in Modern Irish treebanks. Still, the first formation of the copula, where it inflects, remains an issue.
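As a rough illustration of that choice, the Python sketch below encodes is meisse rí na hÉrenn with is as cop and meisse as nsubj of the predicate rí, then checks that the copula ends up with no dependents of its own. The det and nmod labels for na hÉrenn are assumptions made for the example, not taken from an existing treebank, and `children` is a hypothetical helper.

```python
# Hypothetical annotation of "is meisse rí na hÉrenn" under the cop treatment
# described above. Each entry is (id, form, head, deprel); IDs run from 1 with
# no gaps, so token n sits at index n - 1. The det/nmod labels are assumptions.
TREE = [
    (1, "is", 3, "cop"),
    (2, "meisse", 3, "nsubj"),
    (3, "rí", 0, "root"),
    (4, "na", 5, "det"),
    (5, "hÉrenn", 3, "nmod"),
]


def children(tree, head_id):
    """Forms of all tokens whose head is head_id."""
    return [form for (_id, form, head, _rel) in tree if head == head_id]


cop = next(t for t in TREE if t[3] == "cop")
subj = next(t for t in TREE if t[3] == "nsubj")
predicate = TREE[subj[2] - 1]  # the token the subject attaches to

print(cop[1], "has dependents:", children(TREE, cop[0]))  # is has dependents: []
print("subject:", subj[1], "| predicate:", predicate[1])   # subject: meisse | predicate: rí
```

In this formation both the copula and the explicit subject hang off the nominal predicate, so the copula is a leaf and nothing in the tree conflicts with the cop relation.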

Some workarounds suggested in the open issue are that the copula could be POS-tagged VERB in situations where it needs to take dependents, or that it could be POS-tagged PRON and treated as the nominal subject of the predicate. I don't particularly like these, as they seem anachronistic. The copula may have developed diachronically from a prehistoric verb into the historic copula, but it is very much a copula by the Old Irish period, albeit with some verbal holdovers. It has never been a pronoun.

Currently, the validator will allow a copula tagged AUX to have dependents if it has the deprel nsubj, so I've been using this as a workaround for inflected copula forms with no overt subject. I get the impression, though, that the general feeling is that this is not appropriate for UD, and that the validator may be patched in future to remove this possibility. The question, therefore, is: what would be the preferable way of dealing with these copula forms? Should they be POS-tagged differently, even if that requires completely disregarding the diachronic development of the copula? Alternatively, should the limited exceptions already allowed by the validator be maintained, or even expanded to allow for copulas at stages of development similar to that of Old Irish?

amir-zeldes commented 1 year ago

Hi @AdeDoyle, I've laid out my views on this particular case on the UD repo already, but generally speaking I think the UD policy is quite clear: copulas cannot take dependents unless they are promoted, except for goeswith and fixed of course. The UD view is that the copula and its predicate form a unit, and any modifiers that might be considered to belong to the copula belong to the predication as a whole. The same is true of aux (when the copula has a compound tense with an auxiliary) and of negation: they belong to the entire predication.
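The rule stated here, and why the nsubj workaround from the issue slips past it, can be sketched as a simplified check in Python. This assumes the check keys on the cop relation (as the validator message in the issue title suggests) and allows only fixed and goeswith under a copula, as stated in this comment; the official UD validator implements its own, more detailed test, and `cop_violations` is a hypothetical name.

```python
# Simplified sketch of the rule stated above: a token attached as `cop` may only
# have children attached as `fixed` or `goeswith`. Hypothetical helper; the
# official UD validator implements its own, more detailed check.
ALLOWED_UNDER_COP = {"fixed", "goeswith"}


def cop_violations(tokens):
    """Return (copula form, child form, child deprel) for each disallowed child.

    tokens: list of (id, form, head, deprel) tuples; head 0 means root.
    """
    by_id = {t[0]: t for t in tokens}
    problems = []
    for _id, form, head, deprel in tokens:
        parent = by_id.get(head)  # None when head is 0 (the root)
        if parent is not None and parent[3] == "cop" and deprel not in ALLOWED_UNDER_COP:
            problems.append((parent[1], form, deprel))
    return problems


# The 9b4 example as annotated in the issue: nda is attached as nsubj.
workaround = [
    (1, ".i.", 5, "advmod"),
    (2, "amal", 5, "mark"),
    (3, "no", 4, "compound:prt"),
    (4, "nda", 5, "nsubj"),
    (5, "frecṅdircc", 0, "root"),
    (6, "sa", 4, "amod"),
]
# The same tree with nda relabelled as cop.
as_cop = [(4, "nda", 5, "cop") if t[0] == 4 else t for t in workaround]

print(cop_violations(workaround))  # []
print(cop_violations(as_cop))      # [('nda', 'no', 'compound:prt'), ('nda', 'sa', 'amod')]
```

Under this simplified rule the workaround passes only because nda is not labelled cop; once it is, both no and sa are reported, which is exactly the situation the issue is asking about.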

As for verbs containing inflectional markers that indicate the subject, if they are always there then I would consider this to be the typical behavior of strongly inflected verbs in Indo-European or Semitic languages, and just mark Person=1/2/3 on the verb but no subject relation. If the verb is a fusion of a pronoun and a verb (distinct from just verbal inflection), then it can be sub-tokenized using an MWT. I think this is just a case of verbal inflection, and we should have Person FEATS, but no nsubj deprel. Of course there are many additional subtleties, but I hope they are covered in the discussion on the UD repo issue you linked to above.
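One possible reading of this suggestion, applied to the 9b4 example, is sketched below in Python: nda is relabelled cop, its dependents (no and sa) are re-attached to the predicate, following the earlier point that such modifiers belong to the predication as a whole, and the subject survives only as Person=1 and Number=Sing in FEATS. The helper names are hypothetical, and the re-attached children keep their original relation labels, which would likely need revisiting in a real annotation.

```python
# Hypothetical sketch of the annotation suggested above: the copula is attached
# as `cop`, anything that hung off it is re-attached to the predicate (the
# copula's head), and the subject is recorded only as Person/Number in FEATS.
# Child relation labels are left unchanged; choosing better labels is not attempted.


def parse_feats(feats):
    """Turn a FEATS string like 'Number=Sing|Person=1' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def demote_copula(tokens, cop_id):
    """Return a copy of tokens with cop_id relabelled `cop` and its children promoted."""
    cop_head = next(t["head"] for t in tokens if t["id"] == cop_id)
    out = []
    for t in tokens:
        new_t = dict(t)  # copy so the input list is not modified
        if new_t["id"] == cop_id:
            new_t["deprel"] = "cop"
        elif new_t["head"] == cop_id:
            new_t["head"] = cop_head
        out.append(new_t)
    return out


# The 9b4 example as annotated in the issue (nda attached as nsubj).
sent = [
    {"id": 1, "form": ".i.", "head": 5, "deprel": "advmod", "feats": "Abbr=Yes"},
    {"id": 2, "form": "amal", "head": 5, "deprel": "mark", "feats": "_"},
    {"id": 3, "form": "no", "head": 4, "deprel": "compound:prt", "feats": "PartType=Vb"},
    {"id": 4, "form": "nda", "head": 5, "deprel": "nsubj",
     "feats": "Mood=Ind|Number=Sing|Person=1|Polarity=Pos|Tense=Pres|VerbType=Cop"},
    {"id": 5, "form": "frecṅdircc", "head": 0, "deprel": "root",
     "feats": "Case=Dat|Degree=Pos|Number=Sing"},
    {"id": 6, "form": "sa", "head": 4, "deprel": "amod", "feats": "PronType=Emp"},
]

new = demote_copula(sent, 4)
cop = next(t for t in new if t["deprel"] == "cop")
feats = parse_feats(cop["feats"])

print([t["form"] for t in new if t["head"] == cop["id"]])  # []
print([t["form"] for t in new if t["head"] == 5])         # ['.i.', 'amal', 'no', 'nda', 'sa']
print(feats["Person"], feats["Number"])                    # 1 Sing
print(any(t["deprel"] == "nsubj" for t in new))            # False
```

The result has the shape described in this comment: Person is kept in FEATS, no token carries nsubj, and the copula is a leaf, at the cost of attaching the verbal particle and the emphatic particle directly to the predicate.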