Open Glottotopia opened 4 years ago
Note that tier is modelled as a string. This may be counterintuitive to a linguist, but it is necessary because the vocabulary is anchored in NIF.
As a relation between strings, hasTier can be a subproperty of both (subString and superString) if tier and utterance have exactly the same extension (i.e., exactly one tier instance per utterance). This is the intended interpretation. Equivalent tier instances of the same tier category (e.g., MorphTier) can be made explicit by hasTier subproperties and Tier subclasses.
As for the ontological status of tier, indeed, I assumed a different interpretation of "tier" when creating the first Ligt draft: Because of the anchoring in NIF, tiers, utterances and words are all strings, so this is just a re-labelled hasSuperString property, and the tier is a subtype of string.
A better approach would be to model tiers independently of NIF strings. I suggest discussing this in the context of https://github.com/ld4lt/linguistic-annotation, and revising Ligt afterwards.
As for the definition of tier: I'm open to a different way of modelling right now, but only on the basis of a scientific publication or a technical whitepaper. If you have anything at hand, we should open the issue again.
To be discussed before advancement to next version.
Just a side note: there can be cases where the words used for glossing do not map neatly onto the string in the source line.
haste nich gesehen
hast=du nicht ge-seh-en
have=2ps NEG PTCP-see-PTCP
In this case, "du" is not a substring of the source line string. Nevertheless, there is a semantic relation between lines 1 and 2 which one would want to capture. At least for those cases, a mechanism different from NIF would have to be used AFAICS. The question is whether this other mechanism should then be used for vanilla cases as well.
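The mismatch can be checked mechanically. The following is only a minimal sketch: the helper name `morphs_missing_from_source` is made up for illustration, and it assumes that morphs are delimited by whitespace, "=" and "-", as in the example above. It does plain substring tests and says nothing about how NIF or Ligt would anchor the morphs.

```python
import re
from typing import List


def morphs_missing_from_source(source: str, segmented: str) -> List[str]:
    """Return the morphs of the segmented line that do not occur verbatim in the source line.

    Hypothetical helper: morph boundaries are assumed to be whitespace, '=' and '-'.
    """
    morphs = [m for m in re.split(r"[\s=\-]+", segmented) if m]
    return [m for m in morphs if m not in source]


# Source line and segmented line from the example above.
missing = morphs_missing_from_source("haste nich gesehen", "hast=du nicht ge-seh-en")
print(missing)
```

On this example the check reports "du" and also "nicht", since the source line has "nich" without the final "t".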
Yes, this is an issue, and in fact it happens very often due to morphophonology. In the current version of the vocabulary, we decided to use the substring relation anyway in order to keep the connection with NIF. Conceptually, it is a substring, even when it is not one on the surface.
See the modelling in samples/nordhoff-1.ttl and the comments for the upcoming Ligt v.0.3 version under experimental/.
> Note that tier is modelled as a string. This may be counterintuitive to a linguist, but it is necessary because the vocabulary is anchored in NIF.
> As a relation between strings, hasTier can be a subproperty of both (subString and superString) if tier and utterance have exactly the same extension (i.e., exactly one tier instance per utterance). This is the intended interpretation. Equivalent tier instances of the same tier category (e.g., MorphTier) can be made explicit by hasTier subproperties and Tier subclasses.
In https://www.w3.org/TR/rdf-schema/#ch_subpropertyof I read: "The property rdfs:subPropertyOf is an instance of rdf:Property that is used to state that all resources related by one property are also related by another."
hasTier is a subProperty of subString. This implies that two strings a and b which are related by the property hasTier are also related by the property subString.
hasTier is a subProperty of superString. This implies that two strings a and b which are related by the property hasTier are also related by the property superString.
Taken together, this means that every two strings a and b which are related by hasTier are at the same time in the subString relation and in the superString relation. This entails that the two strings related by hasTier are identical.
This is true only under very special circumstances.
I suspect I am missing a logical flaw in my argument. Please point out that logical flaw.
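The argument can be spelled out as a small sketch. It is an illustration only: property names are used without namespaces, as in the discussion, `utterance1` and `tier1` are made-up identifiers, and subString/superString are read as plain containment between string values, which is an assumption rather than a statement about how NIF defines them.

```python
# Subproperty axioms as stated in the discussion (un-namespaced, for illustration).
SUBPROPERTY_OF = {
    ("hasTier", "subString"),
    ("hasTier", "superString"),
}


def rdfs_subproperty_closure(triples, subproperty_of):
    """Apply RDFS rule rdfs7 once: (p subPropertyOf q) and (s p o) entail (s q o)."""
    inferred = set(triples)
    for s, p, o in triples:
        for sub, sup in subproperty_of:
            if p == sub:
                inferred.add((s, sup, o))
    return inferred


def consistent_with_both(a: str, b: str) -> bool:
    """Assumed reading: both relations hold only if each string contains the other."""
    return (b in a) and (a in b)


# Hypothetical data: one utterance pointing to one tier.
triples = {("utterance1", "hasTier", "tier1")}
inferred = rdfs_subproperty_closure(triples, SUBPROPERTY_OF)
for triple in sorted(inferred):
    print(triple)
```

Because both containments have to hold at once, the direction of subString versus superString does not matter here: mutual containment of two strings is only satisfied when they are equal.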
> Note that tier is modelled as a string. This may be counterintuitive to a linguist, but it is necessary because the vocabulary is anchored in NIF.
The rdfs:comment for Tier reads:
A tier represents a layer of annotation. ligt:Tiers, however, can also represent bundles of annotations that refer to the same segment. In this sense (and different from Xigt), a ligt:Tier represents a segmented view on a ligt:Utterance. Note that normally, multiple Xigt tiers constitute a single ligt:Tier.
I am fine with Tier being modeled as a string, but I have trouble understanding how a string can represent "bundles of annotations". I also fail to understand how multiple Xigt tiers can be represented in one Ligt tier if that Ligt tier is a string.
> Taken together, this means that every two strings a and b which are related by hasTier are at the same time in the subString relation and in the superString relation. This entails that the two strings related by hasTier are identical.
Exactly.
> I suspect I am missing a logical flaw in my argument. Please point out that logical flaw.
No flaw, this is the way it is currently modelled. This doesn't mean that a tier needs to provide an item for every character span in the utterance (only utterances have tiers), but only that it can.
> I am fine with Tier being modeled as a string, but I have trouble understanding how a string can represent "bundles of annotations". I also fail to understand how multiple Xigt tiers can be represented in one Ligt tier if that Ligt tier is a string.
In Xigt, a tier is one particular segmentation associated with one particular annotation type (say, gloss and morph, cf. https://github.com/acoli-repo/LLODifier/blob/master/xigt/examples/abkhaz/abkhaz.xml).
In Ligt, a tier pertains to one particular segmentation, but the type of annotation is not hard-wired into the tier definition. Instead, it is expressed by subproperties of ligt:annotation that are attached to individual items.
So, the same ligt:Item can receive multiple annotations (datatype properties, hence "bundle of annotations" in the rdfs:comment), and in Xigt, these would be spread over different tiers. In Ligt, the difference is expressed by different datatype properties, and these may refer to the same segmentation (= "tier", in Ligt terminology).
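The difference can be sketched as a data structure. This is a rough illustration in plain Python, not the Ligt vocabulary itself: the classes `Item` and `Tier`, the function `to_xigt_style_tiers`, and the keys "morph" and "gloss" are hypothetical stand-ins for ligt:Item, a Ligt tier, and subproperties of ligt:annotation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Item:
    """One segment of a tier; carries a bundle of annotations."""
    text: str
    # Keys stand in for subproperties of ligt:annotation (names are assumptions).
    annotations: Dict[str, str] = field(default_factory=dict)


@dataclass
class Tier:
    """A Ligt-style tier: one segmentation, annotation types left to the items."""
    items: List[Item]


def to_xigt_style_tiers(tier: Tier) -> Dict[str, List[str]]:
    """Spread the annotation bundles over one flat tier per annotation type."""
    out: Dict[str, List[str]] = {}
    for item in tier.items:
        for ann_type, value in item.annotations.items():
            out.setdefault(ann_type, []).append(value)
    return out


# One segmentation of "gesehen", with two annotations per item.
morph_tier = Tier(items=[
    Item("ge", {"morph": "ge-", "gloss": "PTCP"}),
    Item("seh", {"morph": "seh", "gloss": "see"}),
    Item("en", {"morph": "-en", "gloss": "PTCP"}),
])
xigt_like = to_xigt_style_tiers(morph_tier)
print(xigt_like)
```

The sketch assumes every item carries every annotation type; if an item lacked one, the flattened per-type lists would no longer line up with the segmentation.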
OK, the same item can receive multiple annotations. But Tiers are not children of Item; they are sisters. If this is taken to mean that ligt:Analysis can receive multiple annotations, I can probably follow.
> OK, the same item can receive multiple annotations.
Yes.
> But Tiers are not children of Item; they are sisters.
Both tiers and items are nif:Strings, and a substring relation holds between them (but can be left implicit if NIF URIs are being used). Conceptually, tiers are collections of items (expressed by the ligt:item relation between them).
> If this is taken to mean that ligt:Analysis can receive multiple annotations, I can probably follow.
Yes.
Utterances can point to tiers in a meronymic relationship, but the meaning is "U has subpart of type tier T". Words can point to tiers in a meronymic relationship, but the meaning is "W is itself part of a structure of type tier T". The two properties are different and should not use the same label.