max-ionov / ligt

Linked IGT: Modelling Interlinear Glossed Text (IGT) as RDF and/or as Linked Open Data

hasTier cannot be a subproperty of substring and superstring at the same time #2

Open Glottotopia opened 4 years ago

Glottotopia commented 4 years ago

Utterances can point to tiers in a meronymic relationship, but the meaning is "U has subpart of type tier T". Words can point to tiers in a meronymic relationship, but the meaning is "W is itself part of a structure of type tier T". The two properties are different and should not use the same label.

chiarcos commented 4 years ago

Note that tier is modelled as a string. This may be counter-intuitive to a linguist, but it is necessary because the vocabulary is anchored in NIF.

As a relation between strings, hasTier can be subproperty of both if tier and utterance have exactly the same extension (i.e., exactly one tier instance per utterance). This is the intended interpretation. Equivalent tier instances of the same tier category (e.g., MorphTier) can be made explicit by hasTier subproperties and Tier subclasses.

As for the ontological status of tier, indeed, I assumed a different interpretation of "tier" when creating the first Ligt draft: Because of the anchoring in NIF, tiers, utterances and words are all strings, so this is just a re-labelled hasSuperString property, and the tier is a subtype of string.

A better approach would be to model tiers independently of NIF strings. I suggest discussing this in the context of https://github.com/ld4lt/linguistic-annotation and revising Ligt afterwards.

As for the definition of tier: I'm open to another way of modelling right away, but only on the basis of a scientific publication or a technical whitepaper. If you have anything at hand, we should open the issue again.

chiarcos commented 4 years ago

To be discussed before advancement to next version.

Glottotopia commented 4 years ago

Just a side note: there can be cases where the words used for glossing do not neatly map on the string in the source line.

haste      nich  gesehen
hast=du    nicht ge-seh-en
have=2ps   NEG   PTCP-see-PTCP

In this case, "du" is not a substring of the source line string. Nevertheless, there is a semantic relation between lines 1 and 2 which one would want to capture. At least for those cases, a mechanism different from NIF would have to be used AFAICS. The question is whether this other mechanism should then be used for vanilla cases as well.
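The mismatch can be checked mechanically. Below is a minimal sketch in Python, assuming the three-line example above as toy data; the helper name `unanchored_morphs` and the splitting on `-` and `=` are illustrative assumptions, not part of Ligt or NIF.

```python
import re

# Toy data copied from the example above (source line and segmented line).
source_words = ["haste", "nich", "gesehen"]
segmented_words = ["hast=du", "nicht", "ge-seh-en"]


def unanchored_morphs(source, segmented):
    """Return (source word, morph) pairs where the morph is not a
    literal substring of the corresponding source word.

    Hypothetical helper: morphs are obtained by splitting on '-' and '='.
    """
    missing = []
    for word, analysis in zip(source, segmented):
        for morph in re.split(r"[-=]", analysis):
            if morph not in word:
                missing.append((word, morph))
    return missing


print(unanchored_morphs(source_words, segmented_words))
# -> [('haste', 'du'), ('nich', 'nicht')]
```

Under this naive test, "nicht" fails as well as "du", since the source line has "nich"; any purely offset-based anchoring would have to deal with both kinds of mismatch.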

max-ionov commented 4 years ago

Yes, this is an issue, and in fact it happens very often due to morphophonology. In the current version of the vocabulary, we decided to use the substring relation anyway in order to keep the connection with NIF. Conceptually, the morph is a substring even when it does not appear on the surface.

chiarcos commented 4 years ago

> Just a side note: there can be cases where the words used for glossing do not neatly map on the string in the source line.
>
> haste      nich  gesehen
> hast=du    nicht ge-seh-en
> have=2ps   NEG   PTCP-see-PTCP
>
> In this case, "du" is not a substring of the source line string. Nevertheless, there is a semantic relation between lines 1 and 2 which one would want to capture. At least for those cases, a mechanism different from NIF would have to be used AFAICS. The question is whether this other mechanism should then be used for vanilla cases as well.

See the modelling in samples/nordhoff-1.ttl and the comments for the upcoming Ligt v.0.3 under experimental/.

Glottotopia commented 4 years ago

> Note that tier is modelled as a string. This may be counter-intuitive to a linguist, but this is necessary because of the anchoring of the vocabulary in NIF.
>
> As a relation between strings, hasTier can be subproperty of both if tier and utterance have exactly the same extension (i.e., exactly one tier instance per utterance). This is the intended interpretation. Equivalent tier instances of the same tier category (e.g., MorphTier) can be made explicit by hasTier subproperties and Tier subclasses.

In https://www.w3.org/TR/rdf-schema/#ch_subpropertyof I read: "The property rdfs:subPropertyOf is an instance of rdf:Property that is used to state that all resources related by one property are also related by another."

hasTier is a subProperty of subString. This implies that two strings a and b which are related by the property hasTier are also related by the property subString.

hasTier is a subProperty of superString. This implies that two strings a and b which are related by the property hasTier are also related by the property superString.

Taken together, this means that every two strings a and b which are related by hasTier are at the same time in the subString relation and in the superString relation. This entails that the two strings related by hasTier are identical. This is true only under very special circumstances. I suspect I am missing a logical flaw in my argument. Please point out that logical flaw.
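The entailment can be reproduced with a toy forward-chaining step. This is a minimal sketch in plain Python rather than a real RDFS reasoner; the prefixed names (`ligt:hasTier`, `nif:subString`, `nif:superString`) and the `ex:` resources are written as plain strings and only mirror the property names used in this thread.

```python
def rdfs_subproperty_closure(triples, subproperty_of):
    """Apply the rdfs:subPropertyOf rule until nothing new is inferred:
    if (s, p, o) holds and p is a subproperty of q, then (s, q, o) holds.
    """
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            for q in subproperty_of.get(p, ()):
                if (s, q, o) not in inferred:
                    inferred.add((s, q, o))
                    changed = True
    return inferred


# Assumption: hasTier is declared a subproperty of both relations,
# as described in the issue title.
SUBPROPERTY_OF = {"ligt:hasTier": ("nif:subString", "nif:superString")}

# Hypothetical instance data: one utterance pointing to one tier.
data = {("ex:utterance1", "ligt:hasTier", "ex:tier1")}

closure = rdfs_subproperty_closure(data, SUBPROPERTY_OF)

# Both relations now hold between the same pair of resources.
both_hold = (
    ("ex:utterance1", "nif:subString", "ex:tier1") in closure
    and ("ex:utterance1", "nif:superString", "ex:tier1") in closure
)
print(both_hold)
```

The sketch only derives the two triples; the further step that a string which is both substring and superstring of another must cover the same extent follows from the meaning of the two relations, not from RDFS itself.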

Glottotopia commented 4 years ago

> Note that tier is modelled as a string. This may be counter-intuitive to a linguist, but this is necessary because of the anchoring of the vocabulary in NIF.

The rdfs:comment for Tier reads:

A tier represents a layer of annotation. ligt:Tiers, however, can also represent bundles of annotations that refer to the same segment. In this sense (and different from Xigt), a ligt:Tier represents a segmented view on a ligt:Utterance. Note that normally, multiple Xigt tiers constitute a single ligt:Tier.

I am fine with Tier being modelled as a string, but I have trouble understanding how a string can represent "bundles of annotations". I also fail to understand how multiple Xigt tiers can be represented in one Ligt tier if that Ligt tier is a string.

chiarcos commented 4 years ago

> Taken together, this means that every two strings a and b which are related by hasTier are at the same time in the subString relation and in the superString relation. This entails that the two strings related by hasTier are identical.

Exactly.

> I suspect I am missing a logical flaw in my argument. Please point out that logical flaw.

No flaw; this is the way it is currently modelled. It does not mean that a tier needs to provide an item for every character span in the utterance (only utterances have tiers), only that it can.

chiarcos commented 4 years ago

> I am fine with Tier being modeled as a string, but I have trouble understanding how a string can represent "bundles of annotations". I also fail to understand how multiple Xigt tiers can be represented in one Ligt tier if that Ligt tier is a string.

In Xigt, a tier is one particular segmentation associated with one particular annotation type (say, gloss and morph, cf. https://github.com/acoli-repo/LLODifier/blob/master/xigt/examples/abkhaz/abkhaz.xml).

In Ligt, a tier pertains to one particular segmentation, but the type of annotation is not hard-wired into the tier definition; instead, it is expressed by subproperties of ligt:annotation that are attached to individual items.

So, the same ligt:Item can receive multiple annotations (datatype properties, hence "bundle of annotations" in the rdfs:comment), and in Xigt, these would be spread over different tiers. In Ligt, the difference is expressed by different datatype properties, and these may refer to the same segmentation (= "tier", in Ligt terminology).
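The difference between the two layouts can be sketched with plain data structures. This is a minimal illustration in Python, assuming two Xigt-style tiers that share one segmentation; the dictionary keys `morph` and `gloss` stand in for subproperties of ligt:annotation, and the function name `bundle_by_segment` is hypothetical.

```python
# Xigt-like view: one tier per annotation type, aligned by position.
xigt_like_tiers = {
    "morph": ["ge", "seh", "en"],
    "gloss": ["PTCP", "see", "PTCP"],
}


def bundle_by_segment(tiers):
    """Turn parallel tiers into one list of items, each carrying a
    bundle of annotations (one entry per annotation type).

    Assumes all tiers refer to the same segmentation, i.e. have equal length.
    """
    lengths = {len(values) for values in tiers.values()}
    if len(lengths) != 1:
        raise ValueError("tiers must share one segmentation")
    n = lengths.pop()
    return [
        {name: values[i] for name, values in tiers.items()}
        for i in range(n)
    ]


# Ligt-like view: a single tier whose items each hold several annotations.
ligt_like_tier = bundle_by_segment(xigt_like_tiers)
print(ligt_like_tier[1])
# -> {'morph': 'seh', 'gloss': 'see'}
```

In this reading, what Xigt spreads over two tiers ends up as two properties on the same item, and the list as a whole plays the role of the single Ligt tier.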

Glottotopia commented 4 years ago

> So, the same ligt:Item can receive multiple annotations (datatype properties, hence "bundle of annotations" in the rdfs:comment), and in Xigt, these would be spread over different tiers. In Ligt, the difference is expressed by different datatype properties, and these may refer to the same segmentation (= "tier", in Ligt terminology).

OK, the same item can receive multiple annotations. But Tiers are not children of Item; they are sisters. If this is taken to mean that ligt:Analysis can receive multiple annotations, I can probably follow.

chiarcos commented 4 years ago

> OK, the same item can receive multiple annotations.

Yes.

> But Tiers are not children of Item; they are sisters.

Both tiers and items are nif:Strings, and a substring relation holds between them (it can be left implicit if NIF URIs are used). Conceptually, tiers are collections of items (expressed by the ligt:item relation between them).
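Why the substring relation can stay implicit is easiest to see with offset-based URIs. The following is a minimal sketch in Python, assuming NIF-style URIs of the form `...#char=begin,end`; the example URIs and the helper names are hypothetical, and real NIF data may use other URI schemes.

```python
import re

# Assumed URI shape: <document>#char=<begin>,<end>
NIF_OFFSETS = re.compile(r"#char=(\d+),(\d+)$")


def offsets(uri):
    """Extract (begin, end) character offsets from an offset-based URI."""
    match = NIF_OFFSETS.search(uri)
    if match is None:
        raise ValueError(f"no char offsets in {uri!r}")
    return int(match.group(1)), int(match.group(2))


def is_substring_of(inner_uri, outer_uri):
    """True if both URIs point into the same document and the inner
    span lies within the outer span."""
    same_document = inner_uri.split("#")[0] == outer_uri.split("#")[0]
    inner_begin, inner_end = offsets(inner_uri)
    outer_begin, outer_end = offsets(outer_uri)
    return same_document and outer_begin <= inner_begin and inner_end <= outer_end


# Hypothetical URIs for "haste nich gesehen" (18 characters) and its first word.
tier_uri = "http://example.org/igt1#char=0,18"
item_uri = "http://example.org/igt1#char=0,5"

print(is_substring_of(item_uri, tier_uri))  # item lies inside the tier span
print(is_substring_of(tier_uri, item_uri))  # the reverse does not hold
```

With URIs like these, containment can be computed from the offsets alone, so an explicit substring triple between tier and item adds no information.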

> If this is taken to mean that ligt:Analysis can receive multiple annotations, I can probably follow.

Yes.