Open plbt5 opened 2 years ago
CASE and UCO lack a formal definition of "Duck typing," and I believe that is the source of much confusion among committee members.
By my understanding: Informally, "Duck typing" has implied a combination of two methods of classifying objects, and optionally a third entailment:
In OWL, inferencing capabilities bring M1, M2, and E1 together. `owl:Restriction`s and/or OWL `rdfs:domain` and `rdfs:range` interpretations can lead an inferencing engine to assign types as an "entailed ontology" (a graph including inferred triples).
I'm not sure CASE and UCO's interpretation of "Duck typing" amounts to more than M2. I am aware that early drafts of UCO attempted OWL domains for properties, erroneously both in implementation and in purpose: domains were attempted for validation use, but OWL doesn't do data validation. Hence we went to SHACL.
CASE and UCO need to formalize their interpretation of "Duck typing," and relate it to OWL's, for us to understand the merits of this proposal. For instance, we must understand what is, and is not, meant to be entailed by:
`observable:FileFacet` attached via `core:hasFacet`.
`observable:fileName` associated with it (whether or not any `Facet` is involved).

@sbarnum, my understanding of the facet has been fuzzy from the beginning. @ajnelson-nist has not been able to clarify it for me. You and I have not had the time nor the incentive to discuss this.
My understanding of the design principle, as described in the UCO design document section 5, is to provide the capability to separate the object from properties.
Question: is my understanding correct? If so, please provide the capabilities that justify the principle. If not, please provide the essence of your interpretation of the principle.
In ontology engineering, the prime purpose is to define categories of things. More specifically, to commit to the existence of those categories. If we look at a cup of coffee, we all share the intuition that the cup is different from the coffee, and that the two show different behaviour. This is based on what in ontology-speak is termed the "Principle of Identity": that which allows us to point at things in reality and collect things that are very similar to the cup, and other things that are very similar to the coffee. In short: what identifies something as a cup and what as coffee. The answer is probably along the lines of: the cup can hold coffee, whereas the coffee cannot hold something but requires something to hold it; on dividing the coffee in two, both parts remain coffee, whereas dividing the cup renders it dysfunctional.
The significance of these answers is that they “[...] do not ask for what there is, but for what a given remark or doctrine [...] says there is” (Quine). The prime purpose of UCO and CASE is to make distinctions from the perspective of the Cyber Community: if we adopt a concept, then UCO acknowledges that such a particular thing exists in the Cyber Domain; UCO commits to its existence. This is magnified by the objective of UCO to become the standard in the Cyber Domain. In order to fulfill the objective and apply ontology to achieve its purpose, we need the principle of identity. And the one and only means to implement the principle of identity in ontology is to specify what it means to be of a certain kind, to be a member of a certain category. In other words, define a Class with a unique set of intentions that remain invariant over all its individual members (instances). (This does not imply, btw, that each and every class implements the principle of identity; there is also the principle of application.)
By introduction of the Facet, and by insisting on the potential to decouple between the class and its characterising properties as specified by a facet, the capability to evade the principle of identity has been provided. Because:
In conclusion, by ignoring ontological rigor in general and the principle of identity specifically, the model that is being created is not an ontology anymore. Whether it is modeled in OWL or not is irrelevant.
(Note that each and every facet commits to a particular set of characteristics, which, by ontological definition, represents a category of things. I.e., it commits to the existence of such category. Consequently, not following ontological rigor does not imply "no ontology applies", but implies "this categorisation applies" by token of the definition of the category. In other words, the opposite of ontology is not "non ontology" but "bad ontology". )
I agree with @plbt5's remark, and also have some engineering-inspired unsettlements with `Facet`.
`Facet`s, to me, have long had a smell of some kind of UML artifact encroaching into ontological modeling in a fairly odd, and I believe ultimately harmful, manner.
They have proven difficult to evolve, and have harmed a non-zero number of change proposals. Issue 370 got stuck when we realized the issue enabled (with our understanding of `Facet`s at the time) defining independent `Facet`s on one object with disagreeing values on a Boolean property.
They are a significant discomfort to program, because at least in Python, `UcoObject`s need to maintain dictionaries of `Facet` objects by (as best as I've been able to design) facet-class IRI, and somehow provide a property-forwarder between programming objects in order to store properties. UCO's usage of contextual interpretation of properties, particularly `observable:sizeInBytes`, means that a value for `sizeInBytes` can't be assigned on the `UcoObject`. Keeping track of those `Facet` instance references is awkward.
They are restricted in ways that make them seem like ...organelles of objects, is the best term I can think of. We've had misunderstandings and disagreements over whether it's ever appropriate to reference them with properties aside from `core:hasFacet`, and concluded no, but that took a long time to get an explicit stance on.
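To make the programming discomfort described above concrete, here is a minimal Python sketch of the bookkeeping it implies. The class and method names (`Facet`, `UcoObject`, `set_on_facet`, `get_from_facet`) are hypothetical and are not taken from any existing CASE or UCO library; prefixed names stand in for full IRIs, and the two facet classes are used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Facet:
    """Hypothetical stand-in for one core:Facet instance."""
    class_iri: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class UcoObject:
    """Hypothetical stand-in for a core:UcoObject that owns its Facets."""
    iri: str
    # Facets are tracked per facet-class IRI, as described in the comment above.
    facets: Dict[str, Facet] = field(default_factory=dict)

    def set_on_facet(self, facet_class_iri: str, prop: str, value: Any) -> None:
        # The object cannot hold the property itself: it has to find (or
        # create) the Facet of the right class and forward the value to it.
        facet = self.facets.setdefault(facet_class_iri, Facet(facet_class_iri))
        facet.properties[prop] = value

    def get_from_facet(self, facet_class_iri: str, prop: str) -> Any:
        return self.facets[facet_class_iri].properties[prop]


obj = UcoObject("kb:file-1")
# The same property name is interpreted in the context of the Facet holding it.
obj.set_on_facet("observable:FileFacet", "observable:sizeInBytes", 4096)
obj.set_on_facet("observable:ContentDataFacet", "observable:sizeInBytes", 1024)
```

Every read or write of `sizeInBytes` has to name the facet class it belongs to, which is the property-forwarding burden the comment refers to.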
It would be my strong preference to remove the notion of `Facet`. But, we are very late in the release cycle for 1.0.0, and we would need a substantial modeling decision on how to treat properties that are "inhering" (which currently solely reside on `Facet` subclasses), versus properties that somehow demand qualification with a `Relationship` object, versus properties that are both (such as a spatial relationship that is inhering, but needs annotations, e.g. a file's location within a file system, currently necessitating a `observable:DataRangeFacet` on a `Relationship`). We are likely committed to having `Facet` endure at least the period between 1.0.0 and 2.0.0.
For all the modeling weaknesses that `Facet`s enable in UCO, I think it would be a healthier way forward to look to the permitted arbitrary extensibility as opportunities to refine UCO's model, by finally embracing class disjointedness.
We have an example in this Issue's description that, in some sci-fi contexts, would be a cyborg: a person with a 2TB storage capacity. If the Ontology Committees agree "Please let's not permit that for now," we can stage for UCO 2.0.0 a disjointedness definition between `identity:Person` and `observable:StorageDevice`. We can discuss likewise for `location:Location` and `observable:RasterPicture`, though I'd be interested to see if we have someone lodge a defense of Augmented Reality applications.
I think this journey starts with making firmer commitments around what I'd named "M1", "M2", and "E1" in my prior comment.
When the subgroup meets to discuss this Issue, we should be aware of this demonstration in Oresteia:
I think it is necessary to call together the subgroup (@ajnelson-nist @sbarnum @eoghanscasey @plbt5), but only after @sbarnum has had the opportunity to describe his explanation on the Facet: Purpose and Approach. @sbarnum please try to confine the explanation to the essences only, where possible.
In response to @ajnelson-nist comment above:
The "duck typing" concept is usually used to mean the opposite of the "it's a duck" principle. The Martelli usenet posting: "In other words, don't check whether it IS-a duck: check whether it QUACKS-like-a duck, WALKS-like-a duck, etc, etc, depending on exactly what subset of duck-like behaviour you need to play your language-games with." (Wikipedia talk, section Rewrite)
I understand the M1 definition to equal the "it's a duck" principle, and M2 as the actual "duck typing" concept. Please correct me if I've understood M1 or M2 incorrectly.
In general: This reads like a very good idea to me. I can follow the way this could work and how it can make things easier. The examples match the ideas of "Duck typing" we implemented in Hansken.
Having said that, I am not an ontologist. This sounds like (already) a need for UCO 2.0 since this will definitely break the UCO interface/API (and thus also CASE 2.0).
Problem 1. Clear example. Having said that, we do see "inconsistent" data in actual databases in seized devices. By accident (software bug), on purpose (manipulated data), or by design (e.g., multiple bodies in one email with different contents)
Problem 2. I do not see why it is weird to add LatLongCoordinationFacet to a picture having geo-location details in its EXIF.
Problem 3. My answer to the specific question would be: YES, the laptop, the iPod and the hard disk are instances of devices allowing data storage. In the end, this is how "normal users" look at such devices. In my opinion, the actual implementation/technology of the devices should not be subordinate to this.
Problem 4. If OWL is the way to go with UCO, then UCO should follow the OWL approach (amongst others to benefit from the tools and techniques available for OWL).
Re: @plbt5
I understand the M1 definition to equal the "it's a duck" principle, and M2 as the actual "duck typing" concept. Please correct me if I've understood M1 or M2 incorrectly.
You understood me correctly.
Re: @Harm-van-Beek
...
This sounds like (already) a need for UCO 2.0 since this will definitely break the UCO interface/API (and thus also CASE 2.0).
Yes, this is under the 2.0.0 milestone.
Problem 1. Clear example. Having said that, we do see "inconsistent" data in actual databases in seized devices. By accident (software bug), on purpose (manipulated data), or by design (e.g., multiple bodies in one email with different contents)
Yes, `ObservableObject`s that expand their behaviors between multiple unexpected classes are something UCO should continue to support.
Problem 2. I do not see why it is weird to add LatLongCoordinationFacet to a picture having geo-location details in its EXIF.
Why it's weird is that `LatLongCoordinatesFacet` encourages a confusion of "is-a" vs. "has-a" relationships between classes. Suppose we didn't have `Facet`s, and instead just had `UcoObject` subclasses. I see the temptation to put latitude and longitude annotations onto a JPEG file's node: it's right there, in the EXIF! Say someone took a picture near "Null Island", and an analyst characterized that picture with `ex:latitude` and `ex:longitude` properties directly on it.

    kb:jpeg-1
        a observable:RasterPicture ;
        ex:latitude "1.234"^^xsd:double ;
        ex:longitude "2.345"^^xsd:double ;
        .

If `ex:latitude` were defined like this:

    ex:LatLongCoordinates
        a owl:Class ;
        rdfs:subClassOf location:Location ;
        .

    ex:latitude
        a owl:DatatypeProperty ;
        rdfs:range xsd:double ;
        .

    # And sim. for ex:longitude

then, as written above, there would be no OWL or RDFS expansion from putting `ex:latitude` on anything you wanted. However, typical modeling in RDFS and OWL uses `rdfs:domain` to associate a property with a class. Following that typical modeling pattern, we would also have this statement:

    ex:latitude
        rdfs:domain ex:LatLongCoordinates ;
        .

If that domain statement is used, then the presence of `ex:latitude` on `kb:jpeg-1` would expand its classes to include these inferred triples:

    kb:jpeg-1
        a
            ex:LatLongCoordinates ,
            location:Location
            ;
        .

`ex:LatLongCoordinates` would come from RDFS expansion of `ex:latitude rdfs:domain ex:LatLongCoordinates`, and `location:Location` would come from RDFS expansion of `ex:LatLongCoordinates rdfs:subClassOf location:Location`.
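The expansion just described can be replayed with a small Python sketch. It is only an illustration under simplifying assumptions: triples are plain tuples of prefixed-name strings (not a real RDF library), literals are shortened to bare strings, and `rdfs_expand` is a hypothetical helper that applies only the RDFS domain rule and the subclass rule until nothing new is inferred.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_SUBCLASS = "rdfs:subClassOf"


def rdfs_expand(triples):
    """Apply the rdfs:domain and rdfs:subClassOf rules to a fixpoint."""
    graph = set(triples)
    while True:
        inferred = set()
        for subj, rel, cls in graph:
            if rel == RDFS_DOMAIN:
                # subj is a property: every user of it gets typed as cls.
                for s, p, _o in graph:
                    if p == subj:
                        inferred.add((s, RDF_TYPE, cls))
            elif rel == RDFS_SUBCLASS:
                # subj is a class: every member of it is also a member of cls.
                for s, p, o in graph:
                    if p == RDF_TYPE and o == subj:
                        inferred.add((s, RDF_TYPE, cls))
        if inferred <= graph:
            return graph
        graph |= inferred


ontology = {
    ("ex:LatLongCoordinates", RDFS_SUBCLASS, "location:Location"),
    ("ex:latitude", RDFS_DOMAIN, "ex:LatLongCoordinates"),
    ("ex:longitude", RDFS_DOMAIN, "ex:LatLongCoordinates"),
}
data = {
    ("kb:jpeg-1", RDF_TYPE, "observable:RasterPicture"),
    ("kb:jpeg-1", "ex:latitude", "1.234"),
    ("kb:jpeg-1", "ex:longitude", "2.345"),
}

expanded = rdfs_expand(ontology | data)
jpeg_types = sorted(o for s, p, o in expanded if s == "kb:jpeg-1" and p == RDF_TYPE)
print(jpeg_types)
# ['ex:LatLongCoordinates', 'location:Location', 'observable:RasterPicture']
```

The picture node ends up carrying all three types, which is the situation the following question is about.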
Would it make sense to you for a single graph node to be BOTH a `location:Location` and `observable:RasterPicture`? UCO should have something in place to separate physical-space phenomena from cyber-space concepts that are manifested only in bit streams. UCO currently does not make that separation, so we have to rely on end users' guts.
I believe the objective of associating a latitude and longitude with a picture is to say "This picture has a relationship with a location with lat Y and long X", not "This picture is a location with lat Y and long X." That is, however, a significant amount of top-level property design and class separation (using `owl:disjointWith`) that has not happened.
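As a rough illustration of what such a class separation would buy, the sketch below flags any node typed with two classes that have been declared disjoint. The `owl:disjointWith` statement in the sample data is hypothetical (as noted above, UCO has not made that declaration), triples are plain tuples of prefixed-name strings, and `disjointness_violations` is an illustrative helper, not an existing tool.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"
OWL_DISJOINT = "owl:disjointWith"


def disjointness_violations(triples):
    """Return (node, (class_a, class_b)) for nodes typed with disjoint classes."""
    types = defaultdict(set)
    disjoint_pairs = set()
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)
        elif p == OWL_DISJOINT:
            disjoint_pairs.add(frozenset((s, o)))
    violations = []
    for node, classes in types.items():
        for pair in disjoint_pairs:
            # Both members of a disjoint pair appear among the node's types.
            if len(pair) == 2 and pair <= classes:
                violations.append((node, tuple(sorted(pair))))
    return sorted(violations)


graph = {
    # Hypothetical declaration; not present in UCO today.
    ("location:Location", OWL_DISJOINT, "observable:RasterPicture"),
    ("kb:jpeg-1", RDF_TYPE, "observable:RasterPicture"),
    ("kb:jpeg-1", RDF_TYPE, "location:Location"),
    ("kb:jpeg-2", RDF_TYPE, "observable:RasterPicture"),
}
violations = disjointness_violations(graph)
```

With the declaration in place, only `kb:jpeg-1` is reported; without it, nothing in the graph signals that the double typing is unintended.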
`Facet`s let UCO users bypass some modeling questions on "is-a" vs. "has-a" relationships. There is a balance to strike here, and your next response highlights the other side of the balance.
Problem 3. My answer to the specific question would be: YES, the laptop, the iPod and the hard disk are instances of devices allowing data storage. In the end, this is how "normal users" look at such devices. In my opinion, the actual implementation/technology of the devices should not be subordinate to this.
Now, suppose I say to an analyst "Please image this desktop tower." A tower has characterizations of a storage device, so they say sure, thinking it's an easy overnight job for their one write blocker. They unscrew the case, find eight hard drives in it, and realize what was meant by this graph node handed to them as part of the chain of custody:

    kb:tower-5b2188da-67a2-40e5-842c-bb582874ca2b
        a observable:Computer ;
        core:hasFacet [
            a ex:StorageMediumFacet ;
            rdfs:comment "Heads up - the OS reports having 7TB storage. Didn't know anyone made those. Box is kinda heavy, too."@en ;
            observable:storageCapacityInBytes 7696581394432 ;
        ] ;
        .

(Aside: `ex:StorageMediumFacet` should be implemented as `observable:StorageMediumFacet` soon after 1.0.0.)
Here, the `Facet` masks a modeling matter where the "Right" thing to do is to model the tower as a composition of multiple component devices, especially as having multiple hard drives, one of which apparently wasn't actively contributing to the available storage. If we didn't have `Facet`s, we would need to address as part of class-design that yes, a `observable:Computer` can have `storageCapacityInBytes`, but that's because it is a subclass of something like `ex:ThingWithStorageCapacity`, and NOT a subclass of `ex:ThingProvidingStorageCapacity`.
Problem 4. If OWL is the way to go with UCO, then UCO should follow the OWL approach (amongst others to benefit from the tools and techniques available for OWL).
To continue the OWL conversation, UCO needs technology demonstration pipelines that could be integrated into unit testing. There have been some less-than-successful attempts at this, which was part of what led to self-building an OWL conformance suite in SHACL. We certainly welcome receiving guidance or demonstration on OWL mechanisms, but for now, UCO's adoption of OWL goes only so far as some of the more elementary features (e.g. ontology versioning, disjointedness semantics, some property-range expression beyond RDFS), and so not yet into OWL inferencing or RDFS `domain` usage.
Thanks @Harm-van-Beek for your very valuable review and comments. Much appreciated.
Provide for a link to the design doc section that explains the differences between ontologies and schemata
Background
UCO has long implemented Duck Typing by applying the facet pattern. As indicated by the UCO Design Document, section 5:
The facet pattern brings about several drawbacks/problems, and we propose to implement Duck Typing by using standard OWL constructs only.
Problem 1 - inconsistent data
Facets, and particularly their subclassing, allow the following inconsistent construct to emerge:
This example shows that the subclass convenience accidentally creates a spot to record inconsistent data.
Problem 2 - strange, if not invalid, implicit commitment to reality
The intended application of the facet pattern is to allow Duck Typing, e.g., a query returns things that have storage capabilities without requiring them to be typed as storage devices:
Unfortunately, the absence of an explicit commitment to a type allows for flexibility in a way that can result in weird data, e.g.,
That example is a perfectly UCO-0.9.0-conformant manner of representing a person who has a 2TB hard drive in their pocket. However, as opposed to "a person carrying a device that has that storage capacity", what the triples actually assert is that "the person itself has storage capacity". This represents an invalid state of affairs, or is at least a rather inaccurate representation of the actual state of affairs: a human cannot be considered to be a storage device, or to have storage capabilities.
One could argue that stakeholders won't construct such weird semantics; however, a community member has said they would happily assign a `location:LatLongCoordinatesFacet` to a `observable:RasterPicture` (a subclass of `observable:File`) if that picture file was a JPEG with lat/long coordinates embedded in its EXIF.
Problem 3 - absence of explicit commitment
Despite the requirement to not enforce strict data typing, the facet defines a set of characteristics in order to represent something. This implies that each and every facet, by token of its specified characteristics, represents a certain typology implicitly: although the facet does not name the type of the typology, the typology de-facto applies.
In line with the above SPARQL example, the fact that there is no name attached to the category does not prevent us from concluding that the laptop (a computer) and the iPod (a music player) are the same kind of device as the hard disk, viz. a storage device. The question at the heart of the issue is: do we commit to the conclusion that the laptop, the iPod and the hard disk are instances of one type of thing?
Yes: commit to one type of thing
If we answer the question in the affirmative, then we accept the behaviour of the facet to commit to the existence of a particular type of thing that gathers computers, music players and storage devices, e.g., devices that allow storage of data. Since we commit to it, there is no reason not to attach a name to the type, e.g., `ex:DeviceAllowingDataStorage`. Consequently, the following three statements are considered valid:
which implies that we have successfully characterised a type of thing by means of its characteristics: indeed a proper implementation of Duck typing.
No: these are different types of thing
If we answer the question in the negative, then the characterisation of the facet can be considered incomplete or otherwise invalid. We either have to add more characteristics to differentiate between the distinct types of thing, or we have mistakenly conflated the semantics of is_a with those of has_a. In any case, we have not implemented duck typing correctly.
When we combine both answers, the conclusion is that facets either do NOT implement duck typing, or do properly implement duck typing but in an OWL-unfamiliar approach. Considering that it was the intention of facets to make the distinction based on characteristics, i.e., duck typing, I'm inclined to acknowledge the objective of the facet, viz. that duck typing is a necessary capability to support, but consider the design pattern for its implementation incorrect due to the absence of the explicit commitment to the existence of the type.
Problem 4 - OWL-unfamiliar approach
Another consequence of the application of facets is that this is not how the Semantic Web, i.e., the OWL language, has been designed to work. The facet design pattern is not part of OWL in the sense that it is recognised as such and conclusions are drawn from it out-of-the-box: no code exists to process this design pattern. Consequently, none of the tools that are compliant with OWL will be able to process this design pattern and show the intended behaviour. If one requires the intended behaviour, it has to be implemented alongside the OWL technology by each and every stakeholder that has an interest in it.
This characteristic might be allowed for a local solution to a local problem, however, for a worldwide standard this is odd. Moreover, it is very problematic since it enforces local additions to the technology, additions that might even be stakeholder dependent.
Problem 5 - Undefined relationship with `core:UcoObject`
Although "A facet is a grouping of characteristics unique to a particular aspect of an object" (Definition of `core:Facet`), no definition exists of the relationship that applies between the facet and the object. Two related problems arise from the absence of a definition of the relation:
Requirements
The requirements for Duck typing have been specified already in the UCO Design Document, section 5.1, as three separate requirements.
Requirement 1
CASE uses duck typing which allows data to be defined by its inherent characteristics rather than enforcing strict data typing.
Requirement 2
CASE objects can be assigned any rational combination of facets, such as a file that is an image and a thumbnail. When employing this approach, data types are evaluated with the duck test, allowing data to be represented more truly without imposing a rigid class structure. (...)
Requirement 3
For certain common combinations of facets, it is possible to assign them a higher-level class, such a PDF File or WhatsApp Message.
Risk / Benefit analysis
Benefits
Replacing the facet pattern with an OWL-familiar Duck Typing capability removes the need for stakeholders to provide additional code to support the intended behaviour of the facet, viz. Duck Typing; allows for consistency in its allowed data; and creates the ability to commit explicitly to the inferred typology.
The facet pattern has been used since the start of the development of UCO, and has caused questions and confusion ever since. Replacing it with an OWL-supported pattern will clarify how Duck Typing can be applied within UCO. We therefore recommend the CP's implementation in version 1.0.0 to consolidate this clear, supported and simple form of Duck Typing, rather than suggesting that the facet pattern is a necessary pattern for the UCO standard.
Risks
This CP can be considered a significant overhaul of the UCO design with the risk that community members might decide to turn away from the UCO initiative, given the effort required to implement the change.
Competencies demonstrated
Competency 1
Duck typing: When something has one or more properties, infer that it belongs to the category identified by those properties, e.g., assume that everything that can store data is a storage device.
Competency Question 1.1
Provided the following data:
What is the type of thing the individual `kb:object-1` represents? In SPARQL:
Result 1.1
The following triple shall be inferred:
Competency 2
In terms of the UCO DD:
Competency Question 2.1
Provided the following two sets of data on the same individual:
What is the type of thing the individual `kb:raster-picture-f970b1a2-c6f1-4082-a2fb-3e8f4a7913b2` represents? In SPARQL:
Result 2.1
The following triples shall be inferred:
Competency 3
Infer that a datum is a member of a higher-order class, i.e., a superclass, based on the same Duck Typing properties.
Competency Question 3.1
Provided the following data:
What is the (super)type of thing the individual `kb:object-1` represents? In SPARQL:
Result 3.1
The following triples shall be inferred:
Solution suggestion
The examples use namespace prefixes to distinguish between definitions that belong to the reusable knowledge base, `kb:`, and exemplifying data that assert a certain state of affairs, `ex:`.
Solution part 1
Use `rdfs:domain` and `rdfs:range` statements to implement Duck Typing, as opposed to the facet pattern, for each facet that has been specified as `rdfs:subClassOf core:Facet`.
Solution implementation
This implies the following modifications:
`observable:FileFacet` --> `observable:File a owl:Class`.
`sh:path observable:fileName ;` --> `observable:fileName rdfs:domain observable:File .` `observable:fileName a owl:DatatypeProperty .` or `owl:ObjectProperty`, dependent on the range of the characteristic.
Explanation: The use of `rdfs:domain` and `rdfs:range` statements.
Consider the following knowledge graph:
Note:
`kb:D kb:p kb:R`, only defines that `p` is used to relate `D` to `R`. This allows us to say that `ex:Shakespeare kb:wrote ex:Hamlet`, and subsequently, to get an answer to the question who wrote Hamlet (`SELECT ?a WHERE { ?a kb:wrote ex:Hamlet }` ==> `ex:Shakespeare`).
`rdfs:domain` and `rdfs:range` properties are NOT meant to validate data, i.e., to require that an instance of the specified object MUST HAVE the specified property. Instead, they are used the other way around, to establish the type of a datum. For instance, if a datum applies the property about storage capacity, then that datum is considered to belong to the category of storage device. In pseudo code:
Formalised in SPARQL, this results in:
Similarly, `rdfs:range` statements can be made to infer something to be of a certain type based on the range of a property:
In contrast to the facet, both CONSTRUCT rules are already part of the set of inference rules that belong to OWL and do not need to be specified; only the domain and range relations that are used as input to these rules are required to be specified.
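The effect of the domain and range rules (known in the RDFS entailment rules as rdfs2 and rdfs3) can be sketched in plain Python. This is an illustration under assumptions, not the CONSTRUCT queries of this proposal: triples are tuples of prefixed-name strings, and the classes `kb:Author` and `kb:Play` are hypothetical names chosen for the Shakespeare example.

```python
RDF_TYPE = "rdf:type"


def construct_domain_types(triples):
    """Type the SUBJECT of each statement by the domain of its property."""
    domains = {(p, c) for p, rel, c in triples if rel == "rdfs:domain"}
    return {
        (s, RDF_TYPE, c)
        for s, p, _o in triples
        for dp, c in domains
        if p == dp
    }


def construct_range_types(triples):
    """Type the OBJECT of each statement by the range of its property."""
    ranges = {(p, c) for p, rel, c in triples if rel == "rdfs:range"}
    return {
        (o, RDF_TYPE, c)
        for _s, p, o in triples
        for rp, c in ranges
        if p == rp
    }


graph = {
    # Hypothetical knowledge: kb:Author and kb:Play are illustrative classes.
    ("kb:wrote", "rdfs:domain", "kb:Author"),
    ("kb:wrote", "rdfs:range", "kb:Play"),
    # Datum from the example in the text.
    ("ex:Shakespeare", "kb:wrote", "ex:Hamlet"),
}

domain_result = construct_domain_types(graph)
range_result = construct_range_types(graph)
```

Neither function rejects anything: a statement that uses `kb:wrote` is never invalid, it only causes its subject and object to be typed, which is the "other way around" reading described above.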
(CASE users may already have seen some of the impact of these `CONSTRUCT` queries. The RDFLib OWL-RL library provides "Graph expansion" features that perform some of this constructive inference. Users of `case_validate` can make use of the features via the `--inference` flag. RDFS inferencing runs those above `CONSTRUCT`s for `rdfs:domain` and `rdfs:range` statements that directly reference classes. OWL inferencing can function with more nuanced domains and ranges, involving anonymous classes and `owl:unionOf`/`owl:intersectionOf`.)
Conformance to competencies
CQ 1
For example:
`ex:hasStorageCapacityInBytes` as a property to the class `ex:StorageDevice`.
This meets CQ 1.
CQ 2
Consider the knowledge that:
kb:File
2: everything that has a pictureType is a member of the type kb:Picture
Then the following triples will be inferred:
This meets CQ 2.
Solution part 2
Combine domain and range statements with `rdfs:subClassOf` in order to apply subclassing in the Duck Type pattern.
Solution implementation
This implies similar modifications as indicated in Part 1:
`observable:DigitalAddressFacet` --> `observable:DigitalAddress a owl:Class`.
`sh:property [ sh:path observable:addressValue ]` --> NIL
`observable:IPAddressFacet rdfs:subClassOf observable:DigitalAddressFacet` --> `observable:IPAddress rdfs:subClassOf observable:DigitalAddress`.
`sh:path observable:addressValue ;` --> `observable:addressValue rdfs:domain observable:IPAddress .` `observable:addressValue a owl:DatatypeProperty .` or `owl:ObjectProperty`, dependent on the range of the characteristic.
Explanation: combination of inference patterns
The Type Propagation Rule
The basic subclassing inference is induced by `kb:B rdfs:subClassOf kb:A`. The meaning of `rdfs:subClassOf` is given by the statements that are inferred from it. In pseudo code:
This has been formalised (and included by default) as a knowledge rule in OWL:
Combination of Type Propagation with Domain and Range
The purpose of this combination is to infer that when it is asserted that the `rdfs:domain` of a property is a particular class, then it can be inferred that the property also has the superclass of the particular class in its domain. This also holds for `rdfs:range` properties.
Conformance to competencies
CQ 3
For example, specifying the knowledge graph:
and adding the datum triple
allows us to infer that `ex:dns-server-address-1 rdf:type observable:DigitalAddress`.
This meets CQ 3.
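The combination of type propagation with domain statements can be sketched as follows. This is a minimal illustration, assuming triples are tuples of prefixed-name strings; the helper names are hypothetical, and the address value `192.0.2.53` is an invented placeholder, since the datum triple is not reproduced here. The sketch first widens a property's declared domain to all superclasses, then types the datum from the widened domain.

```python
def superclasses(cls, triples):
    """All classes reachable from cls via rdfs:subClassOf, including cls."""
    seen, todo = {cls}, [cls]
    while todo:
        current = todo.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subClassOf" and o not in seen:
                seen.add(o)
                todo.append(o)
    return seen


def entailed_domains(prop, triples):
    """Declared domains of prop, widened with every superclass."""
    declared = {o for s, p, o in triples if s == prop and p == "rdfs:domain"}
    widened = set()
    for cls in declared:
        widened |= superclasses(cls, triples)
    return widened


def duck_types(node, triples):
    """Types inferred for node from the properties it carries."""
    types = set()
    for s, p, _o in triples:
        if s == node:
            types |= entailed_domains(p, triples)
    return types


knowledge = {
    ("observable:IPAddress", "rdfs:subClassOf", "observable:DigitalAddress"),
    ("observable:addressValue", "rdfs:domain", "observable:IPAddress"),
}
# Hypothetical datum; the literal value is a placeholder.
datum = {("ex:dns-server-address-1", "observable:addressValue", "192.0.2.53")}

types = duck_types("ex:dns-server-address-1", knowledge | datum)
```

Under these assumptions `types` contains both `observable:IPAddress` and `observable:DigitalAddress`, matching the inference claimed for CQ 3.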
Conclusion
In conclusion, we only need to specify:
in order to induce this particular behaviour of Duck Typing in regular OWL, rather than adopting the unclear and complicated facet pattern.