skybristol / geokb

Data processing workflows for initializing and building the Geoscience Knowledgebase
The Unlicense

Reorganize and better define upper level ontology #44

Closed skybristol closed 7 months ago

skybristol commented 9 months ago

In working to incorporate elements of the Geoscience Ontology and to stub out some higher level elements in the GeoKB ontology (as logically established within the Wikibase context), it has become clear that we need to do a little more work on defining the "top" of the conceptual framework. I'm working here on digesting how the Basic Formal Ontology (BFO) works in practice, how the Common Core Ontologies connect to and build on the BFO, and how we can organize things that way while also leaving space for the points that intersect with the Wikidata and DBpedia world. I'm still starting everything with entity as the origin item (leaving aside the theoretical "thing" from BFO/Schema.org/etc.).

One thing I need to look at a little more closely is my somewhat liberal use of "same as" in cases where there is almost no chance that the semantics we develop in the GeoKB will agree absolutely with the semantics of the entities we point to as the same as our item. These are incredibly useful links to encode in our knowledge representation because they give us a hook for organizing things that have already been organized in other knowledge systems and for leveraging additional content those systems may have. They also signal to other users (human and AI) that we've done some thinking about the relationship between our representation of certain concepts and what other folks have done. Linking with Wikidata and DBpedia also provides connection points for contributing our information out in ways that communicate with how non-geoscientists think about the world. However, these "same as" linkages almost always come with implicit caveats, so they do not carry the strict meaning of owl:sameAs.

Specifically, I need to think about the use of qualifiers and references on same as claims. I'm thinking that references should point to wherever we describe the reasoning used in linking things to other knowledge representations. That could be a combination of notes captured in Git issues or other places, along with perhaps curated notes on item discussion pages within the Wikibase. We also need to think about a qualifier scheme that would be more usable for AI reasoning, signaling the relative significance of the same as connections we make.
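To make the qualifier-and-reference idea concrete, here is a minimal sketch of what one "same as" claim could look like, loosely following the layout of a Wikibase JSON statement. Everything specific in it is an assumption for illustration: the property IDs (`P100`, `P101`, `P102`), the helper names, the example.org URLs, and the choice of the SKOS mapping relations as the qualifier vocabulary. None of this is settled design for the GeoKB.

```python
from typing import Any

# Hypothetical property IDs; a real Wikibase instance assigns its own.
P_SAME_AS = "P100"
P_MATCH_STRENGTH = "P101"
P_REFERENCE_URL = "P102"

# SKOS mapping relations, used here as an assumed qualifier vocabulary
# for how closely our item matches the external one.
MATCH_STRENGTHS = {
    "exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch",
}


def string_snak(prop: str, value: str, datatype: str) -> dict[str, Any]:
    """Build a snak whose value is stored as a plain string."""
    return {
        "snaktype": "value",
        "property": prop,
        "datatype": datatype,
        "datavalue": {"value": value, "type": "string"},
    }


def same_as_statement(
    target_uri: str, match_strength: str, reasoning_url: str
) -> dict[str, Any]:
    """Build a 'same as' statement with a strength qualifier and a reference."""
    if match_strength not in MATCH_STRENGTHS:
        raise ValueError(f"unknown match strength: {match_strength}")
    return {
        "type": "statement",
        "rank": "normal",
        "mainsnak": string_snak(P_SAME_AS, target_uri, "url"),
        # Qualifier: a machine-readable signal of how strong the link is.
        "qualifiers": {
            P_MATCH_STRENGTH: [
                string_snak(P_MATCH_STRENGTH, match_strength, "string")
            ]
        },
        # Reference: where the reasoning behind the link is written down.
        "references": [
            {
                "snaks": {
                    P_REFERENCE_URL: [
                        string_snak(P_REFERENCE_URL, reasoning_url, "url")
                    ]
                }
            }
        ],
    }


statement = same_as_statement(
    "https://example.org/external/entity",        # placeholder external item
    "closeMatch",
    "https://example.org/geokb/notes/entity-mapping",  # placeholder notes page
)
```

A controlled vocabulary for the qualifier value is what would let a reasoner treat a `closeMatch` link differently from an `exactMatch` one, instead of reading every same as claim as strict identity.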

Alternatively, we could look at a completely different approach that simply records that there is a specific connection to another knowledge system with dedicated properties. That's essentially how the GSO did things. In this approach we could essentially class a whole set of linkages as following the same reasoning (e.g., this thing is related to that thing over there in this general way). Both approaches have a certain utility, but I'm leaning toward sticking with the "same as but with explanation" approach to keep from proliferating yet more properties that have to be defined and organized.
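For comparison, the dedicated-property alternative might be sketched roughly as follows. The registry, property IDs (`P200`, `P201`), labels, and reasoning text are all hypothetical, and this is not a description of how the GSO actually models its links. The point is only that the reasoning lives once on the property rather than on each claim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkProperty:
    """A dedicated property linking GeoKB items to one external system."""
    pid: str        # hypothetical property ID
    label: str
    reasoning: str  # one explanation shared by every claim using this property


# Hypothetical registry: one dedicated property per external knowledge system.
LINK_PROPERTIES = {
    "wikidata": LinkProperty(
        "P200",
        "related Wikidata item",
        "Closest general-purpose concept in Wikidata; semantics may differ.",
    ),
    "dbpedia": LinkProperty(
        "P201",
        "related DBpedia resource",
        "Closest general-purpose concept in DBpedia; semantics may differ.",
    ),
}


def link_claim(system: str, target_uri: str) -> dict[str, str]:
    """Build a claim; the reasoning is inherited from the property itself."""
    prop = LINK_PROPERTIES[system]
    return {
        "property": prop.pid,
        "value": target_uri,
        "reasoning": prop.reasoning,
    }


claim = link_claim("wikidata", "https://example.org/external/entity")
```

The trade-off shows up in the registry: every new external system, or every new kind of relationship to one, needs another entry that has to be defined and maintained.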