SELFIE "Resource" Model refinement

dblodgett-usgs commented 4 years ago

Conceptually, the "non-information" "meta" and "data" "resource" scheme discussed in SELFIE is useful, but the language is wrong when applied to the terminology of particular technologies (HTTP Resources). As such, we need to change our language to more accurately reflect what we have found to be useful ways to deal with this three-tier model.

the language is wrong when applied to the terminology of particular technologies

Everything in this scheme is identified by a HTTP URI. We have:

1. non-digital things that are not information,
2. digital things that provide meta-information about non-information things, and
3. digital things that are information representing or characterizing other things.

1 is clear -- there should be URIs that only ever return a 300 series redirect (or not, we've noted the conceptual and tactical drawbacks to that practice elsewhere).
2 is quite squishy. It is defined by the context of retrieving it rather than by specific characteristics of its content.
3 is clear in most cases but has potential overlap with 2 -- this is the focus of this issue.
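To make the tier 1 behaviour concrete, here is a minimal sketch of an identifier that only ever redirects. It assumes a 303 See Other response, which is one of the 300 series options, and the paths and the `LANDING_PAGES` mapping are invented for illustration rather than taken from any SELFIE implementation.

```python
from http import HTTPStatus

# Hypothetical mapping from a real-world feature's identifier path to the
# location of its tier 2 content. Both paths are made up for this sketch.
LANDING_PAGES = {
    "/id/feature/123": "/info/feature/123",
}


def resolve_identifier(path):
    """Return (status, headers) for a request to a tier 1 identifier URI."""
    target = LANDING_PAGES.get(path)
    if target is None:
        return HTTPStatus.NOT_FOUND, {}
    # The identifier never returns a document itself, only a redirect.
    return HTTPStatus.SEE_OTHER, {"Location": target}


status, headers = resolve_identifier("/id/feature/123")
print(int(status), headers["Location"])
```

Keeping the lookup in a plain function makes the redirect rule easy to test separately from whatever web framework ends up serving it.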

Self-describing data MUST contain metadata. Given that the data tier, done right, is the web of self-describing data, we would expect most if not all of tier 2 to be contained in tier 3.

we would expect most if not all of tier 2's meta-information to be contained in tier 3.

If we think about it that way then tier 2 is a convenience layer. That raises the question: convenient for what? The answer seems to be crawlers and humans looking to get an idea of what a real-world feature is, what it's related to, and whether there's anything interesting available for it.

tier 2 is a convenience layer

Resource is the wrong word. It's ok for tier 1, but it breaks down for tier 2 and 3. A single URL may have one or more meta resource-representations and one or more data resource-representations -- each intended for a different use pertaining to the same real-world feature. This is not saying that tier 2 and tier 3 are always to be represented variously based on the same URL -- they very well may be represented as different resources (this technical diversity is what we want to enable?).

A single URL may have one or more meta representations and one or more data representations
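As a rough sketch of one URL offering several meta and data representations, the following picks a representation from an `Accept` header. The `REPRESENTATIONS` list, its tier labels and the media types are assumptions for illustration, and the header parsing is deliberately simplified (exact matches and `*/*` only).

```python
# Hypothetical representations available at a single URL. The tier labels
# follow the discussion above; the media types are assumptions.
REPRESENTATIONS = [
    {"tier": 2, "media_type": "text/html"},
    {"tier": 2, "media_type": "application/ld+json"},
    {"tier": 3, "media_type": "application/geo+json"},
]


def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, best first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((fields[0], q))
    # sorted() is stable, so equal q values keep the client's order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def choose_representation(accept_header):
    """Return the first representation matching the client's preferences."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        for rep in REPRESENTATIONS:
            if media_type in (rep["media_type"], "*/*"):
                return rep
    return None


print(choose_representation("text/html;q=0.5, application/ld+json"))
```

The same URL could equally redirect to separate resources per representation; the selection logic would not need to change.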

At the end of the day, what has been referred to as the "three tiered SELFIE resource model" is actually a sort of conceptual content model that we are implementing with technology du jour.

Can we call it a content model? What's a better description for this breakdown?

Can we describe how the content model can be implemented in a wide variety of ways while illustrating the few key places the implementations share common practices?

What are those common practices?

How do we illustrate that this approach can bring interoperability in key use cases to a diversity of data systems with requisite diversity of implementation patterns?

bsimons14 commented 4 years ago

Thanks for the discussion, Dave. Yes, the boundaries between 2 and 3 are blurred, if they exist at all. I'm not convinced, however, that "we would expect most if not all of tier 2's meta-information to be contained in tier 3." Looking at this from an 'old-school' GML view, I think the analogy is that tier 2 is like the response to a GetFeature request. That may contain everything we have about the feature in-line, or it could contain only a minimum of data (e.g. name, description, identifier) with every other property (tier 3) byReference. Again, this deferred response could contain everything, or a minimum with everything else byReference. (Note that it need not, and probably won't, contain all the tier 2 information.) GML communities have specified 'profiles' in attempting to reach a balance between overwhelming the response by trying to deliver everything and overwhelming the search by making everything available only via links.

So I think what we are trying to do with tier 2 in SELFIE is provide a (community agreed?) first IR response for a NIR, with full deferred resolution, but where the resolution is not just one resolution 'type' (in my example GML) but any type or representation or profile or ... from any provider. If so, then yes tier 2 is a 'convenience layer'. It then requires mechanisms and standard concepts for identifying, when known, what the user/crawler can expect if the links are followed (cf GML property tags in xlink:hrefs). Or, if not known: here's a link to something about this NIR, but I don't know what you will find there (="subjectOf"?).
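A minimal sketch of that "minimum in-line, everything else byReference" response, written as JSON-LD with schema.org terms, might look like the following. The example.org URIs, the feature details and the `landing_content` helper are hypothetical, and using `subjectOf` for the deferred links is only the tentative suggestion made above, not an agreed practice.

```python
import json


def landing_content(feature_uri, name, description, links):
    """Build a minimal tier 2 document: a few inline properties plus links."""
    return {
        "@context": "https://schema.org/",
        "@id": feature_uri,
        "name": name,
        "description": description,
        # Everything else is deferred: a client follows a link only when it
        # wants that (tier 3) content.
        "subjectOf": [
            {"@type": "CreativeWork", "url": url, "encodingFormat": fmt}
            for url, fmt in links
        ],
    }


doc = landing_content(
    "https://example.org/id/feature/123",
    "Example River gauge",
    "A hypothetical monitoring location.",
    [("https://example.org/data/feature/123.gml", "application/gml+xml")],
)
print(json.dumps(doc, indent=2))
```

The `encodingFormat` value is one way of telling a crawler what to expect behind a link when that is known; when it is not known, the link could be given without it.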

dblodgett-usgs commented 4 years ago

(Note that it need not, and probably won't, contain all the tier 2 information.)

Absolutely, sorry, my wording made it sound absolute. I meant that, in total, the information content at tier 3 would logically contain the content of tier 2. I was just using the "self describing data" concept as a way to illustrate that tier 3 is all the data, inclusive of metadata.

I agree with:

So I think what we are trying to do with tier 2 in SELFIE is provide a (community agreed?) first IR response for a NIR, with full deferred resolution

The NIR/IR dichotomy is real. I don't disagree with the rest of what you describe, but I think the point here is that we need to spend some time ruminating on that next level of resolution before we are going to have enough experience to make any strong statements about how things should be engineered "in general".

rob-metalinkage commented 4 years ago

Hi folks - I have been looking at some related issues for a while - and what I'd like to do is to harmonise SELFIE recommendations with OGC's own publishing practices - so will review this.

I think the problem with metadata is it's always relevant - but how much and which metadata depends on the use case. This is where I believe profiles are our friend - I propose we publish a set of well known alternative profiles with definitions and identifying URIs - and try to engage with a wider community to see if these can be pushed into W3C or IEEE governance space for general application.

Any given resource could be declared to conform to a set of profiles based on the information it contains, and combinations of profiles can be given convenience names. This gives us both flexibility and clarity, and some patterns to follow - and is a direct path to content-negotiation-by-profile, but is simply better descriptive modelling of resources in the short term.

I think a few basic profiles could be:

concept: object described as SKOS Concept, which gives clarity about the nature of labels and hierarchical relationships independent of object model
register: metadata about the relationship of the object to the containing register
version: basic information about current version
history: complete information about object change history
prov: PROV view of object derivation
provuse: PROV view of knowledge about what objects were derived from this object (particularly relevant to registers rather than individual items)

Another might be "default" - which is the object based on the data model used by the register (objects must have an implicit or explicit model of properties to disambiguate instances!)
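One way to picture resources declaring conformance to sets of profiles, with convenience names for combinations, is sketched below. The example.org profile URIs and the "summary" combination are placeholders invented for this sketch, since no identifiers have been assigned to the proposed profiles yet.

```python
# Placeholder profile URIs. The profile names come from the list above, but
# these identifiers are hypothetical.
BASE = "https://example.org/profile/"
CONCEPT = BASE + "concept"
REGISTER = BASE + "register"
VERSION = BASE + "version"

# A convenience name standing for a combination of profiles (also invented).
COMBINATIONS = {
    BASE + "summary": {CONCEPT, VERSION},
}


def expand(profiles):
    """Replace convenience names with the profiles they stand for."""
    expanded = set()
    for profile in profiles:
        expanded |= COMBINATIONS.get(profile, {profile})
    return expanded


def conforms(declared, requested):
    """True if a resource's declared profiles cover every requested one."""
    return expand(requested) <= expand(declared)


print(conforms({CONCEPT, REGISTER, VERSION}, {BASE + "summary"}))
```

A server doing negotiation by profile could run a check like `conforms` against each available representation before choosing which one to return.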

NB we already have a profile for the "alternate representations" view defined in https://www.w3.org/TR/dx-prof-conneg/ (http://www.w3.org/ns/dx/conneg/profile/rrd = Resource Representation Description) to list available resources.

Each of these profiles can be described by:

1) a SHACL shape
2) a JSON Schema
3) a JSON-LD context
4) an RDF-QB model, if you want to bind specific vocabularies to a profile (I don't think these generic profiles require this, but your enterprise will probably want to profile these profiles in practice).

I am working on a white paper explaining the relationship between these - in general I think the JSON support can and should be derived from the SHACL shape, as the most expressive option.
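To illustrate two of the description forms listed above for the "concept" profile, here is a small sketch of a JSON-LD context and a JSON Schema fragment held as Python dictionaries. The key names (`label`, `broader`) and the required properties are assumptions; only the SKOS namespace and terms are real. The `missing_required` helper is a stand-in that checks the `required` keyword only, not a full JSON Schema validator.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# JSON-LD context: maps short JSON keys onto SKOS terms.
CONCEPT_CONTEXT = {
    "@context": {
        "skos": SKOS,
        "label": "skos:prefLabel",
        "broader": {"@id": "skos:broader", "@type": "@id"},
    }
}

# JSON Schema fragment for the same (hypothetical) profile.
CONCEPT_SCHEMA = {
    "type": "object",
    "required": ["@id", "label"],
    "properties": {
        "label": {"type": "string"},
        "broader": {"type": "string"},
    },
}


def missing_required(doc, schema):
    """Stand-in for a real validator: checks the 'required' keyword only."""
    return [key for key in schema.get("required", []) if key not in doc]


example = {"@id": "https://example.org/concept/river", "label": "River"}
print(missing_required(example, CONCEPT_SCHEMA))
```

A SHACL shape for the same profile would express the same constraints over the RDF graph rather than the JSON serialization.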

In the meantime, as part of some discussions with the Australian Government Linked Data Working Group I put together a set of ideas for a generic LD client that addresses the thorny issues of open/closed world, inbound and outbound linkages, profiles and metadata.

https://docs.google.com/document/d/1uv8YeDdm46DazP_JzdvHlcMtmvFi34FImbucOztiPyY/edit?usp=sharing

Please feel free to cross-review and comment.

dblodgett-usgs commented 4 years ago

Thanks for this, @rob-metalinkage -- this is very timely given some sessions this afternoon at ESIP Winter. I'll bring this up in the second working session as something the group might want to rally around.

p.s. love this...

I think the problem with metadata is it's always relevant - but how much and which metadata depends on the use case.

dblodgett-usgs commented 4 years ago

I prepared a summary diagram for discussion here: https://docs.google.com/presentation/d/1MCsOLREAM1TTx5bXAg4M5KpgdkJFu1CLnQ2t91KqZnY/edit#slide=id.p

opengeospatial / SELFIE

SELFIE "Resource" Model refinement #76