w3c / json-ld-syntax

JSON-LD 1.1 Specification
https://w3c.github.io/json-ld-syntax/

Compact IRI expansion support for non-trivial prefix term definitions #191

Open iridiankin opened 5 years ago

iridiankin commented 5 years ago

This enhancement proposal presents two use cases which suggest extending term definition semantics to prefix-expanded terms as well.

As it stands, term definition semantics only apply during expansion to node dictionary keys that exactly match an expanded term definition. Only the "@id" member is semantically meaningful when a term is used as a prefix.

A limited form of this proposal specific to "@container": "@list" can be found on the json-ld-api github

This enhancement is not targeted at 1.1 and as such does not make final syntactic proposals; it focuses on describing the use cases for consideration. If the proposal is considered essentially sane, further experimentation is expected to yield more concrete suggestions.

1. Documentation JSON interchange format

A complex JavaScript-based ecosystem exists with many semantically different places where documentation needs to be embedded and cross-referenced. HTML is too cumbersome, and a format such as Markdown is a convenient solution only in some situations: it requires source-code introspection and is not machine-readable. A unified JSON interchange format that satisfies the following criteria is preferred:

  1. Easy to write manually, with minimal boilerplate. The more boilerplate, the higher the threshold to writing docs.
  2. Programmatically manipulable. Complex array and other wrapper nestings make both programmatic introspection and emission cumbersome and error-prone.
  3. Has a generic underlying object model. Documentation evolves; it is useful to be able to identify sections globally and not just by their brittle relative position in a JSON document.

Any two of the criteria are easy to achieve with little effort. Criterion 3 suggests JSON-LD, but there are shortcomings.

Let's consider a hypothetical function 'doTheThing' documentation block (with explanatory, self-referential bogus text content):

const doc = {
  "name": "doTheThing",
  "s:#name-0": [
    "This is a generic section typically containing text.",
    "The suffix fragment 'name-0' not just positions this section lexically after 'name'",
    "but also gives this section a stable identity ('name-0') within the document."
    ""
    "Sections are separated by empty strings.",
    ["Or alternatively as lists of lists."],
    "",
    "This section contains", { "d:text": "an external link" "d:href": "foo.org" }, ".",
  ],
  "s:bulleted#name-1": [
    "The first line of a named bullet list",
    ["Second line", { "d:text#ref-1": "with embedded named link", "d:href": "target.org" }],
  ],
  "s:#name-3": [
    "More text with an embedded unnamed numbered list", { "s:numbered": ["one", "two"] }
  ],
  "arguments": {
    "d:after": "name", 
    "some_arg": {}, "other_arg": {}
  },
  "s:#arguments-0": [
    "More text which positioned after arguments section, with arguments",
    "section itself explicitly positioned after 'name' section."
  ]
}

Criterion 1 is satisfied by relying on two principles: the writer only needs to understand the section (s:*) primitives and how the fragment is used for relative positioning and section identification.

Criterion 2 is satisfied because, on the one hand, doc.name and doc.arguments.some_arg are intuitive to access, and on the other, numbered and bulleted list contents are flat arrays that are easy to introspect and emit programmatically.

Criterion 3 is achievable with JSON-LD in the general case only if non-trivial term definitions, specifically "@container": "@list", also apply to prefixed terms:

{
  "@context": {
    "d": "http://example.org/doc/primitive/",
    "s": { "@id": "http://example.org/doc/section/", "@container": "@list" }
  }
}

As of now the default container type is @set, which loses both ordering and duplicates; both are critical here. Requiring the various suffixes to be explicitly declared in the @context loses generality.

A playground example demonstrates the set/list issue: both "p" and "p:s" should yield list triples, but only "p" does.
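In the spirit of that playground example, a minimal document illustrating the issue might look like this (the IRI is a placeholder). Under current expansion rules only the "p" key honours the term definition's "@container": "@list"; "p:s" falls back to the default @set behaviour:

```
{
  "@context": {
    "p": { "@id": "http://example.org/p/", "@container": "@list" }
  },
  "p": ["first", "second"],
  "p:s": ["first", "second"]
}
```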

There are variations on how to ideally meet the desired criteria. This variant was chosen to demonstrate how a minimal, universal @context allows the prefix:suffix mechanism to pull a lot of weight, to the point where new doc primitives can be added without touching @context code already deployed in the wild.

2. RDF serialization of 'freeform' named, ordered sequences

Consider an object model with named, unordered, many-to-'any' mappings called 'Properties' and named, ordered many-to-many mappings called 'Relations', where the namespaces of Properties and Relations are disjoint. The names of these mappings are mutable, ad hoc, even application private in nature, so they can't be expected to have ontology definitions. Nevertheless the object model has a need for universal serialization which allows reasoning: a JSON serialization format with RDF correspondence makes sense.

A reasonable JSON serialization of such an object might be:

{
  "stringValue": "http://this.uri.is.not.an/object.reference",
  "toThing": { "@id": "http://some.thing/#this.is.an.object.reference" },
  "Relation:ROOMS": ["http://some.thing/room#3", "http://some.thing/room#4"],
  "Relation:SENSORS": ["http://some.thing/sensor#1", "http://some.thing/sensor#2"]
}

Similar to case 1, a generic context

{
  "@context": {
    "@vocab": "http://example.org/Property/",
    "Relation": { 
      "@id": "http://example.org/Relation/",
      "@type": "@id",
      "@container": "@list",
      "@prefix": true
    }
  }
}

would be 'future proof' and would allow emitting reasonable and relatively simple RDF triple graphs. Here, as per the proposal, prefix support is needed for "@type": "@id" in addition to "@container": "@list".
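Under the proposed semantics, the serialization above would ideally correspond to a triple graph along these lines (a Turtle sketch; the blank node and prefix labels are illustrative, and the ( ... ) collections stand for the usual rdf:first/rdf:rest list encoding):

```
@prefix P: <http://example.org/Property/> .
@prefix R: <http://example.org/Relation/> .

_:thing P:stringValue "http://this.uri.is.not.an/object.reference" ;
        P:toThing <http://some.thing/#this.is.an.object.reference> ;
        R:ROOMS ( <http://some.thing/room#3> <http://some.thing/room#4> ) ;
        R:SENSORS ( <http://some.thing/sensor#1> <http://some.thing/sensor#2> ) .
```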

pchampin commented 5 years ago

It seems to me that your proposal aims at breaking the principle of URI opacity. More precisely, what you are trying to say is "any IRI starting with xyz, when used as a predicate, denotes a property which is order-sensitive".

iridiankin commented 5 years ago

It seems to me that your proposal aims at breaking the principle of URI opacity. More precisely, what you are trying to say is "any IRI starting with xyz, when used as a predicate, denotes a property which is order-sensitive".

This is a relevant concern and I have thought about it, but I don't think it is clear-cut. It is also definitely not the aim, even if it might be a consequence. Thoughts:

So I'm not concerned about breaking IRI opacity per se. What I have been most uncertain about is the principles that govern the use of IRIs as predicates, as they... feel a bit more restricted. Most of the predicate ontologies seem to make use of fragments (starting with rdf itself) even if the fragment parts do not have intrinsic semantic meaning.

This avenue gets quite philosophical quite fast, though, and I'm sure there have been discussions. Are these written down anywhere?

pchampin commented 5 years ago

I disagree that the URI/IRI opacity argument is limited to example 1. Both examples imply that anybody can coin a new IRI with a given prefix (s: in example 1, Relation: in example 2) and expect others to guess from that prefix that the IRI denotes a section or a relation, respectively. For me that breaks opacity.

I also believe that the opacity-violation is indeed related to the "@container":"@list" feature. I grant you that this is not clear-cut: unlike RDFS or OWL, JSON-LD does not describe the semantics of a vocabulary, merely how to map JSON keys to that vocabulary. However, stating "the values of that predicate should be interpreted as ordered list" is intimately linked to the semantics of said predicate (which would have range rdf:List rather than schema:Person, for example). So creating a blanket statement for a whole set of predicates, just because their IRI has a given prefix, breaks IRI opacity from my point of view.

pchampin commented 5 years ago

I can, as the designer of the documentation format, within the context of that format, reasonably make the conceptual statement that the valid set of secondary resources is all the strings and that they're treated to have lexicographical ordering.

Granted. But this kind of format-specific knowledge could not (easily) be conveyed in RDF or JSON-LD, since their data models consider IRIs to be opaque identifiers.

iridiankin commented 5 years ago

I disagree that the URI/IRI opacity argument is limited to example 1. Both examples imply that anybody can coin a new IRI with a given prefix (s: in example 1, Relation: in example 2) and expect others to guess from that prefix that the IRI denotes a section or a relation, respectively. For me that breaks opacity.

I may be a bit confused on a more philosophical level here: why would others need to guess that the IRI denotes a section or a relation? From the point of view of the RDF model, whether an IRI is a section or a relation is irrelevant, as the ordering in this case is only contextually relevant, and those who are in that context don't need to guess; they know.

Additionally, if actual triple graphs are emitted from a format like this, it is straightforward for the specification to require that appropriate type triples are emitted for the target objects of the above predicates. In fact I fully intend to require that.

Triple inference is a thing, and contextual semantics are a thing. If I understand you correctly, you're raising the point that RDF has the noble goal of trying to be 'self-contained': all knowledge has a triple correspondence and must be expressed as such. I recognize this is tremendously powerful and useful, but it is also what makes RDF so damn clunky and heavy, and probably what is slowing its adoption.

For me, JSON-LD appeared to be the solution to this clunkiness, in that it is the tool that allows triple graphs to be emitted and inferred from essentially contextual, efficient structures with the overheads removed (both in terms of character count and unnecessary indirections). Was I wrong to expect this?

iridiankin commented 5 years ago

As an addendum: I would also be looking for a way to specify that the node dict key "s:foo#bar" would also imply "s:foo#bar a s:foo", whether explicitly, structurally via the JSON-LD @context, or via the specification itself. Triples which explicitly specify the order are probably something that would have to rely fully on inference...
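In other words, for the earlier example the implied type triple would be roughly the following (a Turtle sketch, written with full IRIs from the hypothetical context of use case 1):

```
<http://example.org/doc/section/bulleted#name-1>
    a <http://example.org/doc/section/bulleted> .
```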

iridiankin commented 5 years ago

I did a bit of analysis to help myself conceptualize the process more clearly, as I wanted to answer the question "Do I want to 'cook'?" and also to understand 'domain-specific graphs' and their 'universalization' a bit better. I'm dumping this here in case someone is interested; some questions at the end. EDIT: man, I feel bad for being quite spammy. But I'm not sure I can justify putting in the time to edit this shorter.

1. The information processes as seen by a product owner operating in some domain

Consider the following three processes, ordered by their business relevance, where domain refers to all code and operators who conform to and make use of domain specifications:

  1. Primary-use: CONTEXTUAL-JSON ( =cook=> COOKED-JSON ) =ld-context=> JSON-LD => domain-consumer. The primary use case is almost always one where information is produced by a domain-aware or 'spec-conforming' producer and consumed by a similarly conforming consumer.

  2. Domain-reasoning: CONTEXTUAL-JSON ( =cook=> COOKED-JSON ) =ld-context=> JSON-LD ( =ld-expandize=> DOMAIN-RDF ) => domain-reasoner. Domain reasoning benefits from the well-defined semantics RDF offers, without necessarily having all (or any) triples present, as long as the domain specification is well-defined in RDF terms.

  3. RDF-export: CONTEXTUAL-JSON ( =cook=> COOKED-JSON ) =ld-context=> JSON-LD =ld-expandize=> ( DOMAIN-RDF =universalize=> ) UNIVERSAL-RDF => global-consumer. RDF export makes domain knowledge explicit and exploits the intercompatibility network effect. It nevertheless remains the least important process for product owners, because the product must be useful even without the secondary 'domainless' network effects that RDF provides.

Formats in caps, operations in arrows, optionals in parens, consumer at the end.

  CONTEXTUAL-JSON: a JSON document in a specific, local context
  COOKED-JSON: a document specifically tailored for JSON-LD expansion
  JSON-LD: self-explanatory
  DOMAIN-RDF: a triple graph which is missing triples that can be inferred using domain-specific knowledge
  UNIVERSAL-RDF: a triple graph which contains triples that represent all semantic knowledge

  cook: a contextual pre-operation which converts a well-formed JSON structure into one that JSON-LD plus an appropriate @context can consume
  ld-context: addition of a JSON-LD @context based on the current context
  ld-expandize: JSON-LD expand + serialize
  universalize: makes domain-specific semantics explicit by emitting the corresponding triples

1.1. cook overlaps with what JSON-LD does yet requires code - do not want

contextual-cook is arbitrarily powerful and obviously solves all the problems, but the cost is having to support additional code and libraries. This cost is especially painful in contexts where documents need to be persisted or serialized in a manner that loses the execution context. With a JSON-LD-only solution, the addition of a @context alone is enough for the document to be well-defined within the domain; this is a trivial operation which can be made a declarative, configurable part of the platform implementation.

Cooking requires specific code which needs to be updated and reviewed every time the context/domain specification is updated, to the point where the question becomes "is it worth using JSON-LD and an underlying model in the first place?".
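To make the cost concrete, the kind of rewriting a cook pass performs can be sketched as follows. This is a hypothetical minimal example, not from any spec: since the prefix term definition's @container/@list and @type/@id members are not applied to "Relation:*" keys, the cook step has to wrap those values into the explicit @list/@id structures that a stock JSON-LD processor understands. The prefix name and the rewrite rule are assumptions for illustration only.

```python
def cook(doc, prefix="Relation"):
    """Rewrite values under prefix-expanded keys into explicit
    JSON-LD @list/@id structures (hypothetical cook pass)."""
    cooked = {}
    for key, value in doc.items():
        if key.startswith(prefix + ":") and isinstance(value, list):
            # Preserve order and duplicates explicitly, and mark
            # each string as an IRI reference rather than a literal.
            cooked[key] = {"@list": [{"@id": v} for v in value]}
        else:
            cooked[key] = value
    return cooked
```

Every new prefix with non-trivial semantics would need another rule like this, which is exactly the specification-tracking code the JSON-LD-only design avoids.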

When comparing a fully custom solution to cooking+JSON-LD, the custom solution is considerably better for primary-use scenarios, equally good for domain-reasoning scenarios, and worse only for rdf-export scenarios. Fully custom wins =>

"Do I want to cook? No."

1.2. Universalization must only consist of adding triples which don't add structural ambiguities.

Triple removal is a no-go, as are ambiguities like having both list triples and set triples for the same data (?).

1.3. Universalization should be elided where possible. In the general case it is not always possible

If it is possible to skip universalization by relying on the JSON-LD @context rules alone, it should be done. But JSON-LD is only a structural converter; there is a lot of implicit domain knowledge involving value introspection that can never be expressed with it.

2. Postface

I'm strongly leaning towards the design where there is no cooking: JSON-LD is thrown around inside the domain as the primary format, and is used and analyzed efficiently without having to expandize. Even when expandizing within the domain, there would often be no need for the structural/implicit triples, as the domain implementations can 'assume' them. And when universalization is needed, it is straightforward to carry out; especially if the points in the infrastructure that universalize are few, it is much easier to update the specifications and their implementations.

So if someone is still reading and is asking how this relates to this thread:

Without prefix-suffix support, and with the criteria I laid out in the initial post, I can't have a design which needs no cooking and doesn't break 1.2. Naturally, if the design is not possible then it's not possible, and I have to look for something completely different.

But the way I see it, relying on universalization is in principle a fair way to conform with JSON-LD and RDF. After all, the prefix-suffix proposal doesn't directly violate the opacity principle, even if in my use case it would allow me to emit partial domain-rdf graphs which do violate it locally. But is this a problem?

So, what am I missing if anything, something obvious or less so?

azaroth42 commented 5 years ago

To attempt to summarize my understanding: the request is to allow per-prefix defaults to be set, such as for @container. Thus setting @container:@list on dc would make all dc:* properties into lists.

iridiankin commented 5 years ago

To attempt to summarize my understanding: the request is to allow per-prefix defaults to be set, such as for @container. Thus setting @container:@list on dc would make all dc:* properties into lists.

Right, although I maybe wouldn't use 'dc' as an example, as I imagine that would surely break existing things... Also, setting @type:@id is useful in the second use case, if not mandatory, to allow: "Relation:NOTES": ["http://example.org/note#C", "http://example.org/note#CSharp", "http://example.org/note#C"]
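If both members were applied through the prefix, that key would ideally expand to something like the following (an expanded-form sketch, not the output of any current processor). Note that the duplicate note#C survives, because @list preserves both order and duplicates:

```
{
  "http://example.org/Relation/NOTES": [{
    "@list": [
      { "@id": "http://example.org/note#C" },
      { "@id": "http://example.org/note#CSharp" },
      { "@id": "http://example.org/note#C" }
    ]
  }]
}
```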

azaroth42 commented 5 years ago

I would also be looking for a way to specify that node dict key "s:foo#bar" would also imply "s:foo#bar a s:foo"; whether explicitly, structurally via JSON-LD @context, or via the specification itself.

Adding triples to the graph via the context is out of scope for the WG.

iridiankin commented 5 years ago

I would also be looking for a way to specify that node dict key "s:foo#bar" would also imply "s:foo#bar a s:foo"; whether explicitly, structurally via JSON-LD @context, or via the specification itself.

Adding triples to the graph via the context is out of scope for the WG.

Thanks; reasonable enough. Indeed, in the big picture this is not a problem, as I've been prepared to do the 'universalization' step anyway, as long as the triples that the @context does emit are not conflicting/ambiguous.

azaroth42 commented 5 years ago

This was discussed on the WG call of 2019-06-07, and the decision was that this is not a bug but a new feature request. As such, the closing date for new features was two weeks (see https://www.w3.org/blog/2019/03/json-ld-collaborative-work-and-feature-timeline/) after the working draft of May 10 (https://www.w3.org/TR/json-ld11/), and thus May 24th.

This could be discussed in the JSON-LD community group towards a solution for a future version, and we will keep it in mind for inclusion if it solves bugs that come up within the 1.1 timeframe.

Many thanks for the detailed discussion!

iherman commented 5 years ago

This issue was discussed in a meeting.

iridiankin commented 5 years ago

This could be discussed in the JSON-LD community group towards a solution for a future version, and we will keep it in mind for inclusion if it solves bugs that come up within the 1.1 timeframe.

Many thanks for the detailed discussion!

Appreciated. I got useful feedback on the philosophy and direction of JSON-LD, which was mostly what I was looking for anyway. I'm in no rush... not yet at least. Thanks for that!