Open iridiankin opened 5 years ago
It seems to me that your proposal aims at breaking the principle of URI opacity. More precisely, what you are trying to say is "any IRI starting with `xyz`, when used as a predicate, denotes a property which is order-sensitive".
> It seems to me that your proposal aims at breaking the principle of URI opacity. More precisely, what you are trying to say is "any IRI starting with `xyz`, when used as a predicate, denotes a property which is order-sensitive".
This is a relevant concern and I have thought about it, but I don't think it is clear-cut. Also it definitely is not the aim even if it might be a consequence. Thoughts:
- In practice, a small number of inferences can be made, because they are explicitly licensed by the relevant specifications.
- Also, the opacity principle is most concerned with resource-identifying IRIs, which brings us to...
`s:bulleted#name-0` and `s:bulleted#arguments-3` refer to two different secondary resources of the same primary resource `http://example.org/section/bulleted`.
Secondary resources are by definition quite unrestricted: I can, as the designer of the documentation format, within the context of that format, reasonably make the conceptual statement that the valid set of secondary resources is all strings and that they are treated as having lexicographical ordering. So I'm not concerned about breaking IRI opacity per se.
What I have been most uncertain about are the principles that govern the use of IRIs as predicates, as they... feel a bit more restricted. Most of the predicate ontologies seem to make use of fragments (starting with `rdf` itself) even if the fragment parts do not have intrinsic semantic meaning.
This avenue gets quite philosophical quite fast though, and I'm sure there have been discussions. Are these written down anywhere?
I disagree that the URI/IRI opacity argument is limited to example 1. Both examples imply that anybody can coin a new IRI with a given prefix (`s:` in example 1, `Relation:` in example 2) and expect others to guess from that prefix that the IRI denotes a section or a relation, respectively. For me that breaks opacity.
I also believe that the opacity violation is indeed related to the `"@container": "@list"` feature. I grant you that this is not clear-cut: unlike RDFS or OWL, JSON-LD does not describe the semantics of a vocabulary, merely how to map JSON keys to that vocabulary. However, stating "the values of that predicate should be interpreted as an ordered list" is intimately linked to the semantics of said predicate (which would have range `rdf:List` rather than `schema:Person`, for example). So creating a blanket statement for a whole set of predicates, just because their IRI has a given prefix, breaks IRI opacity from my point of view.
> I can, as the designer of the documentation format, within the context of that format, reasonably make the conceptual statement that the valid set of secondary resources is all strings and that they are treated as having lexicographical ordering.

Granted. But this kind of format-specific knowledge could not (easily) be conveyed in RDF or JSON-LD, since their data models consider IRIs to be opaque identifiers.
> I disagree that the URI/IRI opacity argument is limited to example 1. Both examples imply that anybody can coin a new IRI with a given prefix (`s:` in example 1, `Relation:` in example 2) and expect others to guess from that prefix that the IRI denotes a section or a relation, respectively. For me that breaks opacity.
I may be a bit confused on a more philosophical level here; why would others need to guess that the IRI denotes a section or a relation? From the point of view of the RDF model, whether an IRI is a section or a relation is irrelevant: the ordering in this case is only very contextually relevant, and those who are in that context don't need to guess, they know.
Additionally, if actual triple graphs are emitted from a format like this, it is straightforward for the specification to require that appropriate type triples are emitted for the targeted objects of the above predicates. In fact I fully intend to require that.
Triple inference is a thing, and contextual semantics are a thing. If I understand you correctly, you're raising the point that RDF has the noble goal of trying to be 'self-contained': all knowledge has a triple correspondence and must be expressed as such. I recognize this is tremendously powerful and useful, but it is also what makes RDF so damn clunky and heavy, and probably what is slowing its adoption.
For me JSON-LD appeared to be the solution to this clunkiness, in that it is the tool that allows triple graphs to be emitted and inferred from essentially contextual, efficient structures with overheads removed (both in terms of character count and unnecessary indirections). Was I wrong to expect this?
As an addendum: I would also be looking for a way to specify that the node dict key `"s:foo#bar"` would also imply `"s:foo#bar a s:foo"`; whether explicitly, structurally via the JSON-LD @context, or via the specification itself. Triples which explicitly specify the order are probably something that would have to rely fully on inference...
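In JSON-LD terms, the implication would be that any document using that key behaves as if the following node object were also present (a sketch; `s:foo` and `s:foo#bar` are the hypothetical identifiers from the example above):

```json
{
  "@id": "s:foo#bar",
  "@type": "s:foo"
}
```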
I did a bit of analysis to help myself conceptualize the process more clearly, as I wanted to find an answer to the question "Do I want to 'cook'?" and to also understand 'domain-specific graphs' and their 'universalization' a bit better. I'm dumping this here in case someone is interested; some questions at the end. EDIT: man, I feel bad for being quite spammy. But I'm not sure I can justify putting in the time to edit this down.
Consider the following three processes, ordered by their business relevance, where domain refers to all code and operators that conform to and make use of the domain specifications:
Primary-use:
`CONTEXTUAL-JSON ( =cook=> COOKED-JSON ) =ld-context=> JSON-LD => domain-consumer`
The primary use case is almost always one where information is produced by a domain-aware or 'spec-conforming' producer and consumed by a similarly conforming consumer.
Domain-reasoning:
`CONTEXTUAL-JSON ( =cook=> COOKED-JSON ) =ld-context=> JSON-LD ( =ld-expandize=> DOMAIN-RDF ) => domain-reasoner`
Domain reasoning benefits from the well-defined semantics RDF offers, without necessarily having all or any of the triples present, as long as the domain specification is well-defined in RDF terms.
RDF-export:
`CONTEXTUAL-JSON ( =cook=> COOKED-JSON ) =ld-context=> JSON-LD =ld-expandize=> ( DOMAIN-RDF =universalize=> ) UNIVERSAL-RDF => global-consumer`
RDF export makes domain knowledge explicit and makes use of the intercompatibility network effect. Still, it remains the least important for product owners, because the product must be useful even without the secondary 'domainless' network effects that RDF provides.
Formats in caps, operations in arrows, optionals in parens, consumer at the end.
- `CONTEXTUAL-JSON`: a JSON document in a specific, local context
- `COOKED-JSON`: a document specifically tailored for JSON-LD expansion
- `JSON-LD`: self-explanatory
- `DOMAIN-RDF`: a triple graph which is missing triples that can be inferred using domain-specific knowledge
- `UNIVERSAL-RDF`: a triple graph which contains triples representing all semantic knowledge
- `cook`: a contextual pre-operation which converts a well-formed JSON structure into one that JSON-LD plus an appropriate @context can consume (see the sketch after this list)
- `ld-context`: addition of a JSON-LD @context based on the current context
- `ld-expandize`: JSON-LD expand + serialize
- `universalize`: makes domain-specific semantics explicit by emitting the corresponding triples
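To illustrate `cook` with the documentation format of use case 1: without prefix-suffix support, a cook pass could, for example, rewrite ordered values into explicit `@list` value objects so that expansion preserves order and duplicates (a sketch; the key name is hypothetical). The CONTEXTUAL-JSON

```json
{ "s:doTheThing#arguments-3": ["first", "second", "first"] }
```

would become the COOKED-JSON

```json
{ "s:doTheThing#arguments-3": { "@list": ["first", "second", "first"] } }
```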
`cook` overlaps with what JSON-LD does yet requires code - do not want. A contextual cook is arbitrarily powerful and obviously solves all the problems, but the cost is having to support additional code and libraries. This cost is especially painful in contexts where documents need to be persisted or serialized in a manner that loses the execution context. With a JSON-LD-only solution, the mere addition of a @context is enough for the document to be well-defined within the domain. This is a trivial operation which can be made a declarative, configurable part of the platform implementation.
Cooking requires specific code which needs to be updated and reviewed every time the context/domain specification is updated, to the point where the question becomes "is it worth using JSON-LD and an underlying model in the first place?".
When comparing a fully custom solution to cooking+JSON-LD, the custom solution is considerably better for primary-use scenarios, equally good for domain-reasoning scenarios, and worse for rdf-export scenarios. Fully custom wins =>
"Do I want to cook? No."
Triple removal is a no-go, as are ambiguities like having list triples and set triples for the same data (?).
If it is possible to skip universalization by just relying on JSON-LD @context rules, it should be done. But JSON-LD is only a structural converter; there is a lot of implicit domain knowledge involving value introspection that can never be expressed with it.
I'm strongly leaning towards the design where there is no cooking, JSON-LD is thrown around inside the domain as the primary format, and it is used and analyzed efficiently without having to expandize. Even when expandizing within the domain there would often be no need for the structural/implicit triples, as the domain implementations can 'assume' them. And when universalization is needed, it is straightforward to carry out, especially if the points in the infrastructure that universalize are few: it is then much easier to update specifications and their implementations.
So if someone is still reading and is asking how this relates to this thread:
Without prefix-suffix support, and with the criteria I laid out in the initial post, I can't have a design which does not need cooking and which doesn't break criteria 1 and 2. Naturally, if the design is not possible then it's not possible, and I have to look for something completely different.
But the way I see it, relying on universalization is in principle a fair way to conform with JSON-LD and RDF. After all, the prefix-suffix proposal doesn't directly violate the opacity principle, even if in my use case it would allow me to emit these partial domain-rdf graphs which do violate it locally. But is this a problem?
So, what am I missing, if anything - something obvious or less so?
To attempt to summarize from my understanding, the request is to allow per-prefix defaults to be set, such as for `@container`. Thus setting `@container: @list` on `dc` would make all `dc:*` properties into lists.
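A sketch of what such a context and data might look like (the propagation of `@container` to `dc:*` keys is the hypothetical feature; under current JSON-LD it would only apply to the `dc` key itself):

```json
{
  "@context": {
    "dc": { "@id": "http://purl.org/dc/terms/", "@container": "@list" }
  },
  "dc:contributor": ["Alice", "Bob", "Alice"]
}
```

Under the proposed semantics, `dc:contributor` would expand to an ordered `rdf:List` that preserves the duplicate entry.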
> To attempt to summarize from my understanding, the request is to allow per-prefix defaults to be set, such as for `@container`. Thus setting `@container: @list` on `dc` would make all `dc:*` properties into lists.
Right, although I maybe wouldn't use `dc` as an example, as I imagine that would surely break existing things...
Also setting `@type: @id` is useful in the second use case, if not mandatory, to allow:

```
"Relation:NOTES": ["http://example.org/note#C", "http://example.org/note#CSharp", "http://example.org/note#C"]
```
> I would also be looking for a way to specify that the node dict key `"s:foo#bar"` would also imply `"s:foo#bar a s:foo"`; whether explicitly, structurally via the JSON-LD @context, or via the specification itself.

Adding triples to the graph via the context is out of scope for the WG.
> > I would also be looking for a way to specify that the node dict key `"s:foo#bar"` would also imply `"s:foo#bar a s:foo"`; whether explicitly, structurally via the JSON-LD @context, or via the specification itself.
>
> Adding triples to the graph via the context is out of scope for the WG.
Thanks; reasonable enough. Indeed, in the big picture this is not a problem, as I've been prepared to do the 'universalization' step anyway - as long as the triples that the @context does emit are not conflicting/ambiguous.
This was discussed on the WG call of 2019-06-07, and the decision was that this is not a bug but a new feature request. As such, the closing date for new features was two weeks after the working draft of May 10 (https://www.w3.org/TR/json-ld11/), and thus May 24th (see https://www.w3.org/blog/2019/03/json-ld-collaborative-work-and-feature-timeline/).
This could be discussed in the JSON-LD community group towards a solution for a future version, and we will keep it in mind for inclusion if it solves bugs that come up within the 1.1 timeframe.
Many thanks for the detailed discussion!
This issue was discussed in a meeting.
- RESOLVED: Defer syntax#191 and api#94 as new features after feature freeze
- ACTION: post blog reference for feature freeze (Rob Sanderson)
- ACTION: add feature freeze note to the syntax, api, and framing READMEs and issue template for bugs only (Benjamin Young)
> This could be discussed in the JSON-LD community group towards a solution for a future version, and we will keep it in mind for inclusion if it solves bugs that come up within the 1.1 timeframe.
>
> Many thanks for the detailed discussion!
Appreciated. I got useful feedback on the philosophy and direction of JSON-LD; this was mostly what I was looking for anyway. I'm in no rush... not yet at least. Thanks for that!
This enhancement proposal presents two use cases which suggest extending term definition semantics to prefix-expanded terms as well.
As it is now, term definition semantics only apply during expansion to node dictionary keys which exactly match an expanded term definition. Only the "@id" member is semantically meaningful when a term is used as a prefix.
A limited form of this proposal, specific to `"@container": "@list"`, can be found on the json-ld-api GitHub.
This enhancement is not targeted at 1.1 and as such does not make final syntactic proposals, but focuses on describing the use cases for consideration. If the proposal is considered essentially sane, further experimentation is expected to yield more concrete suggestions.
1. Documentation JSON interchange format
A complex JavaScript-based ecosystem exists with many semantically different places where documentation needs to be embedded and cross-referenced. HTML is too cumbersome, and a format such as Markdown is a convenient solution only in some situations, as it requires source-code introspection and is not machine-readable. A unified JSON interchange format is preferred which satisfies the following criteria:
Any two of the criteria are easy to achieve with little effort. Criteria 3 suggests JSON-LD, but there are shortcomings.
Let's consider a hypothetical function 'doTheThing' documentation block (with explanatory, self-referential bogus text content):
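A sketch of what such a block might look like, assuming the `s:` prefix maps to the section namespace used earlier in this thread and fragments encode a section name plus relative position (all key names and text are illustrative):

```json
{
  "@context": { "s": "http://example.org/section/" },
  "s:bulleted#name-0": ["doTheThing"],
  "s:prose#description-1": "doTheThing does the thing; this bogus text describes the block it lives in.",
  "s:bulleted#arguments-3": [
    "some_arg: the thing to operate on",
    "other_arg: how the thing is done"
  ]
}
```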
Criteria 1 is satisfied by relying on two principles: the writer only needs to understand the section (`s:*`) primitives and how the fragment is used for relative positioning and section identification. Criteria 2 is satisfied on one hand by `doc.name` and `doc.arguments.some_arg` being intuitive to access, and on the other by numbered and bulleted list contents being flat, easily programmatically accessible and emittable arrays. Criteria 3 is achievable by JSON-LD in the general case only if complex suffixes, specifically `"@container": "@list"`, are supported:
As of now the default container type is @set, which loses both ordering and duplicates, both of which are critical. Requiring the various suffixes to be explicitly declared in the @context loses generality.
A playground example demonstrating the set/list issue: both `"p"` and `"p:s"` should yield list triples, but only `"p"` does.
There are variations on how to ideally meet the desired criteria. This variant was chosen to demonstrate how a minimal, universal @context allows the prefix:suffix to pull a lot of weight, to the point where new doc primitives can be added without touching already-deployed @context code in the wild.
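A sketch of what that minimal, universal context might look like under the proposed semantics (hypothetical; today the container mapping would not propagate to `s:*` compact IRIs):

```json
{
  "@context": {
    "s": { "@id": "http://example.org/section/", "@container": "@list" }
  }
}
```

With prefix-suffix support, every `s:*` key would inherit the list container, so new doc primitives could be introduced without any change to this context.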
2. RDF serialization of 'freeform' named, ordered sequences
Consider an object model with named, unordered, many-to-'any' mappings called 'Properties', and named, ordered, many-to-many mappings called 'Relations', where the namespaces of Properties and Relations are disjoint. The names of these mappings are mutable, ad hoc, even application-private in nature, so they can't be expected to have ontology definitions. Nevertheless, the object model needs a universal serialization which allows reasoning: a JSON serialization format with RDF correspondence makes sense.
A reasonable JSON serialization of such an object might be:
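A sketch of one plausible shape, reusing the `Relation:NOTES` example that appears later in this thread (the `Property:` key and the node IRIs are illustrative):

```json
{
  "@id": "http://example.org/scale#minor",
  "Property:displayName": "minor scale",
  "Relation:NOTES": [
    "http://example.org/note#C",
    "http://example.org/note#CSharp",
    "http://example.org/note#C"
  ]
}
```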
Similar to case 1, a generic context would be 'future-proof' and allow emitting reasonable and relatively simple RDF triple graphs. Here, as per the proposal, the need is to have prefix support for `"@type": "@id"` in addition to `"@container": "@list"`.
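A sketch of such a generic context under the proposed per-prefix semantics (hypothetical; the namespace IRIs are placeholders):

```json
{
  "@context": {
    "Property": { "@id": "http://example.org/Property/" },
    "Relation": {
      "@id": "http://example.org/Relation/",
      "@type": "@id",
      "@container": "@list"
    }
  }
}
```

Every `Relation:*` key would then expand to an ordered `rdf:List` of IRIs, while `Property:*` values would remain unordered sets.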