w3c / json-ld-syntax

JSON-LD 1.1 Specification
https://w3c.github.io/json-ld-syntax/

New feature: `@template` #362

Open pchampin opened 3 years ago

pchampin commented 3 years ago

Here is a feature that I discussed with some colleagues, and that we really would like to see in a future version of JSON-LD.

Use cases

Consider the following example JSON, as might be produced by a Web API:

{
    "id": 1234,
    "name": "Alice",
    "bday": "1987-04-01",
    "height": 168
}

We know from the API documentation that id is a unique local identifier for this entity, whose corresponding IRI is http://example.org/users/1234. Unfortunately, there are two problems with the current spec:

We also know from the API documentation that height is expressed in centimetres. We would like to map it using the cdt:ucum datatype, i.e. into "168 cm"^^cdt:ucum.

Proposed solution

An example context for the use case above would then look like this:

{"@context": {
    "id": {
        "@id": "@id",
        "@template": "http://example.org/users/{}"
    },
    "height": {
        "@id": "http://example.org/ont/height",
        "@type": "https://ci.mines-stetienne.fr/lindt/v3/custom_datatypes#ucum",
        "@template": "{} cm"
    }
}}
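A minimal sketch of the expansion step this context implies, assuming the single-placeholder semantics discussed later in the thread: the template contains exactly one `{}`, and the value is inserted verbatim between the prefix and the suffix. `expand_with_template` is a hypothetical helper, not part of any JSON-LD processor API, and it only handles integers and strings.

```python
def expand_with_template(template: str, value) -> str:
    """Splice a scalar into a template containing exactly one '{}' placeholder."""
    prefix, sep, suffix = template.partition("{}")
    if not sep or "{}" in suffix:
        raise ValueError("template must contain exactly one '{}' placeholder")
    # bool is a subclass of int in Python; JSON true/false are not handled here
    if isinstance(value, bool) or not isinstance(value, (int, str)):
        raise TypeError("this sketch only handles integers and strings")
    return prefix + str(value) + suffix


# Templates taken from the "id" and "height" term definitions above
print(expand_with_template("http://example.org/users/{}", 1234))  # http://example.org/users/1234
print(expand_with_template("{} cm", 168))                         # 168 cm
```

With `@type` set to the `ucum` datatype IRI, the second result would then become the typed literal "168 cm"^^cdt:ucum.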

Remaining issues

iherman commented 3 years ago

Specifically for URLs, there is the URI Template RFC (RFC 6570). I know we referred to it in the CSVW work, but @gkellogg knows it far better than I do. I do not know whether it is too much for what we want, or whether it is also usable for non-URL purposes, etc. But it has the advantage of being documented, and it would avoid reinventing the wheel...

pietercolpaert commented 3 years ago

I like the proposal. {} is a good suggestion, but if we would like to support more transformations, we could take inspiration from:

Probably @datatype in your last example is a typo and you meant @type though?

A little bit related: the Hydra explicit and basic variable representations for URI templates: https://www.hydra-cg.com/spec/latest/core/#example-22-the-different-variable-representations

asbjornu commented 3 years ago

I really like the idea, and agree that for URIs, @iherman's suggestion on using URI Templates is the definitive way to go. I don't think URI Templates can (or should) be used for non-URI data, though. At least not without defining some clear processing rules (perhaps some inspiration can be sourced from RFC 5229).

Something I believe is going to surface as a need almost immediately after @template is released, is the ability to interpolate the value of other properties from the document into the result of the templated strings. So something like {otherProperty} is a good idea to support from the get-go.

…Which raises the question: how should otherProperty be resolved? Are only sibling properties of the templated property allowed? If not, should JSON Pointer or JSON Path be supported? This is a slippery slope unless we pin the syntax to a stable, ratified specification.
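To make the question concrete, here is one possible answer, sketched under the assumption that placeholders may only name sibling keys of the same JSON object (no JSON Pointer or JSON Path). The `{name}` placeholder syntax, the `org` key and the `interpolate_siblings` helper are all hypothetical.

```python
import re

# A placeholder is a bare key name in braces, e.g. {id}
PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate_siblings(template: str, node: dict) -> str:
    """Replace each {key} with the value of the sibling key in `node`."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in node:
            raise KeyError(f"placeholder {{{key}}} has no sibling property")
        return str(node[key])

    return PLACEHOLDER.sub(substitute, template)


node = {"id": 1234, "name": "Alice", "org": "acme"}
print(interpolate_siblings("http://example.org/{org}/users/{id}", node))
# http://example.org/acme/users/1234
```

Even in this restricted form, compaction would have to reverse several placeholders at once, which is where the round-tripping concerns raised later in the thread come in.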

pchampin commented 3 years ago

@pietercolpaert

Probably @datatype in your last example is a typo and you meant @type though?

Yes, thanks for spotting it. That's fixed.

azaroth42 commented 3 years ago

I'm not convinced, I'm afraid.

This is even more complex than the frequently requested ability to change datatypes / add classes with @type in a context (#31, #76, etc.). Instead, this does additional data transformation by introducing new content in a context document, something we have previously decided is out of scope for a context.

The data would not round-trip, as the template would not be reversible. Once the context has been applied, it cannot be unapplied. Even worse, once it was applied, it could then be applied again if it could be used with strings, resulting in (e.g.) "168 cm cm cm cm cm". The template pattern is not idempotent, and there's no way to know when to apply it and when it has already been applied.
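The idempotency concern can be reproduced in a few lines. If a template may be applied to strings, an already-expanded value looks just like an unexpanded one, so re-applying the template keeps appending the suffix. The sketch below (hypothetical helper names) contrasts that with a numbers-only rule, under which an already-expanded string is simply not eligible for templating.

```python
from typing import Optional


def naive_apply(template: str, value) -> str:
    # Accepts any value, including a string that was already expanded
    return template.replace("{}", str(value), 1)


def numeric_only_apply(template: str, value) -> Optional[str]:
    # Templates only apply to JSON numbers (Python int here, excluding bool);
    # None means "leave the value to ordinary JSON-LD processing"
    if isinstance(value, bool) or not isinstance(value, int):
        return None
    return template.replace("{}", str(value), 1)


value = 168
for _ in range(4):
    value = naive_apply("{} cm", value)
print(value)  # 168 cm cm cm cm

once = numeric_only_apply("{} cm", 168)
twice = numeric_only_apply("{} cm", once)
print(once, twice)  # 168 cm None
```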

So I'm :-1: on the feature in its current state

filip26 commented 3 years ago

I agree with @azaroth42. Besides the many unresolved issues this feature raises, JSON-LD is already quite complex.

There are plenty of syntax description formats (OAS, RAML, etc.) that can be used as a source for a preprocessor to transform an input into hypermedia format before passing it to JSON-LD processor.

pchampin commented 3 years ago

@iherman @asbjornu It seems to me that URL templates are both too much (supporting multiple placeholders) and not enough (supporting only URLs).

@pietercolpaert @asbjornu I really want to keep the system as simple as possible, in particular because the inability to round-trip would be a deal-breaker (see @azaroth42's comment).

@azaroth42 I don't think my initial proposal above "transforms" data much more than "@language": "fr" or "@type": "xsd:date" (used in a context), really. As for round-tripping, I do believe that this proposal supports it. I may have overlooked some edge-cases, but overall I think it is achievable. That is, if we refrain from adding complex pre-processing on the value besides injecting it in the template. Maybe it will require that templates are only applied to numbers, not strings. But numeric IDs are really pervasive, so that use-case alone would make templates useful, I believe.

@filip26 Yes of course, pre-processing is an option, but then we lose the nice round-tripping feature that JSON-LD offers. And again, I do believe that @template can be made to round-trip.

azaroth42 commented 3 years ago

Okay, if it only works for numeric data (by which I mean r/-?[0-9.]+/) then reversing the template seems easy enough for compaction to handle. And if you don't compact with the same context that you used to expand, then you shouldn't expect to get the same results.
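A sketch of the reverse step during compaction, using the pattern given above (`-?[0-9.]+`). That pattern also admits text such as `1.2.3` (not a number) or `007` (a number whose canonical form differs), so the sketch additionally requires that the captured text parses and re-expands to exactly itself. `unapply_template` is a hypothetical name, and the Getty template is inferred from the example in the next paragraph.

```python
import re
from typing import Optional, Union

# Pattern quoted in the comment above; it also admits "1.2.3", "007", ".5"
NUMERIC = re.compile(r"-?[0-9.]+")


def unapply_template(template: str, expanded: str) -> Optional[Union[int, float]]:
    """Recover the number spliced into a single-'{}' template, or None."""
    prefix, _, suffix = template.partition("{}")
    if len(expanded) < len(prefix) + len(suffix):
        return None  # too short to contain both prefix and suffix
    if not (expanded.startswith(prefix) and expanded.endswith(suffix)):
        return None
    middle = expanded[len(prefix):len(expanded) - len(suffix)]
    if not NUMERIC.fullmatch(middle):
        return None
    try:
        number = float(middle) if "." in middle else int(middle)
    except ValueError:  # matches the pattern but is not a number, e.g. "1.2.3"
        return None
    # Only accept text that re-expands to itself, so the round trip is exact
    return number if str(number) == middle else None


aat = "http://vocab.getty.edu/aat/{}"
print(unapply_template(aat, "http://vocab.getty.edu/aat/300194222"))  # 300194222
print(unapply_template(aat, "http://vocab.getty.edu/aat/1.2.3"))      # None
print(unapply_template("{} cm", "168 cm"))                            # 168
```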

I have use cases for this too, FWIW. We have information systems that naively use an incrementing integer as the core identity for records describing objects, people, places and events. If @id could use a @template, then we would drop that integer in and use the template to expand to the full URI.

For example, http://vocab.getty.edu/aat/300194222 could have "id": 300194222, which would make people happy. Or (in my new institution), https://collection.britishart.yale.edu/id/page/object/1084 would have "id": 1084.

With those caveats to ensure round tripping, I'm a definite +0

gkellogg commented 3 years ago

For values of @id, I agree that we should be able to interpret other primitive types, such as number, as suitable for IRI expansion.

We did use the URI templates mechanism for CSVW, and it can do everything we want, though it probably needs some more consideration to see how it adapts to creating URIs from values. Something like @template in a term definition that specifies the URI template to use would be good.

One of the issues we've encountered recently, though, is that URI templates end up escaping non-ASCII data. I believe that you can uniformly decode the value of the template transformation without messing up any legitimate data, which we'd need to be sure to allow for.
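The escaping behaviour can be seen with Python's `urllib.parse`, used here as a stand-in for a URI Template processor's simple string expansion, which likewise percent-encodes (as UTF-8) every character outside the unreserved set. Decoding only the substituted part recovers an IRI with the original characters. The prefix and the value `José` are illustrative. Note that a value containing a reserved character such as `/` would also be decoded back, changing the IRI's structure, which is the kind of case such a rule would need to account for.

```python
from urllib.parse import quote, unquote

prefix = "http://example.org/users/"
value = "José"

# Simple expansion: everything outside ALPHA / DIGIT / "-" / "." / "_" / "~"
# is percent-encoded as UTF-8
uri = prefix + quote(value, safe="")
print(uri)  # http://example.org/users/Jos%C3%A9

# Decoding only the substituted part yields the IRI with the original characters
iri = prefix + unquote(uri[len(prefix):])
print(iri)  # http://example.org/users/José
```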

As @asbjornu says, a URI template can interpret other variable values, which could potentially come from other properties of the node, but this adds substantial complexity, and I would suggest we constrain ourselves to "the simplest thing which could possibly work", and see where that gets us. In this case, limiting ourselves to only the value of the @id key (when it is a primitive type).

Regarding non-IRI values, CSVW also considered using {{mustache}}, but it is inadequately specified and would add even more complexity, so no good solution there, I'm afraid. I've often thought that a URI Template-like mechanism that could be extended to literal values would be useful, but alas ...

ajs6f commented 3 years ago

Would this not work for more general alphanumeric identifiers? There are plenty in wide use (e.g. UUIDs). The restriction to a single use of a single value is what seems to me to make it feasible, but perhaps I'm missing something.

gkellogg commented 3 years ago

It should work for any URI, and with unescaping, IRI. It is just restricted to generate URIs, not general string literals.

pchampin commented 3 years ago

@ajs6f

Would this not work for more general alphanumeric identifiers?

The problem is: if you try to compact http://ex.com/id/1234 through template http://ex.com/id/{}, should you produce the string "1234" or the number 1234? If we restrict templates to numbers, there is no ambiguity.

Also, canonical number representations can (I think) be inserted into IRIs without any escaping/encoding, while other characters may need this.
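Both points can be checked directly for integers. Without a type restriction, the string "1234" and the number 1234 expand to the same IRI, so compaction has no way to choose between them. And the decimal form of an integer contains only digits and possibly a leading `-`, all unreserved URI characters, so percent-encoding leaves it unchanged. The `splice` helper is hypothetical; floating-point forms are not covered here.

```python
from urllib.parse import quote

template = "http://ex.com/id/{}"


def splice(value) -> str:
    # Naive single-placeholder substitution, accepting strings and numbers alike
    return template.replace("{}", str(value), 1)


# Two different JSON values, one IRI: the reverse mapping is ambiguous
print(splice("1234") == splice(1234))  # True

# Integer representations survive percent-encoding untouched
print(all(quote(str(n), safe="") == str(n) for n in (0, 7, -42, 1234, 10**20)))  # True
```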

@gkellogg

It is just restricted to generate URIs, not general string literals.

Well, the second use case is important for us too...

Regarding non-IRI values, CSVW also considered using {{mustache}}, but it is inadequately specified and would add even more complexity, so no good solution there, I'm afraid.

This is why I was not suggesting designing a complex formatting mechanism, just dead-simple substitution, or more precisely, concatenating something before and after the original value. This also makes "unapplying" the template easy.

Should the template fail to "unapply" during compaction (either because the expanded value does not match the prefix or suffix of the template, or because the substring in between does not parse to a number), the term definition will not apply. This is the same thing that happens when a term definition specifies a "@language" (example in the playground).
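A sketch of that fallback during compaction, assuming a simplified model where each candidate term has a property IRI and a single-`{}` template, and ignoring the `@type` check a real processor would also perform (`compact_value` and the term table are hypothetical). A term is chosen only if its template un-applies to a canonical integer; otherwise the value is kept in its expanded form under the full IRI, much as a term with a non-matching `@language` is skipped.

```python
import re
from typing import Any, Dict, Optional, Tuple

# Canonical integers only, so re-expanding reproduces the exact string
INTEGER = re.compile(r"0|-?[1-9][0-9]*")

# Hypothetical term table: term -> (property IRI, template)
TERMS: Dict[str, Tuple[str, str]] = {
    "height": ("http://example.org/ont/height", "{} cm"),
}


def unapply(template: str, expanded: str) -> Optional[int]:
    prefix, _, suffix = template.partition("{}")
    if len(expanded) < len(prefix) + len(suffix):
        return None
    if not (expanded.startswith(prefix) and expanded.endswith(suffix)):
        return None
    middle = expanded[len(prefix):len(expanded) - len(suffix)]
    return int(middle) if INTEGER.fullmatch(middle) else None


def compact_value(prop_iri: str, expanded: str) -> Tuple[str, Any]:
    """Return the (key, value) pair to use in the compacted node."""
    for term, (iri, template) in TERMS.items():
        if iri == prop_iri:
            number = unapply(template, expanded)
            if number is not None:
                return term, number
    # No templated term applies: keep the full IRI and the literal as-is
    return prop_iri, expanded


print(compact_value("http://example.org/ont/height", "168 cm"))  # ('height', 168)
print(compact_value("http://example.org/ont/height", "1.68 m"))  # ('http://example.org/ont/height', '1.68 m')
```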