Open gkellogg opened 2 years ago
One stated goal is to be able to use something like `YAML.dump` of the parsed JSON/YAML, which will likely not allow defining how data is serialized in these cases. This should probably be at most a SHOULD requirement and maybe best left to an extended profile. Implementing it requires tagging the object which is the root of the JSON Literal and writing a custom emitter to serialize as JSON, which is a significantly more involved serialization strategy, particularly given the need to interpret the in-scope local context to know if a map entry value should be treated as a JSON Literal.
The YAML examples cited above are generated essentially by `YAML.dump(JSON.load(src))`, where there is no notion of a local context.
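To illustrate why a plain parse-then-dump pipeline cannot keep a JSON literal's original serialization, here is a minimal Python sketch that uses only the standard `json` module as a stand-in for the `YAML.dump(JSON.load(src))` call mentioned above. The input document and the `roundtrip` helper are hypothetical; the point is only that the parser hands back plain maps and lists, with no record of which subtree was typed `@json`.

```python
import json

# Hypothetical input: the value of "e" is a JSON literal whose author
# chose a specific compact serialization.
src = (
    '{"@context": {"e": {"@id": "http://example.com/vocab/json", '
    '"@type": "@json"}}, "e": {"b":2,"a":1}}'
)


def roundtrip(text: str) -> str:
    """Parse and re-serialize, as a generic load/dump pipeline would."""
    data = json.loads(text)  # plain dicts and lists, no context awareness
    return json.dumps(data, indent=2)


out = roundtrip(src)

# The data survives, but the literal's original text does not:
# the dumper re-emits every map with its own formatting.
same_data = json.loads(out) == json.loads(src)
same_text = '{"b":2,"a":1}' in out
```

Here `same_data` is true while `same_text` is false, which is the gap a custom emitter would have to close.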
It seems to me that the two YAML snippets above serialize to the same JSON (and this is confirmed by a quick test on https://www.convertjson.com/yaml-to-json.htm), so I don't understand where the issue is. :thinking:
It’s probably more of a philosophical question: Must a JSON Literal necessarily have the form of JSON?
It's also a pragmatic question:
- When converting to RDF, a `@json` literal should be treated as opaque and left alone, see https://w3c.github.io/json-ld-syntax/#the-rdf-json-datatype. I have more examples of such needs:
- What should a reader expect when seeing `@type:@json` or `"..."^^rdf:JSON`. If they expect JSON but find YAML, they may be unable to process it.
- I think we also need to declare `@type:@yaml` and `"..."^^rdf:YAML`
I just learned about JSON Literals... I think it is a very complex feature if you see it as a literal, because even JSON parsers will not treat it as you might expect.
For example, a JSON Literal with duplicate keys will not be treated as literal by generic JSON parsers:
{
  "@context": {
    "@version": 1.1,
    "e": {
      "@id": "http://example.com/vocab/json",
      "@type": "@json"
    }
  },
  "e": {
    "a": "ciao",
    "a": 1
  }
}
will result in an entry with the last (or the first, it's actually implementation dependent) removed. How does JSON-LD handle these cases?
{
  "@context": { ... },
  "e": {
    "a": 1
  }
}
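The behavior described above can be observed with Python's standard `json` module, used here only as one example of a generic parser. This is a small sketch, not a statement about what any JSON-LD processor does: by default the last value for a repeated key wins, while `object_pairs_hook` exposes the pairs before they are collapsed into a map.

```python
import json

text = '{"e": {"a": "ciao", "a": 1}}'

# The default parser silently keeps the last value for a repeated key.
parsed = json.loads(text)

# object_pairs_hook receives every (key, value) pair in document order,
# so the duplicates are still visible at that point.
pairs = json.loads(text, object_pairs_hook=lambda p: p)
```

After running this, `parsed` is `{"e": {"a": 1}}`, whereas `pairs` still contains both entries for `"a"`.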
@ioggstream https://w3c.github.io/json-ld-syntax/#the-rdf-json-datatype says "The lexical space is the set of UNICODE strings which conform to the JSON Grammar". Hopefully that includes only valid JSON representations, i.e. no duplicate keys.
This is not an optional feature. It's part of the JSON-LD spec, so it must be supported in YAML-LD.
I provided a real-world use case for it: GraphDB connectors for Lucene, SOLR, Elastic (https://graphdb.ontotext.com/documentation/10.0/connectors.html#full-text-search-and-aggregation-connectors)
The JSON-LD Literal definition is written to allow a variation in representation. The JCS C14N considerations only come into play when describing the representation within RDF Triples. This is similar to `rdf:XMLLiteral`, whose original intent is to allow for some portion of an XML document to be referenced as a literal across different encodings (also `rdf:RDFA`).
The JSON-LD spec says non-normatively that values of `@json` (or properties with `"@type": "@json"`) are treated as JSON Literals. IMO, YAML-LD is free to innovate here. As there is a simple transformation from any YAML to JSON, a value of `@json` could still have a more general YAML format, as long as the result can be transformed into the value space (involving JCS). That said, a SHOULD statement on using the JSON sub-set of YAML seems reasonable, and allows for implementations that cannot reasonably conform to this.
@VladimirAlexiev said:
- When converting to RDF, a `@json` literal should be treated as opaque and left alone, see https://w3c.github.io/json-ld-syntax/#the-rdf-json-datatype. I have more examples of such needs:
When converting to RDF triples, a given serialization may have different ways of representing that. The JSON-LD from RDF algorithm describes the mechanism to use when transforming a triple containing an RDF Literal into JSON-LD.
- What should a reader expect when seeing `@type:@json` or `"..."^^rdf:JSON`. If they expect JSON but find YAML, they may be unable to process it.
Two different things. A JSON-LD processor may see JSON-LD with an explicit value of type `rdf:JSON`, where the value is a JCS encoded string, which would not automatically be turned into the internal `@json` value object representation.
- I think we also need to declare `@type:@yaml` and `"..."^^rdf:YAML`
I think we need to demonstrate a need here. The `rdf:JSON` literal was not established lightly. What evidence is there for the use of YAML literals in the wild?
@VladimirAlexiev Afaik JSON grammar allows duplicate keys. You need JCS to forbid duplicate keys
@gkellogg
A SHOULD statement on using the JSON sub-set of YAML seems reasonable, and allows for implementations that cannot reasonably conform to this.
What do you mean by "JSON subset"? If you mean something like the "internal representation" then it's feasible. Otherwise I think that we can only check that the representation graph maps to the expected JSON literal when serialised in JSON.
@VladimirAlexiev Afaik JSON grammar allows duplicate keys.
No, I believe this has been addressed by RFC8259:
The names within an object SHOULD be unique.
Not a MUST, but that is because of concerns over backwards compatibility. The behavior when duplicate keys are present is unspecified, as different implementations do different things.
Also JCS / RFC8785 prohibits objects from having duplicate keys:
JSON objects MUST NOT exhibit duplicate property names.
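As a rough illustration of how an implementation could enforce the stricter rule before treating a value as a JSON literal, the following Python sketch rejects repeated member names at parse time. The names `reject_duplicates` and `strict_loads` are hypothetical helpers, and this is only one possible approach, not something the specs prescribe.

```python
import json


def reject_duplicates(pairs):
    """object_pairs_hook that raises on a repeated member name."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


def strict_loads(text: str):
    """Parse JSON, failing if any object repeats a member name."""
    return json.loads(text, object_pairs_hook=reject_duplicates)
```

With this in place, `strict_loads('{"a": "ciao", "a": 1}')` raises `ValueError` instead of silently dropping one of the entries.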
@gkellogg
treated as JSON Literals ...
Does JSON-LD use JCS or JSON? What happens in the case of the JSON literal I wrote above ? https://github.com/json-ld/yaml-ld/issues/36#issuecomment-1173637884
With regard to JSON Literals, the spec uses JCS. IIRC, the spec is silent on duplicate keys and, as noted in the RFCs, implementations may have different behaviors. This is at least a SHOULD. But, for the specific case of JSON Literals, duplicate keys would violate the requirements of JCS.
What do you mean by "JSON subset"? If you mean something like the "internal representation" then it's feasible. Otherwise I think that we can only check that the representation graph maps to the expected JSON literal when serialised in JSON.
What I meant by "JSON subset" is the subset of YAML which is, effectively JSON. I.e., the arrays, objects and native values that both YAML and JSON share. Perhaps there is another term for this.
The JSON-LD Internal Representation of a JSON Object is, however, an Infra map, which is defined specifically to have unique key/value pairs. All JSON-LD algorithms operate by transforming the JSON surface syntax into the internal representation, which will end up eliminating duplicate keys, in any case.
JSON Literals, the spec uses JCS
iiuc:
"JSON subset" is the subset of YAML which is, effectively JSON .. Infra map ...
- Infra map: ordered sequence of key/value pairs. Keys are unique. Keys are strings.
- YAML: unordered sequence of key/value pairs. Keys are unique. Keys can be arbitrary nodes.
JSON libraries do not usually preserve ordering. I suspect that it is in general not a problem since, iiuc, `@type: @json` stores the JSON-LD Internal Representation and not the verbatim JSON text.
If JSON Literals are about the Internal Representation (the serialization always happens via JCS), then I think we do not need a `@type: @yaml`, because the data model is always the JSON one, and serialization happens via JCS.
We only need `@yaml` if we decide to extend the JSON-LD data model.
WDYT?
@ioggstream -- Please edit your https://github.com/json-ld/yaml-ld/issues/36#issuecomment-1174751556 and wrap code fences (either single or triple backticks) around all `@` terms that aren't meant to link to GitHub users (e.g., `@yaml`, `@type`, `@JSON`), because the users behind those handles probably aren't interested in our discussions and don't need alerts on every comment made here...
I took care of it.
I propose closing this saying that YAML-LD has no specific encoding requirements for `@json` value objects as long as round-tripping YAML to JSON reproduces an equivalent structure.
@gkellogg can you please check if this way of using `@json` in YAML is consistent with the above words? https://github.com/ioggstream/draft-polli-restapi-ld-keywords/pull/3/files
Yes, that seems reasonable.
@gkellogg
I think we also need to declare `@type:@yaml` and `"..."^^rdf:YAML`
I think we need to demonstrate a need here. The `rdf:JSON` literal was not established lightly. What evidence is there for the use of YAML literals in the wild?
Uh, wouldn't YAML-LD provide thousands of such examples?
I think we need to consider JSON and YAML literals completely independently of whether or not they have any relation to LD (just like `rdf:XMLLiteral` is not RDF XML).
- `@json` should be true JSON, not YAML that is compatible with JSON
- `@yaml` to be able to capture YAML literals that are not JSON (e.g., use block style), or not even compatible with JSON (e.g., use anchors & refs)
Let me try to adapt our first example https://graphdb.ontotext.com/documentation/10.0/lucene-graphdb-connector.html#using-the-create-command from Turtle+JSON to YAML-LD+YAML:
'@context':
  luc: http://www.ontotext.com/connectors/lucene#
  luc-index: http://www.ontotext.com/connectors/lucene/instance#
  ex: http://www.ontotext.com/example/wine#
  rdfs: http://www.w3.org/2000/01/rdf-schema#
luc-index:my_index:
  luc:createConnector: !yaml
    types: [ex:Wine]
    fields:
    - fieldName: grape
      propertyChain: [ex:madeFromGrape, rdfs:label]
    - fieldName: sugar
      propertyChain: [ex:hasSugar]
      analyzed: false
      multivalued: false
    - fieldName: year
      propertyChain: [ex:hasYear]
      analyzed: false
I think you'll agree that's much nicer than the original.
So it's not a question of whether we need it, but how exactly to handle it:
- `!yaml` means "don't try to convert the rest to RDF, leave it as YAML"
- `!yaml` just there?
Note: if we change our connector implementation to use RDF instead of JSON and add a bit to the context, this becomes straight YAML-LD (notice `!yaml` is removed but the payload after `@context` is the same):
'@context':
  luc: http://www.ontotext.com/connectors/lucene#
  luc-index: http://www.ontotext.com/connectors/lucene/instance#
  ex: http://www.ontotext.com/example/wine#
  rdfs: http://www.w3.org/2000/01/rdf-schema#
  xsd: http://www.w3.org/2001/XMLSchema#
  fieldName: {'@id': luc:fieldName}
  types: {'@id': luc:types, '@type': '@id', '@container': '@list'}
  fields: {'@id': luc:fields, '@type': '@id', '@container': '@list'}
  propertyChain: {'@id': luc:propertyChain, '@type': '@id', '@container': '@list'}
  analyzed: {'@id': luc:analyzed, '@type': xsd:boolean}
  multivalued: {'@id': luc:multivalued, '@type': xsd:boolean}
luc-index:my_index:
  luc:createConnector:
    types: [ex:Wine]
    fields:
    - fieldName: grape
      propertyChain: [ex:madeFromGrape, rdfs:label]
    - fieldName: sugar
      propertyChain: [ex:hasSugar]
      analyzed: false
      multivalued: false
    - fieldName: year
      propertyChain: [ex:hasYear]
      analyzed: false
This YAML-LD will be converted to the following Turtle:
luc-index:my_index
  luc:createConnector [
    luc:types (ex:Wine);
    luc:fields (
      [luc:fieldName "grape";
       luc:propertyChain (ex:madeFromGrape rdfs:label)]
      [luc:fieldName "sugar";
       luc:propertyChain (ex:hasSugar);
       luc:analyzed false;
       luc:multivalued false]
      [luc:fieldName "year";
       luc:propertyChain (ex:hasYear);
       luc:analyzed false])] .
@VladimirAlexiev (or @gkellogg) -- Please edit https://github.com/json-ld/yaml-ld/issues/36#issuecomment-1251223864 and put codefences around the `@type:@yaml` in the opening quoted block. They don't need pinging about our conversation.
Done.
From the RDF Semantics
A datatype is understood to define a partial mapping, called the lexical-to-value mapping, from a lexical space (a set of character strings) to values. The function L2V maps datatypes to their lexical-to-value mapping. A literal with datatype d denotes the value obtained by applying this mapping to the character string sss: L2V(d)(sss). If the literal string is not in the lexical space, so that the lexical-to-value mapping gives no value for the literal string, then the literal has no referent. The value space of a datatype is the range of the lexical-to-value mapping. Every literal with that type either refers to a value in the value space of the type, or fails to refer at all. An ill-typed literal is one whose datatype IRI is recognized, but whose character string is assigned no value by the lexical-to-value mapping for that datatype.
The JSON-LD 1.1 Spec defines this for the `rdf:JSON` literal with a lexical space composed of UNICODE strings conforming to the JSON Grammar and a value space with specific serialization requirements so that two JSON literals can be expressed, say, using different whitespace, but be considered value-equivalent through mapping to the value space via JCS.
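The idea of value-equivalence via a canonical form can be sketched in Python as follows. This is only an approximation of JCS (RFC 8785) built from the standard `json` module: the hypothetical helper `approx_canonical` sorts keys and strips insignificant whitespace, but it does not implement the JCS rules for number formatting or for ordering non-ASCII member names, so treat it as an illustration rather than a conforming canonicalizer.

```python
import json


def approx_canonical(text: str) -> str:
    """Rough stand-in for JCS: sorted keys, no insignificant whitespace.

    Not a full RFC 8785 implementation; number formatting and the
    ordering of non-ASCII member names can differ from JCS.
    """
    value = json.loads(text)
    return json.dumps(
        value, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )


# Two lexical forms that differ only in whitespace and key order.
a = '{ "b": [1, 2],  "a": "ciao" }'
b = '{"a":"ciao","b":[1,2]}'

equivalent = approx_canonical(a) == approx_canonical(b)
```

Both inputs map to the same string, so they would be considered the same value under this simplified mapping.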
For a hypothetical YAML datatype, the lexical space would clearly be the set of all UNICODE strings which conform to the YAML Grammar, but finding the value space is more difficult, as multiple YAML serializations may be considered to represent the same value. I think a necessary pre-condition for establishing a YAML datatype would be to identify a normative specification for obtaining the canonical form of a YAML document/stream.
The YAML examples in the JSON-LD 1.1 spec (e.g., https://github.com/w3c/json-ld-syntax/blob/main/yaml/JSON-Literal-compacted.yaml) do not preserve the JSON serialization of a JSON literal.
It should, instead, be the following:
But a simple `YAML.dump` of the parsed JSON does not take this into consideration. The spec should describe the requirements for serializing JSON literals in YAML-LD.