Open VladimirAlexiev opened 2 years ago
JSON-LD cannot capture GeoJSON because that uses nested arrays.
This is not the case anymore with JSON-LD 1.1 (example)
This is another interesting direction to explore that does not seem to create inconsistencies with the YAML spec, thanks Vladimir! We could then ask the YAML community whether it is possible to "register" the xsd namespace in some way to support these kinds of mappings and associate them with the yaml.org 1.2 namespace.
I suggest using full-URI tags in the examples for clarity, eg:
# see https://yaml.org/spec/1.2.2/#tag-directives
%TAG !xsd! tag:http://www.w3.org/2001/XMLSchema:
---
# short form using tags
dc:date: !xsd!date 2022-05-18
# instead of long form
dc:date: {"@type": xsd:date, "@value": 2022-05-18}
I feel that manually specifying data types for each value is very tedious, and the tag syntax is not very intuitive. My feeling is this: why don't we delegate that task to the context?
The machine is smart enough to understand that a value of a dc:date is actually a literal with xsd:date datatype, and JSON-LD contexts can express that.
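As a rough illustration of delegating datatypes to the context, the sketch below applies a term's `@type` coercion to a plain string. It is a drastically simplified stand-in for JSON-LD expansion, not the real algorithm; the `CONTEXT` dictionary and the `coerce` helper are hypothetical names introduced only for this example.

```python
# Simplified, hypothetical illustration of context-based type coercion.
# Real JSON-LD expansion handles far more (IRIs, containers, nested contexts).
CONTEXT = {
    "dc:date": {"@type": "xsd:date"},  # term definition with a coerced datatype
}


def coerce(term, value, context=CONTEXT):
    """Return a value object if the context declares a datatype for `term`."""
    datatype = context.get(term, {}).get("@type")
    if datatype is None or not isinstance(value, str):
        return value  # no coercion declared: leave the value untouched
    return {"@type": datatype, "@value": value}


print(coerce("dc:date", "2022-05-18"))
print(coerce("dc:title", "No coercion declared"))
```

This only works when a property always carries the same datatype, which is exactly the limitation raised in the replies.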
Can you post an example? Probably we should start collecting examples of "equivalence classes" of yaml files in this repo.
@ioggstream
We should use the actual XSD namespace. The tag: URI scheme is recommended by the YAML people but is not mandatory, so I'd rather follow TimBL's principles of using resolvable URLs:
%TAG !xsd! http://www.w3.org/2001/XMLSchema#
https://yaml.org/spec/1.2.2/#104-other-schemas allows us to make an XSD YAML schema, and we should ask the YAML people to publish it at https://yaml.org/type/
@anatoly-scherbakov Of course if a field ALWAYS uses the same datatype, the context can provide it. But dates in instance data often come in various granularities (same with numbers). So wouldn't it be nice to write this instead of the respective long forms?
dct:created: !xsd!gYear 2000
dct:issued: !xsd!date 2022-05-18
dct:modified: !xsd!dateTime 2022-05-18T01:12:23
@anatoly-scherbakov
My feeling is this: why don't we delegate that task to the context?
Of course we can, and that's an important role of JSON-LD contexts: making explicit some implicit constraints/dependencies (e.g. "this field expects this datatype").
However, we also need a way to make this information explicit (e.g. in the expanded form of JSON-LD). In JSON-LD, this is done with a value object {"@value": "...", "@type": "..." }. In YAML-LD, tags provide a more concise and more idiomatic way to do it.
Also, +1 to @VladimirAlexiev use-case above.
@VladimirAlexiev @ioggstream that is an interesting point. When using JSON-LD, I always tried to ensure that a particular property always maps to a specific type, but I agree that this application of tags is compelling. :+1:
This was discussed during today's call: https://json-ld.org/minutes/2022-06-22/.
This issue was discussed in today's meeting.
I think this is a great candidate for something an extended profile could do, and something like the %TAG ! http://www.w3.org/2001/XMLSchema# seems like a great way to go.
In my mind, this isn't a direct replacement for the @type of JSON-LD value objects, but an extension of the JSON-LD internal representation, much the same way that booleans and numbers are treated in JSON-LD (specifically in the to/from RDF algorithms). Implementations would need to maintain the internally typed values when expanding/compacting/framing, representing them using the appropriate tag when serializing to YAML in extended mode, or expanding them to value objects when serializing in the basic mode.
The toRdf and fromRdf algorithms would need to honor them when generating RDF or turning RDF back into the internal representation, again running with the appropriate processing mode (for fromRdf, when the useNativeTypes flag is true). Otherwise, this change should be fairly transparent. IMO, this is the primary motivation for an extended profile.
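A minimal sketch of the serialization choice described above, assuming a hypothetical `TypedLiteral` class as the internally typed value and a `serialize` helper with the mode names "basic" and "extended"; none of these names come from a spec or an existing library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypedLiteral:
    """Hypothetical internal-representation node for a datatyped scalar."""
    lexical: str
    datatype: str  # full datatype IRI


def serialize(value, mode):
    """Choose the output form for a typed literal based on the processing mode."""
    if mode == "extended":
        # Extended mode: keep the type as a tag on a YAML scalar node,
        # modelled here as a (tag, scalar) pair.
        return (value.datatype, value.lexical)
    if mode == "basic":
        # Basic mode: fall back to an ordinary JSON-LD value object.
        return {"@value": value.lexical, "@type": value.datatype}
    raise ValueError(f"unknown mode: {mode}")


stamp = TypedLiteral("2022-05-18T01:12:23", "http://www.w3.org/2001/XMLSchema#dateTime")
print(serialize(stamp, "extended"))
print(serialize(stamp, "basic"))
```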
So what is actually in play here is a profile of YAML itself - the profile for which JSON-LD translations are lossless, so we don't need a profile of YAML-LD, but YAML-LD is an extension of a "YAML-JSON-compatible" profile. Such a profile could be implicit - or made explicit if multiple YAML/JSON conversions are defined. Another reason to make it explicit would be to validate if a given YAML document is compatible with YAML-LD before defining the YAML-LD extended syntax for that YAML schema.
I guess in my mind, the "YAML-JSON-compatible" profile is analogous to YAML using the JSON schema. This does not depend on explicit tags, but implicitly associates the values with tag:yaml.org,2002:null, tag:yaml.org,2002:bool, tag:yaml.org,2002:int, and tag:yaml.org,2002:float.
I think something like a "YAML-XSD-compatible" profile might require the use of a tag namespace such as suggested by @VladimirAlexiev: %TAG !xsd! tag:http://www.w3.org/2001/XMLSchema:, so a tagged value such as !xsd!dateTime 2022-05-18T01:12:23 would parse to a native DateTime literal, and the JSON-LD internal representation would be extended to support the various literal types from XSD.
If running in "extended", or "YAML-XSD-compatible" mode, a %TAG definition such as above would be legitimate. If not running in that mode, a processor may reject the input or use Postel's law and parse it, but it should not be emitted unless the profile is set accordingly.
In my mind, this and alias nodes are the primary things that would be enabled by an extended mode.
If a processor sees some other %TAG definition (or definitions outside of some pre-defined set) it should probably fail to process the document, which then acts as an extension point for processors to eventually support more values for %TAG in the future, but for RDF purposes, anything beyond the XSD set
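The acceptance rule for %TAG directives might look roughly like the sketch below. It scans only the directive lines before the `---` marker rather than parsing YAML, it chooses to reject directives outside extended mode (the comment above also allows parsing them leniently), and `ALLOWED_PREFIXES` and `check_tag_directives` are hypothetical names.

```python
# Hypothetical pre-defined set of tag prefixes a processor is willing to accept.
ALLOWED_PREFIXES = {
    "tag:yaml.org,2002:",
    "http://www.w3.org/2001/XMLSchema#",
}


def check_tag_directives(document, extended):
    """Collect %TAG directives, failing on any the processor does not support."""
    handles = {}
    for line in document.splitlines():
        if line.strip() == "---":
            break  # directives end at the document start marker
        if not line.startswith("%TAG"):
            continue
        _, handle, prefix = line.split(None, 2)
        prefix = prefix.strip()
        if not extended:
            raise ValueError("%TAG directives require the extended mode")
        if prefix not in ALLOWED_PREFIXES:
            raise ValueError(f"unsupported tag prefix: {prefix}")
        handles[handle] = prefix
    return handles


doc = "%YAML 1.2\n%TAG ! http://www.w3.org/2001/XMLSchema#\n---\ndate: !date 2022-08-08\n"
print(check_tag_directives(doc, extended=True))
```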
Given this, I think we may be about ready to define the processing modes more completely.
I'm thinking here about statements about conformance - :myresource dct:conformsTo
general Use Case is to be able to determine what an API supports in terms of interoperability of data payloads. Can anyone orient me to where this is being defined or discussed? I can see inline directives such as https://yaml.org/spec/1.2.2/#681-yaml-directives, @context where a URI is referenced and $schema directives - but not where such things are registered - we have a related in IANA profiles on media types for encodings, but what about information content profiles?
Is identification of the profile out-of-band using resolvable identifiers (i.e. not in syntax-specific directives using syntax-specific keywords and versioning) a factor in defining processing modes?
@rob-metalinkage -- Please edit your last comment, https://github.com/json-ld/yaml-ld/issues/17#issuecomment-1207728874, to put @context into a code fence (like `@context`), so that GitHub user doesn't get endlessly pinged on threads about which they do not care.
I've looked into this some more as part of trying to implement extended support for XSD scalar values in YAML. IMO, the appropriate %TAG value would be something like the following:
%TAG ! http://www.w3.org/2001/XMLSchema#
This would allow values such as !date 2022-08-08, which would expand as !<http://www.w3.org/2001/XMLSchema#date> "2022-08-08" and be a natural way to capture "2022-08-08"^^<http://www.w3.org/2001/XMLSchema#date>. However, I'm stymied by a bug in LibYAML, which Ruby and many other languages rely on for parsing YAML (https://github.com/yaml/libyaml/issues/253), where # is not accepted as a URI character (really ns-uri-char). So far, the LibYAML team has been unresponsive, and the library shows very little activity in the last couple of years. Of course, we could hack this with some other URI, but that doesn't seem appropriate for this group.
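The expansion from tag shorthand to datatyped literal can be sketched without a YAML parser, as below. The helpers `expand_tag` and `to_rdf_literal` are hypothetical and skip details a real processor needs (URI escaping in tag suffixes, literal escaping, default handles).

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def expand_tag(shorthand, handles):
    """Resolve a tag shorthand such as `!date` against declared %TAG handles."""
    # Try the longest handle first so a named handle like `!xsd!` wins over `!`.
    for handle in sorted(handles, key=len, reverse=True):
        suffix = shorthand[len(handle):]
        if shorthand.startswith(handle) and suffix and "!" not in suffix:
            return handles[handle] + suffix
    raise ValueError(f"no declared handle matches {shorthand}")


def to_rdf_literal(lexical, datatype):
    """Render a datatyped literal in N-Triples style (no escaping, for brevity)."""
    return f'"{lexical}"^^<{datatype}>'


datatype = expand_tag("!date", {"!": XSD})
print(to_rdf_literal("2022-08-08", datatype))
```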
Other YAML tools show similar issues, I think largely due to the fact that the YAML spec only uses the tag scheme in its examples. Until this issue is resolved, I think we need to defer an extended mode for YAML-LD that would involve interpreting XSD datatype scalar values. The spec recommends the use of tag: (oddly), and if we were to go there, we would probably want to introduce something like %TAG ! tag:www.w3.org,2022:xsd/ but that seems quite arbitrary.
An example file I've been working with to exercise this variation is the following:
%YAML 1.2
%TAG ! http://www.w3.org/2001/XMLSchema#
---
"@context":
  "@vocab": http://xmlns.com/foaf/0.1/
name: !string Gregg Kellogg
homepage: https://greggkellogg.net/
depiction: http://www.gravatar.com/avatar/42f948adff3afaa52249d963117af7c8
date: !date 2022-08-08
(note, the use of a specific tag name shouldn't be significant. In this case, it's using the primary tag handle, but it could just as well be the secondary tag handle (!!) or a named tag handle (!xsd!) for our processing model).
If we are to support XSD types, we probably want to white-list allowed datatype URIs to include most XSD types, in addition to tag:yaml.org,2002:str, tag:yaml.org,2002:null, tag:yaml.org,2002:int, tag:yaml.org,2002:float, and tag:yaml.org,2002:bool which would map more directly to the JSON-LD Internal Representation.
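One way to picture such a white-list is the sketch below: the yaml.org tags map to native values, a few XSD datatypes are kept as typed values, and anything else is rejected. The particular set of XSD names, the `interpret` helper, and the use of a value object as the typed form are assumptions made for illustration; the number parsing ignores YAML-specific forms such as .inf.

```python
YAML_NS = "tag:yaml.org,2002:"
XSD_NS = "http://www.w3.org/2001/XMLSchema#"

# Tags that map directly onto native values of the internal representation.
NATIVE = {
    YAML_NS + "str": str,
    YAML_NS + "int": int,
    YAML_NS + "float": float,
    YAML_NS + "bool": lambda text: text == "true",
    YAML_NS + "null": lambda text: None,
}

# Illustrative subset only; a real list would cover most XSD datatypes.
ALLOWED_XSD = {XSD_NS + name for name in ("string", "date", "dateTime", "gYear", "decimal")}


def interpret(tag, lexical):
    """Turn a (tag, scalar) pair into a native value or a typed value object."""
    if tag in NATIVE:
        return NATIVE[tag](lexical)
    if tag in ALLOWED_XSD:
        return {"@value": lexical, "@type": tag}
    raise ValueError(f"datatype not on the white-list: {tag}")


print(interpret(YAML_NS + "int", "123"))
print(interpret(XSD_NS + "date", "2022-08-08"))
```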
See also https://github.com/yaml/yaml-spec/issues/268#issuecomment-1208565027.
- is it at all feasible to write "foo"@en in YAML rather than a separate @language field?
No, I don't believe it is, however, we could consider using a datatype form such as defined for the i18n namespace:
@prefix i18n: <https://www.w3.org/ns/i18n#> .
[ ex:title "foo"^^i18n:en ] .
Although it's defined to allow a combination of language and base-direction, it can be used for just language or base direction. Of course, we would need to define that literal values using an i18n datatype consisting of only language would be translated to language-tagged literals, and vice versa.
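A small sketch of that translation, assuming the i18n datatype IRI carries the language tag, optionally followed by an underscore and the base direction, as in the JSON-LD 1.1 `i18n-datatype` convention. Both helper names are hypothetical.

```python
I18N = "https://www.w3.org/ns/i18n#"


def i18n_to_value_object(lexical, datatype):
    """Translate an i18n-datatyped literal into a language/direction value object."""
    if not datatype.startswith(I18N):
        raise ValueError("not an i18n datatype")
    language, _, direction = datatype[len(I18N):].partition("_")
    value_object = {"@value": lexical}
    if language:
        value_object["@language"] = language
    if direction:
        value_object["@direction"] = direction
    return value_object


def value_object_to_i18n(value_object):
    """Inverse translation: return the (lexical, datatype IRI) pair."""
    suffix = value_object.get("@language", "")
    if "@direction" in value_object:
        suffix += "_" + value_object["@direction"]
    return value_object["@value"], I18N + suffix


print(i18n_to_value_object("foo", I18N + "en"))
print(value_object_to_i18n({"@value": "foo", "@language": "en"}))
```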
@gkellogg
12.3 converted to "1.230000005e2"^^xsd:float?")
!date 2022-08-08 better than !xsd!date 2022-08-08
!id in our "YAML XSD Schema"?
onlineyamltools.com allows # but then complains with:
but then complains with:
Error: YAMLException: unknown tag !<http://www.w3.org/2001/XMLSchema#string> at line 6, column 28
Trying with explicit xsd tag gives the same error:
%YAML 1.2
%TAG !xsd! http://www.w3.org/2001/XMLSchema#
---
name: !xsd!string Gregg Kellogg
This tool can only use the "YAML JSON schema" builtin tags (and supports timestamp, although that has been deprecated).
As expected, it can mangle numbers:
%YAML 1.2
%TAG ! tag:yaml.org,2002:
---
name: !str Gregg Kellogg
int: !int 123
bigint: !int 123456789012345678901231 # -> 1.2345678901234569e+23 ouch!
bigint: 123456789012345678901231 # -> 1.2345678901234569e+23 ouch!
float: !float 1.235609853907835079889067406870964870956870967908 # -> 1.235609853907835
date: !timestamp 2022-08-08 # -> 2022-08-08T00:00:00.000Z
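The mangling is not specific to that tool: any parser that stores numbers as IEEE 754 doubles loses the low digits. The short Python sketch below shows the effect and, as one possible alternative, keeps the lexical form exact with `Decimal`; the variable names are illustrative only.

```python
from decimal import Decimal

big = "123456789012345678901231"
as_float = float(big)  # what a double-based parser effectively stores

print(as_float)                      # rounded to the nearest double
print(int(as_float) == int(big))     # False: the low digits are gone
print(Decimal(big) == int(big))      # True: Decimal keeps every digit

mixed = "12345678901234567890.12345"
print(int(float(mixed)))             # 12345678901234567168
```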
My implementation needed to use a lower-level parser that just transforms YAML to the Representation Graph without further interpretation. In Ruby Psych, this is done via Psych.parse_stream. That level shouldn't place constraints on any specific schema.
Beyond XSD: let's not forget custom datatypes, eg:
!cdt!ucum 1.20 m
is equal to (though not identical to) !cdt!ucum 120 cm
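The "equal but not identical" distinction can be sketched by normalising each quantity to a base unit before comparing. The tiny `TO_METRES` table and the `parse_quantity` helper are made up for this example and are not a UCUM implementation.

```python
from decimal import Decimal

# Hypothetical conversion table; real UCUM handling is far richer.
TO_METRES = {"m": Decimal("1"), "cm": Decimal("0.01"), "mm": Decimal("0.001")}


def parse_quantity(lexical):
    """Convert a lexical form like '120 cm' to a value in metres."""
    number, unit = lexical.split()
    return Decimal(number) * TO_METRES[unit]


a, b = "1.20 m", "120 cm"
print(parse_quantity(a) == parse_quantity(b))  # True: same value
print(a == b)                                  # False: different lexical forms
```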
see https://github.com/w3c/sparql-12/issues/129
@gkellogg -- Several unfenced @ entities are in the last several lines of the bot-posted conversation https://github.com/json-ld/yaml-ld/issues/17#issuecomment-1263840815 causing more unintended alerts to be fired in their direction.... Maybe the bot can be tweaked to codefence such entities going forward?
Sorry, must have been unfenced on IRC. I’ll fix them later
Yeah, I'm sure they were unfenced on IRC. There's no consistent value to fencing there.
Weirdly, now that they're single-backtick fenced here, those backticks are showing as part of the text instead of being interpreted as markdown -- so, for instance, we now see (bold added here to help with clarity) {"`\@id`": "someone"}, where we'd expect to see {"@id": "someone"}.
I suspect this won't be a quick or easy fix, but it should be raised with the folks running the (now several!) IRC/log-to-GitHub bots.
Well, I handle the irc log to HTML for these minutes, which were inserted here. Perhaps could detect some bare keywords, but you’re right that the result in the comment is wrongly interpreted, but that seems like a GH issue.
I'd suggest wrapping the larger element including the @, so {"@id": "someone"}, which makes overall sense anyway, the larger element being code.
@type in JSON-LD). Eg see https://github.com/w3c/json-ld-syntax/issues/387 for the pitfalls of using large integers or decimals
-.inf and .nan, datetimes), and even more complex structures. One could declare "YAML schemas" with additional tags, eg to represent all XSD datatypes
Why might we want more than "string plus @type"?
dc:date below and many other examples)
02022-05-18 to 2022-05-18 if tagged as !xsd!date rather than looking at a parallel @type field.
Let's collect below examples of what we could want.
@gkellogg in https://github.com/ietf-wg-httpapi/mediatypes/issues/8#issuecomment-1034040169
@VladimirAlexiev from #2:
-.inf and .nan).
12345678901234567890.12345 is converted to RDF literal "12345678901234567168"^^xsd:integer (see jsonld playground)
@type, eg instead of long form
dc:date: {"@type": xsd:date, "@value": 2022-05-18}