ietf-wg-httpapi / mediatypes


No normative conversion between YAML and JSON #59

Closed gkellogg closed 2 years ago

gkellogg commented 2 years ago

In the process of working on YAML-LD, the JSON-LD Community Group needs a normative description of transforming between a YAML stream and JSON (see json-ld/yaml-ld#62). Surprisingly, although the YAML spec does define a Process that describes the relationship between YAML and a native data structure (from which a JSON representation might be created), it does not normatively describe the process for creating this native representation from YAML (or the reverse).

In the case of YAML-LD, the target is the JSON-LD internal representation, which in turn is based on INFRA. We have started a description for turning YAML into the internal representation (see json-ld/yaml-ld#65), but this really seems like something we should be referring to rather than specifying.

ioggstream commented 2 years ago

@eemeli do you think this is addressable in some way?

I think we could at least improve the interoperability considerations and leave the specific proposal to a separate spec and/or an additional section in YAML.

ioggstream commented 2 years ago

@gkellogg we discussed this topic at IETF-114 and there was agreement that defining such a processing algorithm goes beyond the media type definition and is outside the charter of the WG.

I think that while the interoperability considerations can be enriched with more details, this non-trivial problem should be addressed together with the YAML community in a specific document - in such a way to ensure that the processing remains valid even for future versions of YAML (e.g. YAML 1.3 is ongoing work).

WRT YAML-LD we could probably publish the application/ld+yaml media type early, and publish further documents joining efforts with the YAML community.

eemeli commented 2 years ago

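This is challenging from the YAML spec point of view, as it's a superset of JSON. Specifically:

  • YAML allows for the representation of more than one document in a single stream, which is not possible with JSON.

  • YAML allows for the schema to be customised, and for values to have explicit !tag type information attached to them.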

As a part of the YAML 1.3 effort, we've discussed defining something like a canonical YAML DOM, i.e. a data structure that would represent not only the data of a YAML stream, but also the comments and metadata. This sounds a bit like what you're looking for? We do not, however, have a draft of this ready yet.

gkellogg commented 2 years ago

> I think that while the interoperability considerations can be enriched with more details, this non-trivial problem should be addressed together with the YAML community in a specific document - in such a way to ensure that the processing remains valid even for future versions of YAML (e.g. YAML 1.3 is ongoing work).

That would be best.

> WRT YAML-LD we could probably publish the application/ld+yaml media type early, and publish further documents joining efforts with the YAML community.

I'll defer to you. As yet, the YAML-LD document doesn't have a stable URL, as it's still under development. The draft at https://json-ld.github.io/yaml-ld/spec is a work in progress. When the CG is ready to publish a report, it will likely go in w3 space at something like https://www.w3.org/2022/jsonld-cg-reports/yaml-ld, but that is subject to change. If a stable URL isn't required, then the information in the IANA Considerations section is fairly stable aside from profile URIs.

gkellogg commented 2 years ago

> This is challenging from the YAML spec point of view, as it's a superset of JSON. Specifically:
>
>   • YAML allows for the representation of more than one document in a single stream, which is not possible with JSON.

The multi-document nature of YAML streams is useful, and there are some parallels in JSON Sequences and NDJSON, but those have no parallel in JSON-LD at this time, and are somewhat at odds with the RDF data model.
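For readers unfamiliar with the two JSON-side formats mentioned here, the following is a minimal sketch of how several already-loaded documents could be written out as NDJSON or as a JSON text sequence (RFC 7464). It only illustrates the framing; it is not part of YAML-LD or of any proposal in this thread. The function names `to_ndjson` and `to_json_seq` and the sample `documents` list are hypothetical, and the sample stands in for whatever a YAML loader would return for a two-document stream.

```python
import json

RS = "\x1e"  # ASCII Record Separator, the per-text prefix in RFC 7464


def to_ndjson(documents):
    """Write one compact JSON text per line (newline-delimited JSON)."""
    # json.dumps without `indent` never emits a raw newline, so each
    # document stays on a single line.
    return "".join(
        json.dumps(doc, separators=(",", ":")) + "\n" for doc in documents
    )


def to_json_seq(documents):
    """Write each document as RS + JSON text + LF (a JSON text sequence)."""
    return "".join(RS + json.dumps(doc) + "\n" for doc in documents)


# Hypothetical stand-in for the native values of a two-document YAML stream.
documents = [
    {"@id": "ex:a", "name": "first"},
    {"@id": "ex:b", "name": "second"},
]

ndjson_text = to_ndjson(documents)
seq_text = to_json_seq(documents)
```

Either output keeps the documents separate, which a single JSON value cannot do without wrapping them in an array.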

>   • YAML allows for the schema to be customised, and for values to have explicit !tag type information attached to them.

For the complete case, this is indeed more complicated. However, what we need, and what is likely a common usage, is specifically limited to the JSON round-tripping use case using the JSON Schema. This could be considered a base starting point for describing more complicated transformations. Any substantial use of !tag is typically going to be platform dependent, or at least dependent on a wider range of standardized datatypes (i.e., those found in XML Schema Part 2). Focusing on the narrower case of JSON interoperability would address many common use cases for normative use of YAML.
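To make the narrower JSON round-tripping case concrete, here is a rough sketch of a check that a native data structure stays within what JSON can represent. It assumes the YAML source has already been loaded into plain Python values by some loader. The function name `json_compatibility_problems` is hypothetical, and the rules it applies (string keys only, finite numbers, no cycles, no other types) are one reading of "JSON round-tripping", not a normative definition.

```python
import datetime
import math


def json_compatibility_problems(value, path="$", _ancestors=()):
    """Return a list of reasons why `value` cannot round-trip through JSON."""
    if value is None or isinstance(value, (bool, str, int)):
        return []
    if isinstance(value, float):
        # JSON has no representation for NaN or the infinities.
        if math.isfinite(value):
            return []
        return [f"{path}: non-finite number {value!r}"]
    if isinstance(value, (list, dict)):
        # An alias pointing back at one of its own ancestors yields a cycle,
        # which has no JSON equivalent.
        if id(value) in _ancestors:
            return [f"{path}: cyclic reference"]
        ancestors = _ancestors + (id(value),)
        problems = []
        if isinstance(value, list):
            for index, item in enumerate(value):
                problems += json_compatibility_problems(
                    item, f"{path}[{index}]", ancestors
                )
        else:
            for key, item in value.items():
                if not isinstance(key, str):
                    # YAML mapping keys may be any node; JSON keys are strings.
                    problems.append(f"{path}: non-string key {key!r}")
                else:
                    problems += json_compatibility_problems(
                        item, f"{path}.{key}", ancestors
                    )
        return problems
    return [f"{path}: unsupported type {type(value).__name__}"]


ok_problems = json_compatibility_problems(
    {"name": "x", "tags": ["a", 1, 2.5, True, None]}
)
bad_problems = json_compatibility_problems(
    {"when": datetime.date(2022, 1, 1), 1: "int key", "nan": float("nan")}
)
cyclic = []
cyclic.append(cyclic)
cyclic_problems = json_compatibility_problems(cyclic)
```

A value with no reported problems can be passed to a JSON serializer and read back unchanged in structure; anything else falls into the more complicated, tag-dependent case.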

> As a part of the YAML 1.3 effort, we've discussed defining something like a canonical YAML DOM, i.e. a data structure that would represent not only the data of a YAML stream, but also the comments and metadata. This sounds a bit like what you're looking for? We do not, however, have a draft of this ready yet.

Aside from comments and metadata, this is what led the JSON-LD 1.1 specifications to define an internal representation based largely on the Infra standard. But JSON-LD's needs are somewhat different, as formally, the JSON-LD data model derives from RDF 1.1 Concepts. Having a DOM, presuming that there are normative requirements for creating the DOM from a YAML source, would be quite useful. The group is not of one mind, but for me, preserving comments and metadata from a YAML source is not important.

The YAML-LD work is in fairly early stages and would lead to a community draft perhaps by the end of the year. A full-on W3C Recommendation would be much further off, and that is when having such stable external references would be more important.

ioggstream commented 2 years ago

Discussed at the IETF meeting. This is off topic for the WG charter, though it could be a topic for an interoperability document produced by YAML and JSON folks.