json-ld / yaml-ld

CG specification for YAML-LD and UCR
https://json-ld.github.io/yaml-ld/spec

YAML-LD canonicalization (c14n) #43

Open VladimirAlexiev opened 2 years ago

VladimirAlexiev commented 2 years ago

As an information architect, I want no variation in YAML format for the same semantic content, so that I can easily compare or sign YAML.
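Here is a minimal sketch of the user story. It assumes the YAML has already been parsed into plain data (dicts with string keys, lists, strings, ints, booleans, null). The data is re-emitted with one pinned set of formatting choices (sorted keys, 2-space block style, always double-quoted strings), and the resulting bytes are hashed. The function names are hypothetical, and floats, tags, and anchors are deliberately out of scope.

```python
import hashlib
import json

def _scalar(value):
    """Render a leaf value in one pinned style (hypothetical helper)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        # Always double-quote; JSON escapes are also valid YAML double-quoted escapes.
        return json.dumps(value, ensure_ascii=True)
    if isinstance(value, dict):
        return "{}"  # only empty mappings are rendered inline
    if isinstance(value, list):
        return "[]"  # only empty sequences are rendered inline
    raise TypeError(f"unsupported value: {type(value).__name__}")  # e.g. floats

def _emit(node, depth, lines):
    pad = "  " * depth  # pinned 2-space indentation
    if isinstance(node, (dict, list)) and not node:
        lines.append(pad + _scalar(node))
    elif isinstance(node, dict):
        for key in sorted(node):  # pinned key order (assumes string keys)
            value = node[key]
            if isinstance(value, (dict, list)) and value:
                lines.append(f"{pad}{_scalar(key)}:")
                _emit(value, depth + 1, lines)
            else:
                lines.append(f"{pad}{_scalar(key)}: {_scalar(value)}")
    elif isinstance(node, list):
        for item in node:  # sequence order is significant, so it is kept as-is
            if isinstance(item, (dict, list)) and item:
                lines.append(f"{pad}-")
                _emit(item, depth + 1, lines)
            else:
                lines.append(f"{pad}- {_scalar(item)}")
    else:
        lines.append(pad + _scalar(node))

def canonical_yaml(data):
    """Serialize parsed YAML data in block style with one fixed set of choices."""
    lines = []
    _emit(data, 0, lines)
    return "\n".join(lines) + "\n"

def yaml_digest(data):
    """SHA-256 over the canonical UTF-8 bytes (what a comparison or signature would cover)."""
    return hashlib.sha256(canonical_yaml(data).encode("utf-8")).hexdigest()
```

Two documents that differ only in key order, quoting, or indentation parse to equal data and therefore produce the same `canonical_yaml` text and the same digest.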

Canonicalization (also called c14n or normalization) is quite useful to enable the following use cases:

Prior art:

NOTE THAT this UCR is quite the opposite of #42. So if we cater to both:

ioggstream commented 2 years ago

@VladimirAlexiev I am not sure that YAML doesn't already provide something like that. I am not sure it's the most readable form, but have you asked the YAML folks?

e.g. for scalars, there's https://yaml.org/spec/1.2.2/#canonical-form
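To illustrate scalar canonical forms: the YAML 1.2 core schema accepts several spellings of the same null, boolean, or integer (for example `0x1F`, `0o37`, and `31`). The sketch below maps plain (unquoted, untagged) scalars that the core schema resolves to those three types onto one JSON-compatible spelling. It is a hypothetical helper, not a conforming resolver: floats, strings, and explicit tags are left out and return `None`.

```python
import re

# Core-schema integer forms: decimal, octal (0o), hexadecimal (0x).
_INT10 = re.compile(r"[-+]?[0-9]+")
_INT8 = re.compile(r"0o[0-7]+")
_INT16 = re.compile(r"0x[0-9a-fA-F]+")
_BOOL = {"true": "true", "True": "true", "TRUE": "true",
         "false": "false", "False": "false", "FALSE": "false"}
_NULL = {"null", "Null", "NULL", "~", ""}  # an empty plain scalar is also null

def canonical_plain_scalar(text):
    """Return one canonical spelling for a core-schema null/bool/int, else None."""
    if text in _NULL:
        return "null"
    if text in _BOOL:
        return _BOOL[text]
    if _INT10.fullmatch(text):
        return str(int(text, 10))  # drops '+' and leading zeros
    if _INT8.fullmatch(text):
        return str(int(text[2:], 8))
    if _INT16.fullmatch(text):
        return str(int(text[2:], 16))
    return None  # not handled here: floats, strings, other tags
```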

VladimirAlexiev commented 2 years ago

@ioggstream Added your point above. Whom should we ask? If you know such people, could you tag them here? I googled for "yaml canonicalization", "yaml c14n", and "yaml normalization" and came up with only 1 hit.

ioggstream commented 2 years ago

@VladimirAlexiev YAML repo or https://app.element.io/#/room/#chat:yaml.io

After a brief investigation, I understood there's no easy fix for that, at least if we want to include aliases/anchors.

gkellogg commented 2 years ago

The only use for YAML C14N I can see would be for a hypothetical YAML Literal (similar to JSON Literal). As such, that would seem to be a spec to reference, not something to add to YAML-LD.

As for standardizing the serialization of YAML-LD itself, I would be a 👎 on that, as it should not be necessary for conveying semantic meaning. Granted, people will want to create pretty YAML-LD output, but controls for that should be pass-through (IMO) and not required for interoperability.

ioggstream commented 2 years ago

YAML C14N ... a spec to reference

:+1: I think that the c14n discussion can be managed in the YAML community (e.g. via element). I think there's some interest there. Note that c14n and readability might be different goals.

@VladimirAlexiev I suggest filing an issue in the YAML repo so that, if they come up with a solution, we can reference it.

VladimirAlexiev commented 2 years ago

@gkellogg

The only use for YAML C14N I can see would be for a hypothetical YAML Literal

The main use of JSON-LD c14n is for crypto signing and verifiable credentials of whole JSON-LD files. @OR13 do you see a case for using YAML-LD for verifiable credentials?

(negative vote) as it should not be necessary for conveying semantic meaning

Gregg, I don't understand your position: are you also against https://json-ld.github.io/rdf-dataset-canonicalization/spec/ and JSON Canonicalization Scheme (JCS)? Aren't the 2 use cases listed enough?
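For comparison with the JCS reference above, here is a sketch of JCS-style serialization (RFC 8785) restricted to a subset: integers in the IEEE 754 safe range, strings, booleans, null, arrays, and objects. JCS defines number output via ECMAScript's Number-to-string algorithm, which is not reproduced here, so floats are rejected. The function name is hypothetical; the one detail worth noticing is that object keys are sorted by UTF-16 code units, not by code points.

```python
import json

def jcs_subset(value):
    """JCS-style canonical JSON for int/str/bool/None/list/dict only (sketch)."""
    if value is None:
        return "null"
    if value is True:  # test bools before ints: bool is an int subclass
        return "true"
    if value is False:
        return "false"
    if isinstance(value, int):
        if abs(value) > 2**53 - 1:  # JCS numbers are IEEE 754 doubles
            raise ValueError("integer outside the safe range")
        return str(value)
    if isinstance(value, float):
        raise TypeError("float serialization (ECMAScript Number-to-string) not implemented")
    if isinstance(value, str):
        # Escapes only '"', '\\' and control characters, like ECMAScript JSON.stringify.
        return json.dumps(value, ensure_ascii=False)
    if isinstance(value, list):
        return "[" + ",".join(jcs_subset(v) for v in value) + "]"
    if isinstance(value, dict):
        # Big-endian UTF-16 bytes compare in the same order as UTF-16 code units.
        keys = sorted(value, key=lambda k: k.encode("utf-16-be"))
        members = (json.dumps(k, ensure_ascii=False) + ":" + jcs_subset(value[k]) for k in keys)
        return "{" + ",".join(members) + "}"
    raise TypeError(f"unsupported value: {type(value).__name__}")
```

With plain code-point sorting, a key such as `"\ufb33"` would come before an emoji key; under UTF-16 ordering the emoji's surrogate pair (starting at 0xD83D) sorts first.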

pretty YAML-LD output: controls for that should be pass-through (IMO) and not required for interoperability.

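Of course they are not required. But:

  • Defining semantic terms for such controls is IMHO fair game, because YAML is largely about readability, thus formatting
  • Using a fixed set of controls to achieve c14n is important for cases where you want a reproducible/predictable serialization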

@ioggstream

no easy fix ... for aliases/anchors.

Alias names cannot be preserved, but c14n can generate predictable aliases to achieve identical serialization. https://json-ld.github.io/rdf-dataset-canonicalization/spec/ (URGNA) does that for blank nodes, which is a lot more difficult, since it amounts to graph isomorphism, for which no polynomial-time algorithm is known.
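The idea of predictable aliases can be sketched as follows. It assumes the parsed document is a Python object graph in which a YAML alias shows up as the same list or dict object appearing more than once. Shared nodes are numbered in the order in which a deterministic traversal (mapping keys sorted) first reaches them, so the original anchor names are discarded and regenerated as `a1`, `a2`, and so on. The naming scheme and function name are hypothetical; this covers only the naming step, and it is simpler than the blank-node case because the parser already tells us which nodes are identical.

```python
def assign_anchors(root):
    """Map id(node) -> anchor name for every container referenced more than once."""
    counts = {}  # id(node) -> number of references seen
    order = []   # ids in first-encounter (pre-order) order
    def visit(node):
        if not isinstance(node, (dict, list)):
            return  # scalars never get anchors in this sketch
        key = id(node)
        counts[key] = counts.get(key, 0) + 1
        if counts[key] > 1:
            return  # already traversed: an alias; also stops cycles
        order.append(key)
        children = [node[k] for k in sorted(node)] if isinstance(node, dict) else node
        for child in children:
            visit(child)
    visit(root)
    shared = [key for key in order if counts[key] > 1]
    return {key: f"a{i}" for i, key in enumerate(shared, start=1)}
```

An emitter would then write `&a1` at the first occurrence of a shared node and `*a1` at every later one, so two inputs that differ only in their anchor names serialize identically.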

file an issue in the YAML repo

Posted https://github.com/yaml/yaml-spec/issues/289, added some more info, and referenced this issue.

gkellogg commented 2 years ago

@gkellogg

The only use for YAML C14N I can see would be for a hypothetical YAML Literal

The main use of JSON-LD c14n is for crypto signing and verifiable credentials of whole JSON-LD files. @OR13 do you see a case for using YAML-LD for verifiable credentials?

IIRC, apart from JWT, VC uses RDF Dataset Canonicalization, which because of these issues does not rely on JSON C14N (other than for JSON Literals). Are you proposing something congruent to JWT for YAML? I would favor sticking with the LD-friendly RDF C14N.

(negative vote) as it should not be necessary for conveying semantic meaning

Gregg, I don't understand your position: are you also against https://json-ld.github.io/rdf-dataset-canonicalization/spec/ and JSON Canonicalization Scheme (JCS)? Aren't the 2 use cases listed enough?

pretty YAML-LD output: controls for that should be pass-through (IMO) and not required for interoperability.

Of course they are not required. But:

  • Defining semantic terms for such controls is IMHO fair game, because YAML is largely about readability, thus formatting
  • Using a fixed set of controls to achieve c14n is important for cases where you want a reproducible/predictable serialization

I would support some descriptive way of passing formatting options to a YAML serializer, but I think it may be difficult to standardize that unless YAML defines it normatively, in which case we should just reference it, along with the ability to pass such controls through.

I'm wary of defining a WebIDL API which is YAML-LD specific (although we may describe updates to any existing API methods to manage YAML serialization/deserialization).

OR13 commented 1 year ago

I do see a use case for YAML-LD for both VCs and DIDs... I have worked on YAML-LD type things for DIDs...

For example:

https://github.com/transmute-industries/did-core/blob/main/packages/did-yaml/src/__fixtures__/did-yaml/example-2.yml

Per the poor decisions of the DID WG, I have stripped @context from the YAML example, so none of the terms are defined...

In some future version there would be both: application/did+ld+yaml and application/did+yaml

... and their only difference would be an @context or similar...

based on the convention set in DID Core v1... which was plagued by a complete lack of understanding with respect to JSON-LD.

VladimirAlexiev commented 1 year ago

@gkellogg

Formatting... YAML normatively defines

I just answered @TallTed's similar comment in https://github.com/json-ld/yaml-ld/issues/44#issuecomment-1180019655.

IIRC, VC uses RDF Dataset Canonicalization, which does not rely on JSON C14N

You have a point: just like round-tripping, c14n can be seen at several different levels. RDF is the fundamental level, which we can use as a baseline (etalon), as the default case, or even as a fallback.
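To illustrate RDF as the baseline level: once YAML-LD or JSON-LD has been converted to RDF, two documents can be compared as datasets rather than as text. For a dataset without blank nodes, canonical N-Quads reduces to deduplicating the quads and sorting the lines in code point order; blank nodes are exactly where the full RDF Dataset Canonicalization algorithm is needed. This sketch assumes each term is already serialized as a canonical N-Quads term string, and the function name is hypothetical.

```python
def ground_canonical_nquads(quads):
    """Canonical N-Quads for a dataset without blank nodes (sketch).

    Each quad is (subject, predicate, object) or (subject, predicate, object, graph),
    with every term already written in canonical N-Quads syntax.
    """
    lines = set()  # an RDF dataset is a set, so duplicates collapse
    for s, p, o, *g in quads:
        if any(term.startswith("_:") for term in (s, o, *g)):
            raise ValueError("blank nodes need full RDF Dataset Canonicalization")
        graph = f" {g[0]}" if g else ""
        lines.add(f"{s} {p} {o}{graph} .\n")
    return "".join(sorted(lines))  # Python str ordering is code point order
```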

I assume RDF c14n and JSON c14n are compatible (conformant to each other)? Has anyone explored that?