piprate / json-gold

A JSON-LD processor for Go
Apache License 2.0

JSON Canonicalization #35

Closed: troyronda closed this issue 2 years ago

troyronda commented 4 years ago

We are trying to process a document with this context: https://w3c-ccg.github.io/vc-examples/cmtr/examples/v0.1/cmtr-v0.1.jsonld

Example: https://github.com/w3c-ccg/vc-examples/blob/master/test-suite/data/credential--vc.transmute.world--CertifiedMillTestReport.json

but get the error "JSON literals not supported"

due to: https://github.com/piprate/json-gold/blob/master/ld/node.go#L300-L301
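
For readers unfamiliar with the feature: in JSON-LD 1.1, a term defined with `"@type": "@json"` makes its value a JSON literal. Converting it to RDF yields a literal whose lexical form is the canonicalized JSON text and whose datatype is `rdf:JSON`. The Go sketch below (standard library only, not json-gold code) shows roughly what that step produces. The helper names are hypothetical, and `encoding/json` stands in for a real JCS canonicalizer, which it is not.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// rdfJSON is the datatype IRI JSON-LD 1.1 uses for JSON literals.
const rdfJSON = "http://www.w3.org/1999/02/22-rdf-syntax-ns#JSON"

// escapeNQuads escapes a lexical form for use inside an N-Quads string literal.
func escapeNQuads(s string) string {
	r := strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`, "\r", `\r`)
	return r.Replace(s)
}

// jsonLiteralTerm renders a JSON value as an N-Quads literal typed rdf:JSON.
// NOTE: encoding/json is only a stand-in serializer here. It sorts map keys
// by byte order and escapes <, > and &, so it is NOT a JCS implementation.
func jsonLiteralTerm(v any) (string, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf(`"%s"^^<%s>`, escapeNQuads(string(b)), rdfJSON), nil
}

func main() {
	term, err := jsonLiteralTerm(map[string]any{"b": 2, "a": []any{true, nil}})
	if err != nil {
		panic(err)
	}
	fmt.Println(term)
}
```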

kazarena commented 4 years ago

@troyronda, yes, that's correct, json-gold doesn't currently support JSON literals. We didn't do it for two reasons:

  1. The current version of the spec says:

The JSON Canonicalization Scheme [JCS] is an emerging standard for JSON canonicalization not yet ready to be referenced. When a JSON canonicalization standard becomes available, this specification will likely be updated to require such a canonical representation. Users are cautioned from depending on the JSON literal lexical representation as an RDF literal, as the specifics of serialization may change in a future revision of this document.

At the time I was working on the JSON-LD 1.1 implementation, the spec was still WIP, and I decided to wait and see if this feature would evolve.

  2. I wasn't able to find any suitable Go implementations of JSON Canonicalization Scheme (JCS) used to process JSON literals. The only one I found was this one, and it didn't look solid enough to add as a dependency. If there was a good implementation available, it's a very small job to add support for JSON Literals. Having to write our own implementation is not something I look forward to 😄. If you know any alternatives, please let me know. PRs welcome 👍

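One concrete reason Go's `encoding/json` can't simply be reused as a JCS implementation is string escaping. JCS (RFC 8785) follows ECMAScript's `JSON.stringify`, which escapes only `"`, `\` and control characters below U+0020, whereas `encoding/json` also escapes `<`, `>`, `&`, U+2028 and U+2029. A minimal sketch of JCS-style string quoting, assuming valid UTF-8 input (the function name `jcsQuote` is made up for illustration):

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// jcsQuote serializes s as a JSON string following JCS (RFC 8785):
// '"', '\\' and control characters below U+0020 are escaped, using the
// short forms where they exist and lowercase \u00xx otherwise. Everything
// else is emitted literally. Input is assumed to be valid UTF-8; invalid
// bytes would come out as U+FFFD.
func jcsQuote(s string) string {
	var b strings.Builder
	b.WriteByte('"')
	for _, r := range s {
		switch r {
		case '"':
			b.WriteString(`\"`)
		case '\\':
			b.WriteString(`\\`)
		case '\b':
			b.WriteString(`\b`)
		case '\t':
			b.WriteString(`\t`)
		case '\n':
			b.WriteString(`\n`)
		case '\f':
			b.WriteString(`\f`)
		case '\r':
			b.WriteString(`\r`)
		default:
			if r < 0x20 {
				fmt.Fprintf(&b, `\u%04x`, r)
			} else {
				b.WriteRune(r)
			}
		}
	}
	b.WriteByte('"')
	return b.String()
}

func main() {
	s := "a<b & \u2028"
	std, _ := json.Marshal(s)
	fmt.Println(string(std)) // encoding/json escapes <, & and U+2028
	fmt.Println(jcsQuote(s)) // JCS-style output keeps them literal
}
```
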
gkellogg commented 4 years ago

We ended up taking the Python version and adding it to the PyLD source tree; it wasn't useful as a dependency, but it works well.

Note that PyLD (and jsonld.js) only support the "i18n-datatype" part of the spec. I suspect that "compound-literal" will end up going away.
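
"i18n-datatype" and "compound-literal" are the two values of the JSON-LD 1.1 `rdfDirection` option, which controls how strings carrying `@direction` are converted to RDF. With i18n-datatype, the language tag and direction are folded into a datatype IRI under `https://www.w3.org/ns/i18n#`. A small sketch of that IRI construction, based on the JSON-LD 1.1 API description (the helper name `i18nDatatype` is hypothetical):

```go
package main

import (
	"fmt"
	"strings"
)

const i18nBase = "https://www.w3.org/ns/i18n#"

// i18nDatatype builds the datatype IRI used when rdfDirection is
// "i18n-datatype": the base IRI, the language tag lower-cased (if any),
// an underscore, and the base direction.
func i18nDatatype(language, direction string) string {
	return i18nBase + strings.ToLower(language) + "_" + direction
}

func main() {
	// {"@value": "...", "@language": "ar-EG", "@direction": "rtl"}
	fmt.Println(i18nDatatype("ar-EG", "rtl"))
	// A direction without a language tag.
	fmt.Println(i18nDatatype("", "ltr"))
}
```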

kazarena commented 4 years ago

@gkellogg thanks for your insights! I may also end up borrowing the implementation once I get an idea of how closely it follows the spec.

@troyronda, regarding the example above, personally I would be very careful about using JSON Literals in the Verifiable Credentials context for the time being, because they will most likely be used to create a normalised form for signing the credential, which is then stored on some DLT-like, append-only platform. I suspect that either the spec will evolve, or implementations will eventually align in a non-normative way that is potentially different from the current one. One may end up with a multitude of verifiable credentials that no JSON-LD implementation can verify because the future normalised form would change. @gkellogg would know better though!
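
The risk described above comes from the fact that a signature covers the exact bytes of the normalised form. A minimal Go sketch using `crypto/ed25519`: the same data serialized with a different key order (standing in for a hypothetical future change in canonicalization) no longer verifies. This only illustrates the failure mode; it is not the Verifiable Credentials proof algorithm, and `signAndCheck` is a made-up helper.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"fmt"
)

// signAndCheck signs `signed` with a fresh key and reports whether that
// signature verifies against `signed` itself and against `other`.
func signAndCheck(signed, other []byte) (okSame, okOther bool, err error) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return false, false, err
	}
	sig := ed25519.Sign(priv, signed)
	return ed25519.Verify(pub, signed, sig), ed25519.Verify(pub, other, sig), nil
}

func main() {
	// Normalised form produced at signing time.
	signedForm := []byte(`{"amount":1,"unit":"kg"}`)
	// Same data as a hypothetical later implementation might serialize it.
	laterForm := []byte(`{"unit":"kg","amount":1}`)

	same, other, err := signAndCheck(signedForm, laterForm)
	if err != nil {
		panic(err)
	}
	fmt.Println("verify original:", same)
	fmt.Println("verify re-serialized:", other)
}
```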

In the spirit of information sharing, we have to process both JSON-LD and plain JSON in our product. At some stage, we were facing the same question: how to normalise and subsequently sign JSON-LD structures that refer to plain JSON? What kind of stable algorithm to use, given the multitude of canonicalisation schemes for JSON that were proposed in the past?

One option was to use a CBOR-based algorithm. We didn't take it as at that time we didn't use CBOR in any other context. (The IPFS-inspired way.)

The option we took was to treat plain JSON as a binary stream. The design rule is: if there is a requirement to look inside the JSON document, it should be JSON-LD'd. Otherwise, it's just a binary stream, as good as any other format, binary or not. However, we don't embed plain JSON, but store it as a separate, addressable blob/file. There are pros and cons, but it suits us for now.
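
A minimal sketch of the "separate, addressable blob" approach, assuming the address is a SHA-256 hash of the raw bytes (the hash choice, the `sha256:` prefix and the `blobAddress` name are assumptions, not details from the comment above). Because the bytes are hashed as-is, two serializations of the same JSON value get different addresses, so the blob has to be stored and served verbatim.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// blobAddress derives a content address for an opaque blob from its exact
// bytes. The "sha256:" prefix is an illustrative convention, not a standard.
func blobAddress(blob []byte) string {
	sum := sha256.Sum256(blob)
	return "sha256:" + hex.EncodeToString(sum[:])
}

func main() {
	compact := []byte(`{"a":1}`)
	spaced := []byte(`{ "a": 1 }`)

	// Same JSON value, different bytes, therefore different addresses.
	fmt.Println(blobAddress(compact))
	fmt.Println(blobAddress(spaced))
	fmt.Println(blobAddress(compact) == blobAddress(spaced))
}
```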

P.S. Ironically, I came across JSON-LD by following Manu Sporny's work in the search for a stable normalised/canonicalised form of JSON 😄

gkellogg commented 4 years ago

> I suspect that either the spec will evolve, or implementations will eventually align in a non-normative way, that is potentially different from the current one. One may end up with a multitude of verifiable credentials that no JSON-LD implementation can verify because the future normalised form would change.

It's pretty unlikely that JSON canonicalization will change substantially. JCS, on which JSON-LD literal canonicalization is based, has changed very little over time; while something might differ at the margins, it's highly unlikely.

The JSON-LD spec chose to be less precise at some of the margins, notably around key ordering, which gets a bit wonky for unusual Unicode characters.
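
To illustrate the key-ordering point: JCS (RFC 8785) sorts object keys by their UTF-16 code units, while Go compares strings by UTF-8 bytes (equivalently, by code point). The two orders disagree when a character outside the Basic Multilingual Plane meets one in the range U+E000 to U+FFFF. A short Go sketch showing one such pair (the `lessUTF16` helper is hypothetical):

```go
package main

import (
	"fmt"
	"sort"
	"unicode/utf16"
)

// lessUTF16 orders strings by their UTF-16 code units, the key order JCS
// requires. Go's built-in string comparison uses UTF-8 bytes instead.
func lessUTF16(a, b string) bool {
	ua, ub := utf16.Encode([]rune(a)), utf16.Encode([]rune(b))
	for i := 0; i < len(ua) && i < len(ub); i++ {
		if ua[i] != ub[i] {
			return ua[i] < ub[i]
		}
	}
	return len(ua) < len(ub)
}

func main() {
	// U+1F600 (surrogate pair D83D DE00) vs U+FF21 (fullwidth 'A').
	keys := []string{"\U0001F600", "\uFF21"}

	byBytes := append([]string(nil), keys...)
	sort.Strings(byBytes)

	byUTF16 := append([]string(nil), keys...)
	sort.Slice(byUTF16, func(i, j int) bool { return lessUTF16(byUTF16[i], byUTF16[j]) })

	fmt.Printf("%q\n%q\n", byBytes, byUTF16)
}
```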

> One option was to use a CBOR-based algorithm. We didn't take it as at that time we didn't use CBOR in any other context. (The IPFS-inspired way.)

The JSON-LD WG is working on a Note for CBOR-LD.

> The option we took was to treat plain JSON as a binary stream. The design rule is: if there is a requirement to look inside the JSON document, it should be JSON-LD'd. Otherwise, it's just a binary stream, as good as any other format, binary or not. However, we don't embed plain JSON, but store it as a separate, addressable blob/file. There are pros and cons, but it suits us for now.

You may get different results from different processors, because key order and number representation are not standardized in JSON. Hence the need for JCS and JSON-LD literal canonicalization.
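
To make the number-representation point concrete: `1.0`, `1e0` and `1` are the same JSON number, and JCS resolves this by requiring ECMAScript's `Number.prototype.toString` formatting. A Go sketch of that rule, assuming `strconv`'s shortest round-trip digits match ECMAScript's digit choice (believed to hold for finite doubles, but not checked here against the RFC 8785 test vectors). The name `es6Number` is made up.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"strconv"
	"strings"
)

// es6Number formats f the way ECMAScript's Number.prototype.toString does,
// which JCS mandates for JSON numbers. It takes strconv's shortest
// round-trip digits and then applies the ECMAScript layout rules.
func es6Number(f float64) (string, error) {
	if math.IsNaN(f) || math.IsInf(f, 0) {
		return "", errors.New("NaN and Infinity are not valid JSON numbers")
	}
	if f == 0 {
		return "0", nil // also covers -0, which serializes as "0"
	}
	sign := ""
	if f < 0 {
		sign, f = "-", -f
	}
	// 'e' with precision -1 yields the shortest digits, e.g. "1.2345e+02".
	mant, expStr, _ := strings.Cut(strconv.FormatFloat(f, 'e', -1, 64), "e")
	digits := strings.Replace(mant, ".", "", 1)
	exp, err := strconv.Atoi(expStr)
	if err != nil {
		return "", err
	}
	k, n := len(digits), exp+1 // k digits; decimal point sits after n digits
	var out string
	switch {
	case k <= n && n <= 21: // integer: pad with zeros
		out = digits + strings.Repeat("0", n-k)
	case 0 < n && n <= 21: // decimal point inside the digits
		out = digits[:n] + "." + digits[n:]
	case -6 < n && n <= 0: // small fraction: leading zeros
		out = "0." + strings.Repeat("0", -n) + digits
	default: // exponential notation
		out = digits[:1]
		if k > 1 {
			out += "." + digits[1:]
		}
		if e := n - 1; e >= 0 {
			out += "e+" + strconv.Itoa(e)
		} else {
			out += "e" + strconv.Itoa(e)
		}
	}
	return sign + out, nil
}

func main() {
	// Different lexical forms of equal numbers converge on one output.
	for _, raw := range []string{"1.0", "1e0", "1e2", "100", "0.0000001", "1E21"} {
		f, err := strconv.ParseFloat(raw, 64)
		if err != nil {
			panic(err)
		}
		s, err := es6Number(f)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-10s -> %s\n", raw, s)
	}
}
```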

rolsonquadras commented 4 years ago

@kazarena @gkellogg Thanks for the inputs.

For now, we have added the JSON Canonicalization to our fork - https://github.com/trustbloc/json-gold/pull/2.

troyronda commented 3 years ago

@rolsonquadras @kazarena It would be nice to remove the need for a fork.

We currently have this: https://github.com/hyperledger/aries-framework-go/blob/bb1944a72387ca60a49bc1ad909d8edbcb7868f9/go.mod#L41

kazarena commented 3 years ago

@troyronda @rolsonquadras I'd be happy to accept a PR.

troyronda commented 3 years ago

@kazarena thanks.

We are using the cyberphone canonicalizer code (that you mentioned above).

We copied it into an internal folder: https://github.com/trustbloc/json-gold/tree/0.3.0-update/ld/internal/jsoncanonicalizer

I can think of two alternative ways we could make this work without the fork on our end:

  1. Do the same here - copy the canonicalizer code into an internal folder.
  2. Add an option to JsonLdOptions that allows the canonicalizer implementation to be injected.
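
Option 2 could look roughly like the sketch below. Everything here is hypothetical: `Canonicalizer`, `CanonicalizerFunc`, `Options.JSONCanonicalizer` and `canonicalizerFor` are illustrative names, not json-gold's API or the code in the linked PR, and the default placeholder only compacts whitespace (it is not JCS).

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Canonicalizer is a hypothetical injection point for JSON literal
// canonicalization.
type Canonicalizer interface {
	Canonicalize(input []byte) ([]byte, error)
}

// CanonicalizerFunc adapts a plain function to the Canonicalizer interface.
type CanonicalizerFunc func([]byte) ([]byte, error)

func (f CanonicalizerFunc) Canonicalize(in []byte) ([]byte, error) { return f(in) }

// Options stands in for JsonLdOptions with the proposed extra field.
type Options struct {
	JSONCanonicalizer Canonicalizer // nil means "use the built-in one"
}

// defaultCanonicalizer is a placeholder; option 1 would vendor a real JCS
// implementation here instead. json.Compact only strips whitespace.
var defaultCanonicalizer = CanonicalizerFunc(func(in []byte) ([]byte, error) {
	var out bytes.Buffer
	if err := json.Compact(&out, in); err != nil {
		return nil, err
	}
	return out.Bytes(), nil
})

// canonicalizerFor picks the injected canonicalizer, falling back to the default.
func canonicalizerFor(opts *Options) Canonicalizer {
	if opts != nil && opts.JSONCanonicalizer != nil {
		return opts.JSONCanonicalizer
	}
	return defaultCanonicalizer
}

func main() {
	in := []byte(`{ "b": 1, "a": 2 }`)

	def, err := canonicalizerFor(nil).Canonicalize(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(def)) // whitespace removed, key order unchanged

	opts := &Options{JSONCanonicalizer: CanonicalizerFunc(func(b []byte) ([]byte, error) {
		var v any
		if err := json.Unmarshal(b, &v); err != nil {
			return nil, err
		}
		return json.Marshal(v) // sorts keys by byte order; still not full JCS
	})}
	out, err := canonicalizerFor(opts).Canonicalize(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The nil-means-default fallback keeps existing callers unaffected; the cost is a new public surface that would have to stay stable.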

What do you think?

troyronda commented 3 years ago

Example of option 1: https://github.com/trustbloc/json-gold/pull/2/files
Example of option 2: https://github.com/trustbloc/json-gold/pull/3/files

kazarena commented 3 years ago

@troyronda thanks for putting some work into it. I'd be in favour of the first option. We don't have alternative implementations of the canonicaliser, so it's safer to keep the interface unchanged.

kazarena commented 3 years ago

@troyronda if you could submit a PR, I'll approve it and release a new minor version.

kdimak commented 3 years ago

@kazarena @troyronda here is a PR: https://github.com/piprate/json-gold/pull/40

kazarena commented 3 years ago

@kdimak super, thank you. Will take a look asap. I will re-run the official test suite and update the results.

kazarena commented 3 years ago

@kdimak @troyronda @rolsonquadras I'm having some trouble running the official toRDF tests for JSON literals. It's not related to your PR, it's an RDF isomorphism problem. I will need to address it first before confirming test results. It shouldn't take too long. Please wait before updating your dependencies.