jyasskin closed this 2 years ago
My impression is that security experts advise against canonicalizing structured data in order to hash it,
https://datatracker.ietf.org/doc/html/rfc8785
Cryptographic operations like hashing and signing need the data to be expressed in an invariant format so that the operations are reliably repeatable.
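To make the quoted point concrete, here is a minimal Python sketch showing that two byte-level serializations of the same JSON data hash differently unless they are first brought into one agreed form. The helper `simple_canonical_json` is hypothetical and deliberately simplified (sorted keys, no insignificant whitespace, UTF-8). It is not an implementation of RFC 8785, which also pins down number and string serialization and its own key ordering.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as hex."""
    return hashlib.sha256(data).hexdigest()


def simple_canonical_json(value) -> bytes:
    """Hypothetical, simplified canonical form: sorted keys, compact
    separators, UTF-8 output. NOT a full RFC 8785 implementation."""
    text = json.dumps(value, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return text.encode("utf-8")


# Two serializations of the same logical object.
doc_a = b'{"name": "alice", "id": 1}'
doc_b = b'{"id":1,"name":"alice"}'

# Hashing the raw bytes gives two different digests.
raw_differs = sha256_hex(doc_a) != sha256_hex(doc_b)

# Hashing after the simplified canonicalization gives one digest.
canon_a = simple_canonical_json(json.loads(doc_a))
canon_b = simple_canonical_json(json.loads(doc_b))
canonical_matches = sha256_hex(canon_a) == sha256_hex(canon_b)
```

The sketch only covers key order and whitespace; a production scheme would also have to fix how numbers, escapes, and duplicate keys are handled.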
It's more accurate to say that any representation that supports non-canonical alternatives is potentially dangerous to hash... you can design serialization formats that have no "non-canonical" representations...
Here is an example from JWKs
https://github.com/denoland/deno/pull/13240
Here is an example from protocol buffers:
Proto3 supports a canonical encoding in JSON, making it easier to share data between systems. The encoding is described on a type-by-type basis in the table below.
https://developers.google.com/protocol-buffers/docs/proto3#json
The problem arises when you cannot rely on a serialization to be in a canonical format (JSON / JSON-LD)... canonicalizing before signing is a long-standing security best practice.
https://docs.sigstore.dev/rekor/plugable-types#base-schema
Canonicalize should return a []byte containing the canonicalized contents representing the entry. The canonicalization of contents is important as we should have one record per unique signed object in the transparency log.
Non-canonical forms are not uniquely a problem for maps. Having ambiguous forms could be a problem for Claim Verifiers in the Log Claimant model too, but there Verifiers have the raw Subject and can canonicalize it, or query for substrings/prefixes, etc.
Certainly, JSON is the problem here... having a canonical form is better than having no canonical form, but worse than ONLY having a canonical form.
Other widely used canonical forms in the security space include DER. In fact, it is used by everyone reading this message (at least as of 2022 :)) because every X.509 TLS certificate is canonicalized using it -- before it is hashed and signed.
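As a rough illustration of what "ONLY having a canonical form" means for DER, the Python sketch below parses a single ASN.1 INTEGER. BER lets a length be written in long form even when it fits the short form, while DER requires the shortest length encoding, so a strict DER reader can reject the alternative bytes outright. The function `parse_integer` and its `require_der` flag are hypothetical names for this sketch, and it handles only this one rule, not DER in general.

```python
def parse_integer(encoded: bytes, require_der: bool) -> int:
    """Parse one ASN.1 INTEGER (tag 0x02) from BER bytes.

    With require_der=True, also reject a length that is not written
    in its shortest form. Illustrative sketch only.
    """
    if len(encoded) < 2 or encoded[0] != 0x02:
        raise ValueError("not an INTEGER")
    first = encoded[1]
    if first < 0x80:
        # Short form: the length is the byte itself.
        length, offset = first, 2
    else:
        # Long form: low 7 bits give the number of length bytes.
        num_len_bytes = first & 0x7F
        if num_len_bytes == 0 or len(encoded) < 2 + num_len_bytes:
            raise ValueError("bad length encoding")
        length = int.from_bytes(encoded[2:2 + num_len_bytes], "big")
        offset = 2 + num_len_bytes
        if require_der and (length < 0x80 or encoded[2] == 0):
            raise ValueError("non-minimal length is not valid DER")
    content = encoded[offset:]
    if length == 0 or len(content) != length:
        raise ValueError("content does not match declared length")
    return int.from_bytes(content, "big", signed=True)


der_bytes = bytes.fromhex("020105")    # INTEGER 5, short-form length
ber_only = bytes.fromhex("02810105")   # INTEGER 5, long-form length

der_value = parse_integer(der_bytes, require_der=True)
ber_value = parse_integer(ber_only, require_der=False)

try:
    parse_integer(ber_only, require_der=True)
    der_rejects_ber = False
except ValueError:
    der_rejects_ber = True
```

Both byte strings carry the value 5, but only one of them is acceptable to the strict reader, so only one byte string can ever be hashed for that value.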
Certainly, JSON is the problem here... having a canonical form is better than having no canonical form, but worse than ONLY having a canonical form.
Not only JSON. We are defining the hashing of RDF Datasets, not of a specific serialization thereof. Depending on the application area, JSON-LD, as an RDF serialization format, may or may not be dominant. Many applications rely on Turtle/TriG, RDFa, or even RDF/XML, and a specific RDF Graph or Dataset may also be stored in a Triple Store that offers all of these serializations as input or output formats.
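A minimal Python sketch of hashing the statements rather than one serialization's bytes might look like the following. It assumes, loudly, that the dataset contains no blank nodes and that every statement has already been written as a normalized N-Quads line; the hard part of real RDF dataset canonicalization is labelling blank nodes deterministically, which this sketch does not attempt. The function name `hash_ground_statements` is hypothetical.

```python
import hashlib


def hash_ground_statements(nquads: str) -> str:
    """Hash a set of N-Quads statements independent of their order.

    Assumption: no blank nodes, and each line is already in one
    normalized textual form. Duplicate statements collapse because
    an RDF dataset is a set.
    """
    statements = {line.strip() for line in nquads.splitlines() if line.strip()}
    canonical = "".join(line + "\n" for line in sorted(statements))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The same two statements, emitted in different orders.
serialization_a = (
    '<http://example.org/a> <http://example.org/p> "1" .\n'
    '<http://example.org/b> <http://example.org/p> "2" .\n'
)
serialization_b = (
    '<http://example.org/b> <http://example.org/p> "2" .\n'
    '<http://example.org/a> <http://example.org/p> "1" .\n'
)
# A dataset with a different literal value.
other_dataset = '<http://example.org/a> <http://example.org/p> "3" .\n'

same_hash = (hash_ground_statements(serialization_a)
             == hash_ground_statements(serialization_b))
different_hash = (hash_ground_statements(serialization_a)
                  != hash_ground_statements(other_dataset))
```

Statements parsed from Turtle, RDFa, or RDF/XML would first need to be converted to the same line form before this comparison means anything.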
@jyasskin I have no issues adding a coordination statement into the charter; I think having more eyes on security-related specifications is always good. However, I think we would welcome some help on the specific liaison statement to be put into the charter, because it is not immediately clear to me, reading the WG charter, what the statement would be. The concerns of the WebAppSec WG seem to be fairly different from what this WG proposes to do...
B.t.w., @samuelweiler (staff contact of the Web App Sec WG) has already reviewed, and contributed to, an earlier incarnation of this charter proposal. Any proposal from him would also be welcome :-)
@mikewest and @dveditz will probably be helpful in figuring out how the web's security community could be engaged in designing this system. The important thing in my mind is to make sure that this pulls in security expertise from outside the RDF community before it gets to AC review, and horizontal review alone has not always been enough to do that in the past. So perhaps:
Web Application Security Working Group: To ensure that the canonicalization and hashing mechanisms defined in this group have similar security properties to the rest of the web, and to take advantage of lessons learned while designing other canonicalization systems.
@jyasskin thanks. I have moved this to a PR (#96) to speed up progress.
My impression is that security experts advise against canonicalizing structured data in order to hash it, and instead advise hashing the bytes that are transmitted in order to transfer the data. This WG proposes to do the thing that's not advised (with a justification in the explainer), but https://w3c.github.io/rch-wg-charter/#w3c-coordination doesn't mention working with WebAppSec to do it.
Since the problem of hashing structured data has been around a long time in the security space, I don't think it's sufficient to just assume that horizontal review includes security reviewers: they need to be actively engaged in defining and solving the problem.
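For comparison, the alternative mentioned above, hashing the bytes that are transmitted, can be sketched in Python as follows. The receiver checks the digest over the exact bytes it received and parses them only afterwards, so no canonicalization step is involved. The names `digest_of` and `verify_then_parse` are hypothetical, and the sketch assumes the expected digest arrives through some trusted means (for example, inside a signature), which is outside its scope.

```python
import hashlib
import hmac
import json


def digest_of(payload: bytes) -> bytes:
    """SHA-256 over the exact bytes on the wire."""
    return hashlib.sha256(payload).digest()


def verify_then_parse(received: bytes, expected_digest: bytes) -> dict:
    """Check the digest of the received bytes first, then parse them."""
    if not hmac.compare_digest(digest_of(received), expected_digest):
        raise ValueError("digest mismatch")
    return json.loads(received)


sent = b'{"name": "alice", "id": 1}'
expected = digest_of(sent)

# The untouched bytes verify and parse.
parsed = verify_then_parse(sent, expected)

# Re-serializing the parsed data yields different bytes, which fail
# the check even though the data is logically the same.
reserialized = json.dumps(parsed, sort_keys=True).encode("utf-8")
try:
    verify_then_parse(reserialized, expected)
    reserialized_rejected = False
except ValueError:
    reserialized_rejected = True
```

The trade-off is visible in the last step: this approach requires keeping or forwarding the original bytes, which is the constraint the canonicalization approach is trying to avoid.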