Coordinate with WebAppSec

jyasskin commented 2 years ago

My impression is that security experts advise against canonicalizing structured data in order to hash it, and instead recommend hashing the exact bytes that are transmitted. This WG proposes to do the thing that's not advised (with a justification in the explainer), but https://w3c.github.io/rch-wg-charter/#w3c-coordination doesn't mention working with WebAppSec to do it.

Since the problem of hashing structured data has been around a long time in the security space, I don't think it's sufficient to just assume that horizontal review includes security reviewers: they need to be actively engaged in defining and solving the problem.

OR13 commented 2 years ago

My impression is that security experts advise against canonicalizing structured data in order to hash it,

https://datatracker.ietf.org/doc/html/rfc8785

Cryptographic operations like hashing and signing need the data to be expressed in an invariant format so that the operations are reliably repeatable.

It's more accurate to say that any representation that supports non-canonical alternatives is potentially dangerous to hash... you can design serialization formats that have no "non-canonical" representations at all.
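A minimal Python sketch of that point, assuming a simplified canonical form rather than the full RFC 8785 algorithm: two JSON texts that carry the same object hash differently as raw bytes, but agree once they are re-serialized with sorted keys and no insignificant whitespace. `naive_canonical_json` is a hypothetical helper, not an implementation of JCS.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


def naive_canonical_json(text: str) -> bytes:
    """Hypothetical approximation of JSON canonicalization.

    Sorts object keys and strips insignificant whitespace. Unlike RFC 8785
    (JCS), it does not implement JCS number serialization, UTF-16 key
    ordering, or its string-escaping rules.
    """
    value = json.loads(text)
    return json.dumps(
        value, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


# Two wire encodings of the same logical object.
wire_a = '{"name": "alice", "role": "admin"}'
wire_b = '{ "role":"admin",\n  "name":"alice" }'

print(sha256_hex(wire_a.encode()) == sha256_hex(wire_b.encode()))  # False
print(naive_canonical_json(wire_a) == naive_canonical_json(wire_b))  # True
```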

Here is an example from JWKs:

https://github.com/denoland/deno/pull/13240

Here is an example from protocol buffers:

Proto3 supports a canonical encoding in JSON, making it easier to share data between systems. The encoding is described on a type-by-type basis in the table below.

https://developers.google.com/protocol-buffers/docs/proto3#json

The problem arises when you cannot rely on a serialization being in a canonical format (JSON / JSON-LD)... canonicalizing before signing is a long-standing security best practice.
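Canonicalizing before signing can be sketched the same way. Here HMAC-SHA256 from the Python standard library stands in for a real signature scheme, `sign`/`verify` are hypothetical helpers, and the canonical form is the same simplified sorted-keys approximation (an assumption, not JCS): the tag covers the canonical bytes, so a verifier that re-canonicalizes still accepts a document that an intermediary re-serialized, but rejects one whose data changed.

```python
import hashlib
import hmac
import json

# Hypothetical shared key; HMAC-SHA256 stands in for a real signature scheme.
KEY = b"demo-key-not-for-production"


def canonical_bytes(value) -> bytes:
    """Sorted keys, no insignificant whitespace (an approximation of JCS)."""
    return json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(value) -> str:
    """Authenticate the canonical bytes, not whatever bytes happen to be sent."""
    return hmac.new(KEY, canonical_bytes(value), hashlib.sha256).hexdigest()


def verify(received_text: str, tag: str) -> bool:
    """Re-canonicalize the received document before checking the tag."""
    expected = sign(json.loads(received_text))
    return hmac.compare_digest(expected, tag)


doc = {"amount": 10, "to": "bob"}
tag = sign(doc)

# An intermediary re-serializes the document with different spacing and key order.
reformatted = json.dumps({"to": "bob", "amount": 10}, indent=2)

print(verify(reformatted, tag))  # True: same data, same canonical bytes
print(verify('{"amount": 11, "to": "bob"}', tag))  # False: data changed
```

The cost is that any disagreement between what the canonicalizer sees and what the application acts on (duplicate keys, number precision) becomes a potential security hole.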

https://docs.sigstore.dev/rekor/plugable-types#base-schema

Canonicalize should return a []byte containing the canonicalized contents representing the entry. The canonicalization of contents is important as we should have one record per unique signed object in the transparency log.
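The "one record per unique signed object" requirement amounts to keying log entries by a digest of the canonical contents. A hypothetical in-memory sketch (not Rekor's actual types or API), using the same simplified JSON canonicalization as an assumption:

```python
import hashlib
import json


class CanonicalLog:
    """Hypothetical append-only log that deduplicates entries by the
    SHA-256 of their canonicalized contents (not Rekor's actual API)."""

    def __init__(self) -> None:
        self.entries: dict[str, bytes] = {}

    @staticmethod
    def canonicalize(raw: str) -> bytes:
        # Approximation of JSON canonicalization: sorted keys, compact separators.
        value = json.loads(raw)
        return json.dumps(value, sort_keys=True, separators=(",", ":")).encode()

    def add(self, raw: str) -> tuple[str, bool]:
        """Return (entry_id, created); created is False for a duplicate."""
        canon = self.canonicalize(raw)
        entry_id = hashlib.sha256(canon).hexdigest()
        if entry_id in self.entries:
            return entry_id, False
        self.entries[entry_id] = canon
        return entry_id, True


log = CanonicalLog()
id1, new1 = log.add('{"artifact": "abc", "sig": "xyz"}')
id2, new2 = log.add('{"sig":"xyz","artifact":"abc"}')
print(new1, new2, id1 == id2, len(log.entries))  # True False True 1
```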

google/trillian

Non-canonical forms are not uniquely a problem for maps. Having ambiguous forms could be a problem for Claim Verifiers in the Log Claimant model too, but there Verifiers have the raw Subject and can canonicalize it, or query for substrings/prefixes, etc.

Certainly, JSON is the problem here... having a canonical form is better than having no canonical form, but worse than ONLY having a canonical form.
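One way to get "ONLY a canonical form" out of a flexible syntax is to make the parser reject anything that is not already canonical. The transmitted bytes can then be hashed directly, in line with the advice in the opening comment. A hypothetical sketch, again assuming a simplified JSON canonical form:

```python
import json


def canonical(value) -> str:
    # Approximation of a canonical JSON serialization (sorted keys, compact).
    return json.dumps(value, sort_keys=True, separators=(",", ":"))


def parse_canonical_only(text: str):
    """Hypothetical strict parser: accept a document only if it is already
    the canonical serialization, so each value has exactly one accepted
    encoding. Duplicate keys are rejected too, since re-serializing them
    cannot reproduce the input."""
    value = json.loads(text)
    if canonical(value) != text:
        raise ValueError("non-canonical encoding rejected")
    return value


print(parse_canonical_only('{"a":1,"b":2}'))  # {'a': 1, 'b': 2}
try:
    parse_canonical_only('{"b":2,"a":1}')
except ValueError as err:
    print(err)  # non-canonical encoding rejected
```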

dlongley commented 2 years ago

Other widely used canonical forms in the security space include DER. In fact, it is used by everyone reading this message (at least as of 2022 :)) because every X.509 TLS certificate is DER-encoded before it is hashed and signed.
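To illustrate why DER works as a canonical form: BER leaves some encoding choices to the sender (for example, long-form length octets even for short contents), while DER requires the minimal form, so each value has exactly one encoding. The helpers below are hypothetical and handle only a single non-negative INTEGER; real certificate code would use a complete ASN.1 library.

```python
import hashlib


def der_length(n: int) -> bytes:
    """DER length octets: short form below 128, else minimal long form."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def der_integer(value: int) -> bytes:
    """DER INTEGER (tag 0x02) for a non-negative value, minimal two's complement."""
    if value < 0:
        raise ValueError("sketch handles non-negative integers only")
    content = value.to_bytes(value.bit_length() // 8 + 1, "big")
    return b"\x02" + der_length(len(content)) + content


def ber_read_integer(data: bytes) -> tuple[int, bool]:
    """Lenient BER-style reader for one INTEGER; also reports whether the
    length octets were in the minimal (DER) form."""
    if data[0] != 0x02:
        raise ValueError("not an INTEGER")
    first = data[1]
    if first < 0x80:
        length, offset = first, 2
    else:
        num = first & 0x7F
        length = int.from_bytes(data[2:2 + num], "big")
        offset = 2 + num
    value = int.from_bytes(data[offset:offset + length], "big", signed=True)
    return value, der_length(length) == data[1:offset]


canonical = der_integer(5)          # b'\x02\x01\x05'
alternative = b"\x02\x81\x01\x05"   # long-form length: allowed in BER, not DER
print(ber_read_integer(canonical))    # (5, True)
print(ber_read_integer(alternative))  # (5, False)
print(hashlib.sha256(canonical).digest() == hashlib.sha256(alternative).digest())  # False
```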

iherman commented 2 years ago

Certainly, JSON is the problem here... having a canonical form is better than having no canonical form, but worse than ONLY having a canonical form.

Not only JSON. We are defining a hashing of RDF Datasets, not of a specific serialization thereof. Depending on the application area, JSON-LD, as an RDF serialization format, may or may not be dominant. Many applications rely on Turtle/TriG, RDFa, or even RDF/XML, not to mention cases where an RDF Graph or Dataset is stored in a Triple Store that offers all of these serializations as input or output formats.
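A sketch of why hashing the dataset rather than a serialization is harder than it looks. Sorting N-Triples lines (the hypothetical `naive_graph_hash` below) makes the hash independent of which syntax or triple order the data arrived in, but only for graphs without blank nodes. Blank node labels are document-local, so isomorphic graphs can still hash differently; relabeling them deterministically is the part a real RDF dataset canonicalization algorithm has to solve.

```python
import hashlib

# A triple is (subject, predicate, object), each already in N-Triples term syntax.
Triple = tuple[str, str, str]


def naive_graph_hash(triples: list[Triple]) -> str:
    """Hash a graph by sorting its N-Triples lines.

    Correct only for graphs without blank nodes; it does no blank node
    relabeling.
    """
    lines = sorted(f"{s} {p} {o} ." for s, p, o in triples)
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


EX = "<http://example.org/"
# The same two triples, as if parsed from a Turtle file and from a JSON-LD
# file that listed them in a different order.
from_turtle = [
    (EX + "alice>", EX + "knows>", EX + "bob>"),
    (EX + "bob>", EX + "name>", '"Bob"'),
]
from_jsonld = [
    (EX + "bob>", EX + "name>", '"Bob"'),
    (EX + "alice>", EX + "knows>", EX + "bob>"),
]
print(naive_graph_hash(from_turtle) == naive_graph_hash(from_jsonld))  # True

# Isomorphic graphs whose blank node got a different label in each document.
g1 = [("_:b0", EX + "name>", '"Carol"')]
g2 = [("_:x", EX + "name>", '"Carol"')]
print(naive_graph_hash(g1) == naive_graph_hash(g2))  # False
```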

iherman commented 2 years ago

@jyasskin I have no issues adding a coordination statement into the charter; I think having more eyes on security related specifications is always good. However, I think we would welcome some help on the specific liaison statement to be put into the charter, because it is not immediately clear to me, reading the WG charter, what the statement would be. The concerns of the WebAppSec WG seem to be fairly different from what this WG proposes to do...

B.t.w., @samuelweiler (staff contact of the Web App Sec WG) has already reviewed, and contributed to, an earlier incarnation of this charter proposal. Any proposal from him would also be welcome :-)

jyasskin commented 2 years ago

@mikewest and @dveditz will probably be helpful in figuring out how the web's security community could be engaged in designing this system. The important thing in my mind is to make sure that this pulls in security expertise from outside the RDF community before it gets to AC review, and horizontal review alone has not always been enough to do that in the past. So perhaps:

Web Application Security Working Group: To ensure that the canonicalization and hashing mechanisms defined in this group have similar security properties to the rest of the web, and to take advantage of lessons learned while designing other canonicalization systems.

iherman commented 2 years ago

@jyasskin thanks. I have moved this to a PR (#96) to speed up progress.

w3c / rch-wg-charter

Coordinate with WebAppSec #94