w3c / rch-wg-charter

Charter proposal for an “RDF Dataset Canonicalization and Hash Working Group”
https://w3c.github.io/rch-wg-charter/

Optical/RF data carriers #29

Closed philarcher closed 3 years ago

philarcher commented 3 years ago

I wonder whether there is scope for a non-normative NOTE in this WG about encoding of small RDF Datasets in optical and radio frequency data carriers, such as QR, Data Matrix and NFC tags. Orie has done some excellent work on creating CBOR-LD-based VCs that could meet the current stellar use case of COVID-status certificates but that same approach could be applied to other certificates. At GS1 we'd be thinking of things like certified organic, certified gluten-free etc. but the potential use cases are legion of course.

In a limited-capacity data carrier, you don't have room for niceties and flexibility, so a standardized approach is going to be essential.

iherman commented 3 years ago

I am not sure how these things work, so forgive me if the question is stupid. But my fundamental question would be: does graph canonicalization, i.e., signing, have any relevance for such an encoding? If yes, then this WG gives you a fundamental building block, and putting such a use case on paper should be o.k. But if the answer is no, then I do not think so...

philarcher commented 3 years ago

The relevance is that QR codes and their cousins have a very limited capacity (if you want to keep the symbol small and keep away from silly ideas like flashing sequences). This presents a problem when you want to convey verifiable information between devices, as it's very easy to exceed the capacity of the data carrier. How do you encode a VC, complete with public key, in a QR code? Answer: with great care! Since a VC is an RDF Dataset, albeit a small one, this seems relevant. Not as a core spec, no, but relevant if you want to transmit a VC in a very constrained environment. COVID passports are an obvious current use case.

OR13 commented 3 years ago

@iherman my concern is that how "signatures" are represented limits where they can be used... and how they are represented leads to encoding questions.

For example, https://www.iana.org/assignments/jose/jose.xhtml#web-signature-encryption-algorithms

DEF is only registered for JWE not JWS... and we have seen some real problems with JWS / JWT size due to base64url and JSON...

So my concern is essentially: are we going to be able to comment on encoding signatures in the WG, or are we only commenting on canonicalizing the input to signatures?
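The size concern around base64url and JSON can be illustrated with a small sketch. It uses only the Python standard library; the `claim` payload and the `example.org` issuer URL are invented for illustration, and raw DEFLATE is used here only as a stand-in for the kind of compression that "DEF" denotes in JWE. This is not an implementation of JWS or JWE.

```python
import base64
import json
import zlib


def b64url(data: bytes) -> bytes:
    """Unpadded base64url, the encoding used for JWS/JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=")


def raw_deflate(data: bytes) -> bytes:
    """Raw DEFLATE stream (negative wbits means no zlib header or trailer)."""
    compressor = zlib.compressobj(wbits=-15)
    return compressor.compress(data) + compressor.flush()


def raw_inflate(data: bytes) -> bytes:
    """Inverse of raw_deflate."""
    return zlib.decompress(data, wbits=-15)


# Hypothetical payload: a small, repetitive JSON document.
claim = {"type": "CertifiedOrganic", "issuer": "https://example.org/issuer"}
payload = json.dumps({"claims": [claim] * 8}).encode("utf-8")

# Compare the plain size, the base64url size, and the size after
# compressing first and then applying base64url.
print(len(payload), len(b64url(payload)), len(b64url(raw_deflate(payload))))
```

base64url turns every 3 input bytes into 4 output characters, so an uncompressed payload grows by roughly a third before it ever reaches the data carrier.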

dlongley commented 3 years ago

I think an important part of LDS is talking about how the same signed data can be represented a number of different ways because the canonicalization primitive enables that to happen without invalidating the signature.

Generally speaking, any RDF graph/dataset signed using LDS should be expressible as JSON-LD and, thus, also expressible as CBOR-LD, a format that uses semantic compression to greatly reduce payload size. This is particularly helpful for both the vaccine VC and other use cases @philarcher is mentioning. It's helpful for any "VC/LD-proof document expressed via a low capacity, physical code".
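The idea that a canonicalization primitive lets the same data be represented in different ways without invalidating a signature can be sketched with a deliberately simplified example. The function below is a toy, not the RDF Dataset Canonicalization algorithm: it assumes the input is N-Quads-like text with no blank nodes, and it only sorts and de-duplicates the statements before hashing. The function name and the `example.org` IRIs are hypothetical.

```python
import hashlib


def toy_canonical_hash(nquads: str) -> str:
    """Hash a blank-node-free statement list independently of line order.

    Toy stand-in for real canonicalization: sort and de-duplicate the
    statements, join them with newlines, then hash with SHA-256.
    """
    lines = sorted({line.strip() for line in nquads.splitlines() if line.strip()})
    canonical = "".join(line + "\n" for line in lines)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The same two statements, written in a different order and with
# different whitespace.
doc_a = """
<https://example.org/p1> <https://example.org/cert> "organic" .
<https://example.org/p1> <https://example.org/cert> "gluten-free" .
"""
doc_b = """
   <https://example.org/p1> <https://example.org/cert> "gluten-free" .

<https://example.org/p1> <https://example.org/cert> "organic" .
"""

print(toy_canonical_hash(doc_a) == toy_canonical_hash(doc_b))
```

Because both documents reduce to the same canonical text, a signature computed over that hash would verify against either representation. Handling blank nodes, which this sketch ignores, is where the real difficulty lies.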

OR13 commented 3 years ago

@dlongley is "how the same signed data can be represented a number of different ways because the canonicalization primitive enables that to happen without invalidating the signature" part of the charter?

My reading of it as it stands today is that it would be in scope.

I would love for the ability to at least formalize the capabilities that canonicalization provides in the matter, even if particular representations are declared out of scope.

The wording of the out of scope section appears to focus on digital signatures, and seems to imply that we can design both the inputs (built from canonicalization) and outputs (represented as JSON-LD / other encodings?)

If the charter permits definition of the signature output format, then I would interpret RDF / JSON-LD / CBOR-LD as all equal representations for that output, but CBOR-LD is not a standard like JSON-LD... so we would be forbidden from normatively referencing it... we could, however, informatively reference it and rely on the JSON-LD representation for LD Proofs, correct?

dlongley commented 3 years ago

@OR13,

The wording of the out of scope section appears to focus on digital signatures, and seems to imply that we can design both the inputs (built from canonicalization) and outputs (represented as JSON-LD / other encodings?)

Yes, I think we can standardize both of these things. What's out of scope are the crypto/math primitives themselves. We should not be coming up with new elliptic curves to use or lattice-based crypto.

If the charter permits definition of the signature output format, then I would interpret RDF / JSON-LD / CBOR-LD as all equal representations for that output, but CBOR-LD is not a standard like JSON-LD... so we would be forbidden from normatively referencing it... we could, however, informatively reference it and rely on the JSON-LD representation for LD Proofs, correct?

Yes, that's my read.

iherman commented 3 years ago

My mental model of the work to be done may be summarized in these very distinct steps:

  1. We need a specification to create a signature of an RDF Graph/Dataset (which includes the canonicalization algorithm)
  2. We need a way to express that signature as a separate RDF "Signature Graph"
    • note that the Signature Graph should include not only the crypto signature itself, but also all the parameters that uniquely identify the underlying process: which hash function was used in step (1), which crypto tools were used to create the signature itself, etc.
  3. The Signature Graph must be serialized in some RDF serialization
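The three steps above can be sketched roughly as follows. Everything concrete here is an assumption made for illustration: the `https://example.org/sig#` vocabulary and its property names are hypothetical, the input is assumed to be already canonicalized, and HMAC-SHA-256 from the standard library stands in for a real digital signature scheme (it is a keyed MAC, not a public-key signature).

```python
import hashlib
import hmac

# Hypothetical vocabulary for describing the signature; not a real namespace.
EX = "https://example.org/sig#"

Triple = tuple[str, str, str]


def make_signature_graph(canonical_nquads: str, key: bytes) -> list[Triple]:
    """Steps 1 and 2: sign the canonical form and describe the result as triples."""
    digest = hashlib.sha256(canonical_nquads.encode("utf-8")).digest()
    # Stand-in for a real signature: a keyed MAC over the digest.
    value = hmac.new(key, digest, hashlib.sha256).hexdigest()
    node = "_:sig"
    return [
        (node, f"<{EX}hashAlgorithm>", '"SHA-256"'),
        (node, f"<{EX}signatureAlgorithm>", '"HMAC-SHA-256"'),
        (node, f"<{EX}signatureValue>", f'"{value}"'),
    ]


def to_ntriples(triples: list[Triple]) -> str:
    """Step 3: one possible serialization of the Signature Graph."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


canonical = '<https://example.org/p1> <https://example.org/cert> "organic" .\n'
graph = make_signature_graph(canonical, key=b"demo-key")
print(to_ntriples(graph))
```

The graph records the hash and signature algorithms alongside the signature value, and `to_ntriples` could be swapped for any other RDF serialization without changing `make_signature_graph`.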

The fundamental distinction, from the point of view of this discussion, is that (1) and (2) above, and only those, are normative, i.e., the subject of a W3C Recommendation. Whether JSON-LD, CBOR-LD, TriG, or (God forbid!) RDF/XML is used for step (3) is not relevant for the normative work. Nor is the choice of crypto and/or hash functions used in (1) and represented in (2); only the framework to represent them is. This is exactly the same distinction as in DID, where the (normative) type property is used to identify a verification method, but the possible values of type are in the registry and not in the standard.

I.e., in my view, all of the problems mentioned in this thread may be the subject of WG Notes, but Notes only. For example, I can very well imagine that the choice of crypto/hash functions may affect the size of a Signature Graph. Similarly, the choice of an RDF serialization syntax may be relevant for specific applications: CBOR-LD may well be o.k. for some, whereas others would require TriG. These analyses may be relevant for what @philarcher asks for: yes, such analyses (and possibly others) may be o.k. as a W3C WG Note. But those are not normative works.

My practical conclusions:


There were some discussions, see also #19, about doing more in this WG, essentially standardizing the choices of crypto functions. I would be genuinely worried about doing that in this round of the WG, and I've always pushed back on that idea. I think doing a proper job with (1) and (2) above will be enough, and we should push such further normative work to a later phase.

OR13 commented 3 years ago

@iherman since DID Core struggled with this sanity... are you essentially saying that the ld security wg abstract data model is isomorphic to RDF, and representations are concrete serializations of abstract RDF, like RDF/XML or JSON-LD?

Can we just say RDF is the ADM / graph format from which other serializations will be produced?

iherman commented 3 years ago

@iherman since DID Core struggled with this sanity... are you essentially saying that the ld security wg abstract data model is isomorphic to RDF, and representations are concrete serializations of abstract RDF, like RDF/XML or JSON-LD?

The LD Security work (in this Working Group at least) works on, to use the original terminology, the RDF abstract syntax that you can also refer to as the RDF data model. There is no "isomorphism" here; it is THE thing on which we would work.

There are a number of 'serializations' of the RDF Model. Some (most) of them are W3C Standards (RDFa, RDF/XML, Turtle/TriG, JSON-LD, etc.) and some of them may not be standards (yet?) like CBOR-LD.

Can we just say RDF is the ADM / graph format from which other serializations will be produced?

"Other serialization" is not the right terminology. The RDF abstract model is not a serialization, i.e., there is no "other" here.

A conceptual analogy I used when I used to make tutorials about RDF is that it is like numbers and numerals. Numbers are abstract mathematical concepts, numerals are ways of expressing those concepts. Mathematical theories are defined on, and expressed through numbers and not numerals, just like, say, canonicalization of an RDF graph is expressed on the RDF ADM.
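The numbers-versus-numerals analogy can be made concrete in a few lines. This is only an illustration of the analogy, not of RDF itself: one abstract value, several written forms.

```python
# Three different numerals (written forms) for the same number.
numerals = [("42", 10), ("101010", 2), ("2a", 16)]

# Parsing each numeral yields the same abstract value.
values = [int(text, base) for text, base in numerals]
print(values)

# Arithmetic is defined on the number, not on any particular numeral.
same_number = len(set(values)) == 1
print(same_number)
```

In the same way, canonicalization is defined on the RDF abstract model, and JSON-LD, TriG, or CBOR-LD are just different ways of writing that model down.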

OR13 commented 3 years ago

@iherman thank you for answering my question. We are excited to participate in this work.

iherman commented 3 years ago

@philarcher

  • It may be worth adding a few lines to the explainer document for future (albeit non-binding) reference. A PR would be welcome…

Do you think you can give a few lines as a PR for the explainer document? Good to gather those use cases in one place, as you well know...

That put aside, is it o.k. to close this issue?

philarcher commented 3 years ago

Thanks everyone for giving this consideration.

PR #55 addresses this issue. Whether the PR is accepted or not, I'm happy to close the issue.