Closed iherman closed 3 years ago
B.t.w., we probably would have to move this repository to w3c (it makes it easier to solicit reviews, etc). I would do that when we are ready to go.
agree on the proposed length of the WG (the current text is 2 years)
I think 2 years is adequate... ideally, we get done sooner. The only thing that could derail us is an entirely new proposal or a big change to the existing implementations and test suite.
would be nice to have at least one co-chair settled
Agree... haven't put much thought into that yet. Agree that it should be one from the CCG/DID/VC orbit and another from a very different community, at least.
there is a clear example of usage through Verifiable Credentials. I believe another example referring to other usages of Linked Data must be added before we start; hopefully a publicly expressed need that can be referenced (e.g., signing an ontology for XXX usage, or using an unambiguous reference/identification for a public dataset, etc.)
We do need a use cases document, which we don't have yet. Here are a few use cases that have surfaced over the past several years:
agree on the deliverables (and their titles, though that can be kept flexible)
have a public reference to the paper of Dave and Rachel, which should also refer to a public review
I'll contact the authors and get them to post to CCG.
list of the 'other' deliverables
a realistic timeline for a FPWD and for a CR
possible list of external organizations we want to liaise with
Preamble: I try to be extremely cautious in not promising more than what we want to do here, and be very focused. We know that we will have to be very convincing...
agree on the proposed length of the WG (the current text is 2 years)
I think 2 years is adequate... ideally, we get done sooner.
Sweet dreams :-)
The only thing that could derail us is an entirely new proposal or a big change to the existing implementations and test suite.
there is a clear example of usage through Verifiable Credentials. I believe another example referring to other usages of Linked Data must be added before we start; hopefully a publicly expressed need that can be referenced (e.g., signing an ontology for XXX usage, or using an unambiguous reference/identification for a public dataset, etc.)
In the proposed version in PR #7, what I propose is to have:
We do need a use cases document, which we don't have yet. Here are a few use cases that have surfaced over the past several years:
RDF Dataset Canonicalization Use Cases
- Cryptographic Hashing
I believe that is a fundamental one, see #6.
* Digital Signatures and Proofs
- Dataset Comparison (to determine if/what information has changed)
That one I do not really believe in (we have discussed this elsewhere). I do not see the role of canonicalization in that one.
* Cache invalidation
* Detecting data tampering/modification
* Debugging digital signature failures (what information changed between sender and receiver?)
Linked Data Proofs and Signatures
Verifiable Credentials
Cryptographically establishing source of information
Verifiable Presentations
Replay Protection
Protection of Arbitrary RDF Datasets
Could you make a selection and add those, possibly with a PR, to the explainer? Not many, I think 3-4 should be enough.
agree on the deliverables (and their titles, though that can be kept flexible)
- RDF Dataset Normalization (REC)
I believe the right terminology (that I also saw elsewhere) is canonicalization...
- Linked Data Proofs (REC)
I have problems with this: it is, or suggests, something way too generic and therefore opens us up to objections. We do not do (generic) proofs; we "only" define a way of hashing and signing linked data. Better to choose a title that reflects this.
- Linked Data Signatures (REC)
If the previous document is expressing signatures (which I believe is the case), then the third document only defines the vocabulary to express those signatures in linked data. I think it is worth separating the generic procedure (which is the previous document) from the way it is expressed via a suitable vocabulary. The generic procedure may be usable by itself (e.g., the result of the hash); it does not have to be expressed via a vocabulary...
Hence my proposal for the texts (and the description) in the charter text. I believe they are clearer (though maybe a bit convoluted) in specifying what we want to achieve. Again, we have to be defensive.
have a public reference to the paper of Dave and Rachel, which should also refer to a public review
I'll contact the authors and get them to post to CCG.
That should be fine. Don't forget a reference to the reviews.
list of the 'other' deliverables
- Use Cases and Requirements
We have to be careful. If we explicitly refer to a document like that here, then we may get pushback saying "do a working group when you already have a UCR". I am not sure how to handle that...
- Ed25519Cryptosuite (NOTE), JOSE Cryptosuite (NOTE), Koblitz Cryptosuite (NOTE)
The current draft is slightly more generic: "A Linked Data cryptosuite registry, containing Linked Data related cryptographic terms, including, although not restricted to, terms used for Linked Data Hash or Signatures." Do you really think we should be that specific in the charter?
a realistic timeline for a FPWD and for a CR
- RDF Dataset Normalization - FPWD +3 months, CR +12 months
- Linked Data Proofs - FPWD +3 months, CR +15 months
- Linked Data Signatures - FPWD +3 months, CR +15 months
These sound about right. Although... the really tough one seems to be the first one. Isn't it possible to have the other two documents released at the same time? Why that 3-month gap?
(Note to myself: probably better to write the timeline in terms of "T+3", etc)
possible list of external organizations we want to liaise with
- All the standard W3C ones - PING, a11y, TAG
- All W3C Security WGs
There is now a standard boilerplate text in the charter that covers all the horizontals.
- W3C CCG, VCWG, DIDWG
yep, they are all there
- IETF CFRG
- DIF SDSWG
- Hyperledger Aries
I would need a precise link and a text for those (or a PR from you with those).
Before getting into PRs, it would be nice to agree (or decide to disagree :-) on #7
I have now done a complete pass and editorial suggestions in PRs #10, #11, #12, #13, and #14.
There are two high-level take-aways for my suggestions:
Again, these are suggestions (fairly heavy suggestions -- I do think it's the right path), but would love to hear thoughts.
In the proposed version in PR #7, what I propose is to have:
- two very focused use cases (see that PR preview) in the charter
- have a longer list in the explainer document (see the version in the PR)
Agree, I will try to add some more to the explainer. We do want people to be aware that we are focused now, but the work may expand in the future.
- Dataset Comparison (to determine if/what information has changed)
That one I do not really believe in (we have discussed this elsewhere). I do not see the role of canonicalization in that one.
This is a big use case for us, @iherman... we use it all the time to debug broken digital signatures. I'm fine w/ not putting a focus on it, but it is an important use case.
Could you make a selection and add those, possibly with a PR, to the explainer? Not many, I think 3-4 should be enough.
Yes, I can do that... also, a few more came up on the CCG mailing list yesterday (based on work that Alan Karp did in 2004).
I believe the right terminology (that I also saw elsewhere) is canonicalization...
Agreed, I am currently updating all the things to match the "Canonicalization" terminology.
I have problems with this: it is, or suggests, something way too generic and therefore opens us up to objections. We do not do (generic) proofs; we "only" define a way of hashing and signing linked data. Better to choose a title that reflects this.
I updated to "Linked Data Security" and clearly outlined what would be in the specification (as focused and concrete).
I'll contact the authors and get them to post to CCG.
That should be fine. Don't forget a reference to the reviews.
Done, this ball is rolling.
- Use Cases and Requirements
We have to be careful. If we explicitly refer to a document like that here, then we may get pushback saying "do a working group when you already have a UCR". I am not sure how to handle that...
I can quickly put a Use Cases document together if that happens. For now, I think the Explainer is good enough. I'll fill more use cases out there.
- Ed25519Cryptosuite (NOTE), JOSE Cryptosuite (NOTE), Koblitz Cryptosuite (NOTE)
The current draft is slightly more generic: "A Linked Data cryptosuite registry, containing Linked Data related cryptographic terms, including, although not restricted to, terms used for Linked Data Hash or Signatures." Do you really think we should be that specific in the charter?
I took your text and modified it slightly to not put the possibility of those NOTEs out of scope.
These sound about right. Although... the really tough one seems to be the first one. Isn't it possible to have the other two documents released at the same time? Why that 3-month gap?
Based on the reality that I've experienced throughout the last decade... these things tend to fall on a very small number of overworked people, so I'm trying to be kind to them. :)
(Note to myself: probably better to write the timeline in terms of "T+3", etc)
I updated these values in my PRs... used "WG-START + 3 months".
- IETF CFRG
- DIF SDSWG
- Hyperledger Aries
I would need a precise link and a text for those (or a PR from you with those).
Done in aba5c9f8e028952f27da565636472d486e8d0ac6.
Before getting into PRs, it would be nice to agree (or decide to disagree :-) on #7
Here's the proposal:
https://pr-preview.s3.amazonaws.com/iherman/ld-signatures-charter/pull/13.html#timeline
What do you think?
- Dataset Comparison (to determine if/what information has changed)
That one I do not really believe in (we have discussed this elsewhere). I do not see the role of canonicalization in that one.
Also a bit unsure about this part. I guess it might need to be worded carefully to avoid the impression that we can solve this in the general case.
Canonical labelling could be used, for example, to see which graphs in a dataset changed.
It could also be used to build hash structures like Merkle trees over large RDF graphs.
More generally, you could use canonical labelling to see which (pre-defined) partitions of a graph changed by using the labels to hash each of those partitions and compare the hashes. But it becomes complicated if the partitions depend on the labels; in other words, I think the partitions would have to be well-defined without using blank nodes. Also, it may not be very helpful for understanding what actually changed within each partition (just whether the partition changed or not).
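The partition-hashing idea above can be sketched as follows. This is a hypothetical toy illustration, not the canonicalization algorithm itself: it assumes blank nodes have already been given canonical labels (e.g. `_:c14n0`), uses a made-up tuple representation for triples, and treats each named graph as one partition. The `graph_hash` and `changed_graphs` helpers are invented for this sketch:

```python
import hashlib

def graph_hash(triples):
    # Deterministic serialization: sort the triples, then hash the result.
    # Only works as a comparison key if blank-node labels are canonical.
    lines = sorted(" ".join(t) for t in triples)
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()

def changed_graphs(dataset_a, dataset_b):
    """Return the names of graphs whose hashes differ between two datasets.

    Each dataset is a dict mapping a graph name to a set of (s, p, o) triples.
    This tells you *whether* a graph changed, not *what* changed inside it.
    """
    names = set(dataset_a) | set(dataset_b)
    return {n for n in names
            if graph_hash(dataset_a.get(n, set())) != graph_hash(dataset_b.get(n, set()))}

a = {"g1": {("_:c14n0", "ex:knows", "ex:bob")},
     "g2": {("ex:bob", "ex:age", '"42"')}}
b = {"g1": {("_:c14n0", "ex:knows", "ex:bob")},
     "g2": {("ex:bob", "ex:age", '"43"')}}
print(changed_graphs(a, b))  # prints {'g2'}
```

Note that the comparison is per partition: the caller learns that `g2` differs but gets no diff of its contents, which matches the caveat above.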
The text proposed in the explainer (after @msporny's changes) is more specific: the hash can indeed be used to see if there is a change in the dataset. What it does is say whether there has been a change or not, not what exactly the change is. The former is absolutely o.k.; the latter, I suspect, is a very different problem.
I propose to close this issue. The cumulative changes via @msporny's PRs seem to cover the points I had...
@aidhog wrote:
But it becomes complicated if the partitions depend on the labels; in other words, I think the partitions would have to be well-defined without using blank nodes.
Yes, agree with everything you said.
The use case I added had more to do with "it's a tool that a human could use to help them narrow down on changes", where they might have to use their brain to reason about the changes (rather than an automated reasoner). I'll try to think more about how to reword that use case to be more accurate.
This is just a call for action :-)
One of the first goals is to start the W3C journey; the first step is to present this proposal to the W3C Strategy team for a first reaction, which also includes a charter review on i18n, a11y, privacy, and security. I would expect the first three to go through quickly; there may be discussion on the fourth item. More or less in parallel with the discussion in the W3C Strategy team, an advance notice should be issued to the W3C AC, asking for public comments.
There are a number of issues we should try to settle before starting the aforementioned W3C route. I attempt to make a list here:
I believe it's o.k. to start the journey without (1), (6), (8), and possibly (2) settled (although, I believe, having (2) by the time we send an advance notice would be really important). We may rely on the public review (via the advance notice) to get an example for (3) by contacting, e.g., the semantic web mailing list, but I am worried about the impression that would give that the only reason this work is proposed is the Verifiable Credential usage.
Cc @msporny @pchampin @aidhog @dlongley