Closed iherman closed 3 years ago
B.t.w., we probably would have to move this repository to w3c (it makes it easier to solicit reviews, etc). I would do that when we are ready to go.
agree on the proposed length of the WG (the current text is 2 years)
I think 2 years is adequate... ideally, we get done sooner. The only thing that could derail us is an entirely new proposal or a big change to the existing implementations and test suite.
would be nice to have at least one co-chair settled
Agree... haven't put much thought into that yet. Agree that it should be one from the CCG/DID/VC orbit and another from a very different community, at least.
there is a clear example of usage through Verifiable Credentials. I believe another example referring to other usages of Linked Data must be added before we start; hopefully a publicly expressed need that can be referenced (e.g., signing an ontology for XXX usage, or using an unambiguous reference/identification for a public dataset, etc.)
We do need a use cases document, which we don't have yet. Here are a few use cases that have surfaced over the past several years:
agree on the deliverables (and their titles, though that can be kept flexible)
have a public reference to the paper of Dave and Rachel, which should also refer to a public review
I'll contact the authors and get them to post to CCG.
list of the 'other' deliverables
a realistic timeline for a FPWD and for a CR
possible list of external organizations we want to liaise with
Preamble: I try to be extremely cautious in not promising more than what we want to do here, and be very focused. We know that we will have to be very convincing...
agree on the proposed length of the WG (the current text is 2 years)
I think 2 years is adequate... ideally, we get done sooner.
Sweet dreams :-)
The only thing that could derail us is an entirely new proposal or a big change to the existing implementations and test suite.
there is a clear example of usage through Verifiable Credentials. I believe another example referring to other usages of Linked Data must be added before we start; hopefully a publicly expressed need that can be referenced (e.g., signing an ontology for XXX usage, or using an unambiguous reference/identification for a public dataset, etc.)
In the proposed version in PR #7, what I propose is to have:
We do need a use cases document, which we don't have yet. Here are a few use cases that have surfaced over the past several years:
RDF Dataset Canonicalization Use Cases
- Cryptographic Hashing
I believe that is a fundamental one, see #6.
* Digital Signatures and Proofs
- Dataset Comparison (to determine if/what information has changed)
That one I do not really believe in (we have discussed this elsewhere). I do not see the role of canonicalization in that one.
* Cache invalidation
* Detecting data tampering/modification
* Debugging digital signature failures (what information changed between sender and receiver?)
Linked Data Proofs and Signatures
Verifiable Credentials
Cryptographically establishing source of information
Verifiable Presentations
Replay Protection
Protection of Arbitrary RDF Datasets
Could you make a selection and add those, possibly with a PR, to the explainer? Not many, I think 3-4 should be enough.
agree on the deliverables (and their titles, though that can be kept flexible)
- RDF Dataset Normalization (REC)
I believe the right terminology (that I also saw elsewhere) is canonicalization...
- Linked Data Proofs (REC)
I have problems with this: it is, or suggests, something way too generic and therefore opens us up to objections. We do not do (generic) proofs; we "only" define a way of hashing and signing linked data. Better to choose a title that reflects this.
- Linked Data Signatures (REC)
If the previous document is expressing signatures (which I believe is the case), then the third document only defines the vocabulary to express those signatures in linked data. I think it is worth separating the generic procedure (which is the previous document) from the way it is expressed via a suitable vocabulary. The generic procedure may be usable by itself (e.g., the result of the hash); it does not have to be expressed via a vocabulary...
Hence my proposal for the texts (and the description) in the charter text. I believe they are clearer (though maybe a bit convoluted) in specifying what we want to achieve. Again, we have to be defensive.
have a public reference to the paper of Dave and Rachel, which should also refer to a public review
I'll contact the authors and get them to post to CCG.
That should be fine. Don't forget a reference to the reviews.
list of the 'other' deliverables
- Use Cases and Requirements
We have to be careful. If we explicitly refer to a document like that here, then we may get pushback saying "do a working group when you already have a UCR". I am not sure how to handle that...
- Ed25519Cryptosuite (NOTE), JOSE Cryptosuite (NOTE), Koblitz Cryptosuite (NOTE)
The current draft is slightly more generic: "A Linked Data cryptosuite registry, containing Linked Data related cryptographic terms, including, although not restricted to, terms used for Linked Data Hash or Signatures." Do you really think we should be that specific in the charter?
a realistic timeline for a FPWD and for a CR
- RDF Dataset Normalization - FPWD +3 months, CR +12 months
- Linked Data Proofs - FPWD +3 months, CR +15 months
- Linked Data Signatures - FPWD +3 months, CR +15 months
These sound about right. Although... the really tough one seems to be the first one. Isn't it possible to have the other two documents released at the same time? Why that 3-month gap?
(Note to myself: probably better to write the timeline in terms of "T+3", etc)
possible list of external organizations we want to liaise with
- All the standard W3C ones - PING, a11y, TAG
- All W3C Security WGs
There is now a standard boilerplate text in the charter that covers all the horizontals.
- W3C CCG, VCWG, DIDWG
yep, they are all there
- IETF CFRG
- DIF SDSWG
- Hyperledger Aries
I would need a precise link and a text for those (or a PR from you with those).
Before getting into PRs, it would be nice to agree (or decide to disagree :-) on #7
I have now done a complete pass and editorial suggestions in PRs #10, #11, #12, #13, and #14.
There are two high-level take-aways for my suggestions:
Again, these are suggestions (fairly heavy suggestions -- I do think it's the right path), but would love to hear thoughts.
In the proposed version in PR #7, what I propose is to have:
- two very focused use cases (see that PR preview) in the charter
- have a longer list in the explainer document (see the version in the PR)
Agree, I will try to add some more to the explainer. We do want people to be aware that we are focused now, but the work may expand in the future.
- Dataset Comparison (to determine if/what information has changed)
That one I do not really believe in (we have discussed this elsewhere). I do not see the role of canonicalization in that one.
This is a big use case for us, @iherman... we use it all the time to debug broken digital signatures. I'm fine w/ not putting a focus on it, but it is an important use case.
Could you make a selection and add those, possibly with a PR, to the explainer? Not many, I think 3-4 should be enough.
Yes, I can do that... also, a few more came up on the CCG mailing list yesterday (based on work that Alan Karp did in 2004).
I believe the right terminology (that I also saw elsewhere) is canonicalization...
Agreed, I am currently updating all the things to match the "Canonicalization" terminology.
I have problems with this: it is, or suggests, something way too generic and therefore opens us up to objections. We do not do (generic) proofs; we "only" define a way of hashing and signing linked data. Better to choose a title that reflects this.
I updated to "Linked Data Security" and clearly outlined what would be in the specification (as focused and concrete).
I'll contact the authors and get them to post to CCG.
That should be fine. Don't forget a reference to the reviews.
Done, this ball is rolling.
- Use Cases and Requirements
We have to be careful. If we explicitly refer to a document like that here, then we may get pushback saying "do a working group when you already have a UCR". I am not sure how to handle that...
I can quickly put a Use Cases document together if that happens. For now, I think the Explainer is good enough. I'll fill more use cases out there.
- Ed25519Cryptosuite (NOTE), JOSE Cryptosuite (NOTE), Koblitz Cryptosuite (NOTE)
The current draft is slightly more generic: "A Linked Data cryptosuite registry, containing Linked Data related cryptographic terms, including, although not restricted to, terms used for Linked Data Hash or Signatures." Do you really think we should be that specific in the charter?
I took your text and modified it slightly to not put the possibility of those NOTEs out of scope.
These sound about right. Although... the really tough one seems to be the first one. Isn't it possible to have the other two documents released at the same time? Why that 3-month gap?
Based on the reality that I've experienced throughout the last decade... these things tend to fall on a very small number of overworked people, so I'm trying to be kind to them. :)
(Note to myself: probably better to write the timeline in terms of "T+3", etc)
I updated these values in my PRs... used "WG-START + 3 months".
- IETF CFRG
- DIF SDSWG
- Hyperledger Aries
I would need a precise link and a text for those (or a PR from you with those).
Done in aba5c9f8e028952f27da565636472d486e8d0ac6.
Before getting into PRs, it would be nice to agree (or decide to disagree :-) on #7
Here's the proposal:
https://pr-preview.s3.amazonaws.com/iherman/ld-signatures-charter/pull/13.html#timeline
What do you think?
- Dataset Comparison (to determine if/what information has changed)
That one I do not really believe in (we have discussed this elsewhere). I do not see the role of canonicalization in that one.
Also a bit unsure about this part. I guess it might need to be worded carefully to avoid the impression that we can solve this in the general case.
Canonical labelling could be used, for example, to see which graphs in a dataset changed.
It could also be used to build hash structures like Merkle trees over large RDF graphs.
More generally, you could use canonical labelling to see which (pre-defined) partitions of a graph changed by using the labels to hash each of those partitions and compare the hashes. But it becomes complicated if the partitions depend on the labels; in other words, I think the partitions would have to be well-defined without using blank nodes. Also, it may not be very helpful for understanding what actually changed within each partition (just whether the partition changed or not).
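The partition-hashing idea above can be sketched as follows. This is a hypothetical toy illustration, not the canonicalization algorithm itself: it assumes blank nodes have already been given canonical labels (e.g. `_:c14n0`), uses a made-up tuple representation for triples, and treats each named graph as one partition. The `graph_hash` and `changed_graphs` helpers are invented for this sketch:

```python
import hashlib

def graph_hash(triples):
    # Deterministic serialization: sort the triples, then hash the result.
    # Only works as a comparison key if blank-node labels are canonical.
    lines = sorted(" ".join(t) for t in triples)
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()

def changed_graphs(dataset_a, dataset_b):
    """Return the names of graphs whose hashes differ between two datasets.

    Each dataset is a dict mapping a graph name to a set of (s, p, o) triples.
    This tells you *whether* a graph changed, not *what* changed inside it.
    """
    names = set(dataset_a) | set(dataset_b)
    return {n for n in names
            if graph_hash(dataset_a.get(n, set())) != graph_hash(dataset_b.get(n, set()))}

a = {"g1": {("_:c14n0", "ex:knows", "ex:bob")},
     "g2": {("ex:bob", "ex:age", '"42"')}}
b = {"g1": {("_:c14n0", "ex:knows", "ex:bob")},
     "g2": {("ex:bob", "ex:age", '"43"')}}
print(changed_graphs(a, b))  # prints {'g2'}
```

Note that the comparison is per partition: the caller learns that `g2` differs but gets no diff of its contents, which matches the caveat above.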
The text proposed in the explainer (after @msporny's changes) is more specific: the hash can indeed be used to see if there is a change in the dataset. What it does is say whether there has been a change or not, not what exactly the change is. The former is absolutely o.k.; the latter, I suspect, is a very different problem.
I propose to close this issue. The cumulative changes via @msporny's PRs seem to cover the points I had...
@aidhog wrote:
But it becomes complicated if the partitions depend on the labels; in other words, I think the partitions would have to be well-defined without using blank nodes.
Yes, agree with everything you said.
The use case I added had more to do with "it's a tool that a human could use to help them narrow down on changes", where they might have to use their brain to reason about the changes (rather than an automated reasoner). I'll try to think more about how to reword that use case to be more accurate.
This is just a call for action :-)
One of the first goals is to start the W3C journey; the first step is to present this proposal to the W3C Strategy team for a first reaction, which also includes a charter review on i18n, a11y, privacy, and security. I would expect the first three to go through quickly; there may be discussion on the fourth item. More or less in parallel with the discussion in the W3C Strategy team, an advance notice should be issued to the W3C AC, asking for public comments.
There are a number of issues we should try to settle before starting the aforementioned W3C route. I attempt to make a list here:
I believe it's o.k. to start the journey without (1), (6), (8), and possibly (2) settled (although, I believe, having (2) by the time we send an advance notice would be really important). We may rely on the public review (via the advance notice) to get an example for (3) by contacting, e.g., the semantic web mailing list, but I am worried about the impression that would give that the only reason this work is proposed is the Verifiable Credential usage.
Cc @msporny @pchampin @aidhog @dlongley