Rework content integrity section based on @jyasskin's feedback.

msporny commented 1 month ago

This PR is an attempt to partially address issue #1348 by reworking the content integrity section in the specification based on feedback from @jyasskin.

NOTE: It contains content that was objected to by @selfissued in PR #1464 and is, thus, not mergeable in its current state. This PR will be used to drive the discussion and come to consensus on what can be merged by consensus of the group.

Specifically:

5.4 Integrity of Related Resources

[x] Editorial clean-up and simplification of normative statements to match other sections. Fixed in b3ac23cafe92eb92980b8c2d4612d0cdd9908b96 and e0c09880ba284d9da44d23b5c151fdf4bf113f7b and re-applied in 692947e2febafad5b5e9ae55bf181d5ab02e245d.

Preview | Diff

iherman commented 1 month ago

The issue was discussed in a meeting on 2024-05-08

no resolutions were taken

View the transcript

#### 2.1. Rework content integrity section based on @jyasskin's feedback. (pr vc-data-model#1484) _See github pull request [vc-data-model#1484](https://github.com/w3c/vc-data-model/pull/1484)._ **Manu Sporny:** The content integrity section has a number of objections on it. I presume we'll talk about those at some point. The rest of the specs are doing their normal thing, that's it from me. > *Dmitri Zagidulin:* do you have links to the content integrity objections? _See github pull request [vc-data-model#1484](https://github.com/w3c/vc-data-model/pull/1484)._ **Brent Zundel:** We have some time to look at the content integrity PR. … we have some time to look at that PR.

iherman commented 1 month ago

The issue was discussed in a meeting on 2024-05-08

no resolutions were taken

View the transcript

#### 2.1. Rework content integrity section based on @jyasskin's feedback. (pr vc-data-model#1484) _See github pull request [vc-data-model#1484](https://github.com/w3c/vc-data-model/pull/1484)._ **Manu Sporny:** The content integrity section has a number of objections on it. I presume we'll talk about those at some point. The rest of the specs are doing their normal thing, that's it from me. > *Dmitri Zagidulin:* do you have links to the content integrity objections? _See github pull request [vc-data-model#1484](https://github.com/w3c/vc-data-model/pull/1484)._ **Brent Zundel:** We have some time to look at the content integrity PR. … we have some time to look at that PR.

iherman commented 1 month ago

The issue was discussed in a meeting on 2024-05-08

no resolutions were taken

View the transcript

#### 2.3. Rework content integrity section based on @jyasskin's feedback. (pr vc-data-model#1484) _See github pull request [vc-data-model#1484](https://github.com/w3c/vc-data-model/pull/1484)._ **Brent Zundel:** This PR has had a number of comments on it, number of requests for changes, Manu could you talk us through it? … Folks that would like to see changes, please do so. **Manu Sporny:** This PR started out as a request from Jeffrey Yaskin to rewrite the content integrity section so that it matched the pattern of the other sections. This section was originally written by Mike Prorock, it was the first attempt at it and it went through a lot of churn and needed clean up. … There were a number of things agreed to -- as far as "please raise an issue noting X and Y" in the content integrity section. There was not agreement on the digest format we would use, but there was agreement that it would be between `digestSRI` and `digestMultibase` or both and leave it to implementers to decide on which ones to implement, if not both of them. … This section was largely rewritten editorially, but there are objections to the way it was rewritten because we now mention both `digestSRI` and `digestMultibase` in the spec; one argument is that was always true, there was an issue marker in the spec that said we would rely on implementers to pick. … The counterargument to that is that we never agreed to that but that's challenged as well. Largely this is an editorial rewrite of the section and the thing that's under contention is that `digestMultibase` is being specified more clearly based on the issue marker. … I would be interested in hearing Mike's concerns here and Jeffrey Yaskin has also jumped in with good questions we should answer and while we're reworking the section we should address those. 80% are questions with answers, 20% we might want to discuss. **Michael Jones:** Obviously, I'm the one who had made the point that we shouldn't be normatively adding multibase to the VCDM spec since we have been careful to not include it before. I understand that there is an issue comment, but I'm not willing to elevate multibase beyond that issue comment. **David Chadwick:** This new text should have removed the note, suggest removing that bit, rework editorial, and discuss later. **Manu Sporny:** Yes, I think that the new text, actually, if you look at it carefully, the changes have been made, it's clearly addressing Jeffrey's comments. I think the new text on multibase isn't replacing anything. We should have struck out the note. Then we can put that into a new PR and just work on that PR alone. **Brent Zundel:** ok, so smaller more controversial bit to its own PR. **Dmitri Zagidulin:** I'm curious, with regards to `digestMultibase`, since Mike's objection seems to be primarily to the base encoding part, perhaps not the multihash aspect of it, I'm wondering if there's still time or opportunity to change direction and require the base to be base64url and have it be a digest multihash, which provides significant size benefits over the digest SRI approach. … Would Manu and other proponents consider pivoting and leaving the multihash aspect. **Manu Sporny:** Sure. I think there's a misunderstanding about what's in the spec today. Today we normatively reference DI and VC-JOSE-COSE -- because this group decided that because we're publishing those we will normatively reference those from VCDM. As a result, we can pull in those normative things from those specs. If the concern is not defining them in VCDM, that's not happening. … We're using it from the other specs. The more confusing thing is that we're taking the definitions from DI -- and moved them into the controller document. The thing with the normative definition for multibase is in the controller document. Now we'd have to ref the controller document from the VC data model. In the same way we ref SRI. … We can't just ref SRI, we have to specify a field and the format of that field and we have language around that. We have to define it somewhere. We have to figure out where to define it -- we could define it in controller document, we could define it in VC-JOSE-COSE, we could define it in DI. I don't have a strong feeling where it goes. … But to use that feature we have to talk about it in the VCDM. I don't quite understand what Mike's objection -- are you objecting to where it's defined (it must be defined somewhere, but we have to normatively reference it to use it). I don't really care where it's defined but it has to be somewhere, the easiest place is in the VCDM, but we can put it in DI or somewhere else. … We could throw it all in the controller document. **Brent Zundel:** Just to summarize, currently the VCDM normatively references `digestSRI` and defines a profile of how we're using it? **Manu Sporny:** No, it references SRI, which is a web browser thing. We use a tiny piece of SRI but we can't just do that. We have to define what the value is in the `@context`, its range, how it's encoded, etc. SRI doesn't do that, it's targeted at browsers. … We chose to do that work in VCDM, it's a couple of sentences, it's simple. But we have to do that for `digestMultibase` as well, but it's already defined in DI and we can just ref it. **Brent Zundel:** That was helpful. **Michael Jones:** As Dmitri points out, the problem with digestMultibase is its dependent upon multibase, where inexplicably it defines something like 26 encodings for the same data, which is a travesty for interop. … my objections are not about where this is defined, it's to using it at all, because it hurts interop. I'm on public record about that. … The multibase encoding is a failure to make a choice, we should steer clear of it. > *Dmitri Zagidulin:* @dlongley - but if we're gonna use 'u' prefix only, why don't we just specify that we're using u, and save the extra character. **Dave Longley:** 2nd Dmitri's proposal to resolve this, to say that we only support the base64url prefix, which is the multibase prefix for base64url, nopad encoded, if we do that, then that gets us past the objection, then we get data savings, and so on. **Brent Zundel:** proposal on the table is we specify which base and do that in this section of VCDM. Mike, would that address your objections? **Dmitri Zagidulin:** I like Dave's suggestion, but if we're fixing that we're just fixing base64url, and drop the 'u' and save ourselves an extra character. **Michael Jones:** Well, Dmitri's suggestion is rational, threw me off. In response to brent's suggestion, I'm unwilling to define any form of digestMultibase in our core spec, I understand the irony, which defines this in controller document, and I do appreciate we got to compromise to DI where we don't normatively ref 26 choices, but we decide w/ what Dmitri said, what choices we make, I do not know whether a form of digestMultibase is defined in controller document, if we're going to define that w/ base64url encoding I'd do it there to keep everything together. The other question, I do support defining the fields necessary to use digestSRI in our spec in the Content Integrity section. Finally, I haven't read Jeffrey's comments, read them w/ eye, is he asking for Multibase to be specified. or is he asking unrelated editorial things? **Dave Longley:** One of the advantages of using multibase is it introduces layers and a separation of concerns, so you can compress any multibase data regardless of whichever data is done, if you want to compress using CBOR-LD, it can do that sort of compression, or use tech that is not a VC, different choices on encoding, here in our group we can say base64url, no one has to go rewrite new tech, type is multibase, no new tech needs to be written to that particular field, that other tech continues to work, we get benefit that there is interop around that prefix, we get best of both worlds if we make that choice. > *Dmitri Zagidulin:* @dlongley - since that's driven off of an RDF data type, can't we define a base64url data type, and have processors just work off that/. **Dmitri Zagidulin:** Dave's point about CBOR-LD processors makes a lot of sense, since signal processors are working off of, RDF type multibase, why dont' we define RDF type for base64url, same logic, same compression can be invoked? > *Dmitri Zagidulin:* but there's like only 3-4 encodings realistically (base64url, base32lower, base32upper... like what else are you gonna use). **Dave Longley:** What we're suggesting now is a new RDF type for every encoding type, or we can re-use work out there, already implementations using that... if the world had decided to take a different approach, and communities were using RDF, then that might have worked, but there are separations of concerns, trying to re-use technologies, if we re-invent yet another way to do it, we're less interoperable. … simplest thing to do is take implementations already out there, will work w/ other tooling, we should invent fewer things, we should profile it down, it addresses Mike's concern, they don't have to use it either and will just work w/ VCs w/ one prefix. **Michael Jones:** I guess I disagree with the characterization, it's not inventing something, it's something that's been around for a decade and a half. If we need context and vocabulary term, let's create it. > *Michael Jones:* Multiformats Considered Harmful [https://self-issued.info/?p=2408](https://self-issued.info/?p=2408). **Michael Jones:** To Dave's point, that's what people have already used, that people chose to not make a choice doesn't mean we should promulgate that or promote that. … I did a blog post on the issue, I think multibase is an interop mistake and we should get rid of it, if we chose a prefix, we're half getting rid of it, we'll see how this works out. **Dave Longley:** First to clarify, I didn't mean to imply that base64url encoding , the alphabet is what's being invented. We'd be inventing a new RDF type for a single base-encoding. Regarding failure to make a choice, depends on whether or not people feel things are equivalent. Community that built Multibase had their reasons. JWK allows different keys to be parameterized instead of saying one key format. We tried to re-use that work, and we've successfully been interoping with that technology. I would prefer us to not invent a new thing and cut it there because it seems like picking a prefix of U and putting it on there solves all the problems doesn't solve... we're building on other communities to greatest extend that we can profile those down, we should do that, instead of inventing something new. **Manu Sporny:** There are a number of misconceptions around what multibase is including in that post that Mike wrote. … I think we're talking about reinventing the wheel here, we're going to invent a single RDF type for a specific encoding of base64url-no-pad. … That is hyper-specializing the encoding format when we don't need to do that when we can just put a `u` on the front and say VCs use that and that addresses the "not making a choice" thing that Mike is concerned about. … Why would people choose different base encodings? There are legitimate reasons for doing so. Base32, for example, when encoding in a QR format, compresses way better than base64url. There are technical reasons for it. … You can't think that all encodings work for all use cases. When you get into embedded and compression use cases, those encodings matter. … There are totally valid reasons why people would want to use different base encodings and multibase understands that and that not every base encoding works everywhere. Base64 doesn't work in URLs. That's what Base64Url was created. … There are many ways to send things and the ecosystem matters, it's not a failure to decide. It's understanding that the world is bigger than just us in this group and utilizing a way that lets you signal what base encoding is used so we can just use the multibase format across all the layers instead of having to reinvent the stuff at every layer. … I appreciate you wrote that article but I think it's off base and it's not the reason multibase exists, it misses the point. **Brent Zundel:** It sounds like, maybe overly optimistic, it sounds like we're dancing around a middle ground that not everyone likes but we can live with, use base64url encoding, that's the choice we're making for multibase option for content integrity section. … feels like we're close there. > *Dave Longley:* the effect is larger than the invention of a tag itself. **Michael Jones:** Mainly going to respond to Dave's comment, defining base64url RDF type would be invention, perhaps, but it's not a very big invention, it's just tagging what this field uses. Creating the tag is perhaps an invention, but that's a good thing, it let's people know what choice was made. > *Ted Thibodeau Jr.:* I think a better title for that blog post would be "I Consider Multiformats Harmful" as it doesn't seem to incorporate anyone else's opinion. **Brent Zundel:** please chime in in the PR, let's explore that middle ground. ---

msporny commented 1 month ago

@selfissued wrote:

Please delete the lines adding the use of digestMultibase to the specification.

Removed in 869ebeb with issue #1489 raised to track the disagreement in this PR (and previous PRs related to digest mechanism).

msporny commented 1 month ago

Editorial, multiple reviews, changes requested and made, objections moved to issue #1489, no remaining objections for this PR, merging.

w3c / vc-data-model

Rework content integrity section based on @jyasskin's feedback. #1484

5.4 Integrity of Related Resources