w3cping / privacy-request

tracking privacy reviews of W3C specifications
9 stars 2 forks source link

BBS Cryptosuite v2023 2023-12-15 > 2024-01-15 #126

Closed Wind4Greg closed 3 months ago

Wind4Greg commented 8 months ago

Other comments: This is a cryptosuite for use within the Verifiable Credential Data Integrity framework.

kdenhartog commented 7 months ago

I'm planning to do a deeper review on this within the next week or two, but just wanted to add this while I'm briefly looking over it.

In section 5.2.5 it goes on to mention linkability concerns based on external information such as IP addresses/MAC addresses but doesn't mention that this can also occur just based on the claims themselves. For example, if a zip code, gender, and birthdate are supplied that's likely enough entropy within the claims that the subject could be de-anonymized further through the use of public datasets. This is generally true across all VC data models but should be specifically called out as a privacy concern for any selective disclosure based proof mechanism.

In this case, I'd highlight this within a separate section of 5.2 and cite https://dataprivacylab.org/projects/identifiability/paper1.pdf which I believe was the original paper that pointed this issue out.

Wind4Greg commented 7 months ago

Hi Kyle thanks for taking a look. In section 5.2.4 Linkage via Holder Selective Reveal I tried to address the linkage from the claims that a holder might reveal. I cited:

[NISTIR8053] NISTIR 8053: De-Identification of Personal Information. Simson L. Garfinkel. October 2015. URL: https://nvlpubs.nist.gov/nistpubs/ir/2015/NIST.IR.8053.pdf

[Powar2023] SoK: Managing risks of linkage attacks on data privacy. J. Powar; A. R. Beresford. Proceedings on Privacy Enhancing Technologies. 2023. URL: https://petsymposium.org/popets/2023/popets-2023-0043.php

Both of these cite the "Simple Demographics Often Identify People Uniquely" paper that you gave a link to. (Thanks for the link, I didn't have a copy!). I chose the Powar2023 reference since it was a "systematization of knowledge" of dealing with linkage attacks and cited/summarized 94 known public cases.

This discussion was put in this section since even if the "other layers" do their job right, what the holder chooses to reveal can provide strong linkage if care is not taken. We can move this discussion around if you think appropriate. I like citing more recent references, but could add another.

Cheers Greg B.

kdenhartog commented 7 months ago

Thanks for highlighting this - I completely missed this and should have completed my deeper review before commenting it looks like. When I finish up reading through this more I'll add comments on whether or not I think it's worth splitting into it's own section better or if it's fine the way it is. It's good to see it's at least called out even if I missed it the first time.

Wind4Greg commented 7 months ago

No worries @kdenhartog its a big chunk of text and there was no equivalent to emulate. I've been working with the IETF/DIF BBS folks on privacy considerations but when we get to using BBS we need to take in the "higher layers". Hence my layered discussion with information in the claims being the highest layer (and the thing most out of our control).

kdenhartog commented 7 months ago

Here's some unstructured notes that I'll leave here for now and come back to edit before the upcoming call

Many of these are confusing aspects that I had questions about or editorial comments. Some are security related, and the final two are privacy related and should be discussed further on the call. Will put more structure in tomorrow, but going to stop for tonight.

Wind4Greg commented 7 months ago

Hi @kdenhartog, some comments

kdenhartog commented 7 months ago

The current VC-DI-BBS specification, like the selective disclosure portion (SD primitives) of the ECDSA makes use of JSON pointers directly to specify both mandatory and selectively disclosed claims/statements. No JSON-LD framing is used in either document.

Thanks for clarifying this. I figured that out later on and forgot to remove it before I published my notes. This comment can be ignored and I think it's also worth saying using JSON Pointers seems like a simpler solution to JSON-LD frames because it avoids the concerns I was thinking about.

By user do you mean holder? The holder knows what the mandatory reveal statements are (included in the base proof) and then they get to choose the selective disclosure statements.

In this case I'm delineating between the two different entities that represent the holder - The holder software and the user of said software. The reason this matters is because in certain circumstances a conformant implementation may be required to over-reveal information by the issuer (e.g. say they required that credentialSubject.id is a mandatory pointer) or alternatively a non-compliant holder software implementation seems like it could drop mandatory properties without the users knowledge. Both of these seem like it would introduce a consent prompt issue where implementations should be displaying consent prompts at the UI layer.

I think those normative references to Multibase and Multicodec need to be removed. The Data Integrity spec now has needed definition and this spec has the key prefix.

I used values for base proof and derived proof headers similar to those (but with different values) from those in VC-DI-ECDSA. Hence we will need to get the details on those for you. @dlongley can you comment?

Yeah, it's odd because it seems to be redefining normatively (kind of normative statements aren't used in the algorithms, but it's assumed that's the intention) what these specs already define. For example, prefixing b64url encoded data with a u is a trick from Multibase's playbook. I presume this may be a work around to the fact that those haven't been normatively published yet. Personally it made it a bit harder to read and I was only able to deduce this from prior knowledge working on this stuff, but I don't think other readers would have that privilege.

Good catch JSON Pointers needs to be referenced (also in ECDSA-SD). Mandatory reveal claims are specified with JSON Pointers hence the name mandatoryPointers, selectively disclosed claims are specified with JSON Pointers in the selectivePointers array.

The confusing part here was not only that these weren't normatively referenced, but also that the recommended variable name doesn't point out that these are what are being used. It currently leaves it up to the reader to infer this based on the name of the variables and mention of JSON Pointers in an example at the bottom.

Sections 4.1 and 4.2 reference and reiterate some of the security considerations from the IETF/CFRG spec. These have matured with the v5 release of the BBS draft.

I passed through the nonce and challenge values from the generic data integrity proof specified in the Data Integrity specification. Do you think these should be disallowed completely? We put in privacy concerns on their use in section w3c.github.io/vc-di-bbs/#linkage-via-proof-options-and-mandatory-reveal.

Part of the issue here is that the BBS draft is written under the assumption that the randomness here will likely be generated in a non-interactive way using a fiat-shamir heurstic. This creates a bit of an issue because the purpose of the challenge property is to leverage the interactive technique. However, what's not considered here is that in the interactive case the holder needs to validate that the challenge property isn't reused such as by an adversarial verifier. I believe the security proofs collapse if this happens and according to the spec would lead to the following:

In any case, the randomness used in ProofGen MUST be unique in each call and MUST have a distribution that is indistinguishable from uniform. If the random scalars are re-used, created from "bad randomness" (for example with a known relationship to each other) or are in any way predictable, the undisclosed messages or the signature value may be compromised.

A safer way to handle this seems to be to add the challenge as an additional message and to leverage the non-interactive randomness generation technique. I've not thought too deeply about if this breaks the validation of the proof generated, but there's got to be a safer way to do this by normatively requiring usage of the non-interactive method instead.

dlongley commented 7 months ago

@kdenhartog,

Is there significance to the header bytes in section 3.2.1? 0xd9, 0x5d, and 0x02. These seem like they should be described with greater detail.

Yes, the 0xd9 is a CBOR value that indicates that the following value is a 16-bit tag identifying the content (e.g., tag 0x5d02 in your comment). We will be adding an "IANA considerations" section to each relevant spec and that describes the CBOR registration and will be using those for the CBOR registrations. We will be adding registrations that cover these 16-bit tag values:

0x5d00 (23808) ecdsa-sd-2023 base proof 0x5d01 (23809) ecdsa-sd-2023 derived proof 0x5d02 (23810) bbs-2023 base proof 0x5d03 (23811) bbs-2023 derived proof

dlongley commented 7 months ago

@kdenhartog,

Part of the issue here is that the BBS draft is written under the assumption that the randomness here will likely be generated in a non-interactive way using a fiat-shamir heurstic. This creates a bit of an issue because the purpose of the challenge property is to leverage the interactive technique. However, what's not considered here is that in the interactive case the holder needs to validate that the challenge property isn't reused such as by an adversarial verifier. I believe the security proofs collapse if this happens and according to the spec would lead to the following:

In any case, the randomness used in ProofGen MUST be unique in each call and MUST have a distribution that is indistinguishable from uniform. If the random scalars are re-used, created from "bad randomness" (for example with a known relationship to each other) or are in any way predictable, the undisclosed messages or the signature value may be compromised.

A safer way to handle this seems to be to add the challenge as an additional message and to leverage the non-interactive randomness generation technique. I've not thought too deeply about if this breaks the validation of the proof generated, but there's got to be a safer way to do this by normatively requiring usage of the non-interactive method instead.

I was having trouble following the above -- but I think I may have sorted out why, which is that there seems to be some confusion that the challenge proof property mentioned in the base DI spec is the same as / some how related to "challenge value" that the BBS spec refers to because they happen to share the same name "challenge". But these are not the same and BBS just happens to be a cryptosuite that has primitives that internally reuse the name "challenge" (which has been used in DI authentication proofs long before BBS).

In short, the implementation and use of the BBS ProofGen primitive (specifically the pseudo-randomness generated within it) is not at all affected by the challenge property from the base DI spec. The "challenge value" generated via BBS's ProofChallengeCalculate (called internally from CoreProofGen which is called from ProofGen) will always be calculated by an implementation in control of the prover (holder) and therefore always uses pseudo-randomness generated by the prover's (holder's) software/hardware. The layers are quite different.

Do we need a statement indicating that the challenge property from the base DI spec is not / does not refer to the "challenge value" that is generated and used internally by the BBS primitives to eliminate the potential for confusion?

plehegar commented 5 months ago

[[ We agree that it would be useful to have unlinkable credentials. We don't have an opinion at the moment about whether this should be a requirement, but it also shouldn't be excluded as a possibility. We think this issue should be discussed further in PING and we look forward to reviewing the outcome of their discussion. ]] https://github.com/w3ctag/design-reviews/issues/922

pes10k commented 5 months ago

@kdenhartog @dlongley it looks like the review triggered a lot of useful conversation. Now that the review is complete though, would it be possible to move the conversation into relevant issues filed against the spec?