w3c-ccg / vc-status-rl-2020

VC Revocation List 2020
https://w3c-ccg.github.io/vc-status-rl-2020/

Privacy Preserving Extensions #6

Open OR13 opened 3 years ago

OR13 commented 3 years ago

@csuwildcat proposed some privacy preserving alterations here: https://github.com/csuwildcat/hashfield

Wondering if there is any appetite to add support for them so we don't see a fork of the spec?

OR13 commented 3 years ago

ping @msporny @dlongley @kdenhartog @tplooker

csuwildcat commented 3 years ago

@OR13 have no issue working collaboratively on a spec for this, but there are enough significant modifications/additions that it may warrant distinguishing this construction. I don't think there exists the ability to have backwards compatibility with creds that use the current RL 2020 scheme.

OR13 commented 3 years ago

@csuwildcat might we consider calling your schema RevocationList2021 and noting its features were designed to plug privacy issues associated with the earlier scheme?

Breaking changes don't require the creation of a new spec or software library, we have versioning, we can use it :)

OR13 commented 3 years ago

ping @csuwildcat @tplooker

csuwildcat commented 3 years ago

The one general thing I dislike about the current spec is the name: it should be called Status List, because literally nothing about it is bound to the bit field indicating only revocation. You can publish different fields for different 'topics', wherein revocation is simply one topic Issuers may choose to express about a credential.

OR13 commented 3 years ago

@csuwildcat I had not thought about that... very interesting idea to generalize it to topics.

csuwildcat commented 3 years ago

@OR13 @msporny I no longer believe the dynamic repositioning of the indexes provides sufficient value for the complexity it introduces, so assuming we can add the following as options/additions to this existing spec, can we all just work on this one?:

  1. Advise in the spec as to how to use the construction with a larger bitfield from the start to prevent against lengthening observation (if your use case needs more protection against it)
  2. Specify how/why one would chaff the unused positions in the field, if you wanted to further prevent any aggregate observations of the flow of status activity in the field.
  3. Can we change the name to Status List, given this could be used to express any status 'topic'?

OR13 commented 3 years ago

Seems like reasonable suggestions to me.

OR13 commented 3 years ago

I think the first 2 are relatively straightforward:

Advise in the spec as to how to use the construction with a larger bitfield from the start to prevent against lengthening observation (if your use case needs more protection against it)

Pre-initialize the space? Perhaps a simple example of why this is a problem in the privacy section, and some mitigation language would be sufficient.

Specify how/why one would chaff the unused positions in the field, if you wanted to further prevent any aggregate observations of the flow of status activity in the field.

This belongs in the privacy section as well, including comments on trading storage for privacy. I think an algorithm filling the field could easily be provided, in an extension / appendix.
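A minimal sketch of the two mitigations discussed here: allocating the whole field up front, and chaffing positions that belong to no credential. The 131,072-bit size, the function names, the bit ordering, and the zlib-plus-base64 encoding are all illustrative assumptions, not spec text.

```python
import base64
import secrets
import zlib

LIST_SIZE_BITS = 131_072  # assumed fixed size (16 KB), allocated once so the list never grows


def new_bitfield(size_bits: int = LIST_SIZE_BITS) -> bytearray:
    """Allocate the whole field at once, so its length says nothing about issuance volume."""
    return bytearray(size_bits // 8)


def set_bit(field: bytearray, index: int) -> None:
    # Assumption: index 0 is the left-most (most significant) bit of the first byte.
    field[index // 8] |= 0x80 >> (index % 8)


def get_bit(field: bytearray, index: int) -> int:
    return (field[index // 8] >> (7 - index % 8)) & 1


def pick_free_index(used: set, size_bits: int = LIST_SIZE_BITS) -> int:
    """Reserve a random unused position instead of the next sequential one."""
    if len(used) >= size_bits:
        raise ValueError("status list is full")
    while True:
        candidate = secrets.randbelow(size_bits)
        if candidate not in used:
            used.add(candidate)
            return candidate


def add_chaff(field: bytearray, used: set, count: int) -> None:
    """Set `count` random positions that are not assigned to any credential."""
    size_bits = len(field) * 8
    for _ in range(count):
        index = pick_free_index(used, size_bits)  # reserved, so no credential lands here later
        set_bit(field, index)


def encode(field: bytearray) -> str:
    # Assumption: the field is compressed, then base64-encoded, for publication.
    return base64.b64encode(zlib.compress(bytes(field))).decode("ascii")
```

The storage-for-privacy trade shows up directly: every chaff position is one fewer slot for a real credential, and the more random bits are set, the less well the field compresses. The sketch does not address when chaff bits are flipped, which can leak information of its own.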

Can we change the name to Status List, given this could be used to express any status 'topic'?

This one would require the most work IMO, we would want to see some real use cases for the topics, and they may raise additional privacy concerns. It's essentially just a bunch of writing.

@csuwildcat want to take a stab at a PR to address the first 2?

csuwildcat commented 3 years ago

I'll do so, but would rather the chaff and random distribution options be a list of optional steps one could perform within the appropriate functional phases.

(Sent by email in reply to Orie Steele's comment of Wed, Nov 18, 2020, 6:45 AM.)

OR13 commented 3 years ago

@csuwildcat they could be part of a 'setup phase' which might include planning for the block size, etc... I think we can probably structure the spec to support that.

dlongley commented 3 years ago

Note: I get a 404 when I try to visit: https://github.com/csuwildcat/hashfield

dlongley commented 3 years ago

@csuwildcat,

The one general thing I dislike about the current spec is the name: it should be called Status List, because literally nothing about it is bound to the bit field indicating only revocation. You can publish different fields for different 'topics', wherein revocation is simply one topic Issuers may choose to express about a credential.

+1 -- We've already been reusing the bit fields like this internally anyway, so I agree.

kimdhamilton commented 3 years ago

@csuwildcat can you let us see https://github.com/csuwildcat/hashfield? is it a private repo? I get a 404

csuwildcat commented 3 years ago

@kimdhamilton I got rid of it after I figured out it would require like 10x+ the rounds of hashing I first thought, which might require 1s of CPU time, and that's not worth the nascent benefit if we just add the optional random selection and chaffing processes to this simpler, more straightforward construction.

msporny commented 3 years ago

@csuwildcat wrote:

Advise in the spec as to how to use the construction with a larger bitfield from the start to prevent against lengthening observation (if your use case needs more protection against it)

Agreed, we might just want to introduce a default algorithm and set the initial lengthening ratio, setting the value to 1.

I'll note that it's important how the chaffing values are inserted and managed, because you can leak information on the chaffing values based on when the bits are flipped. There is information leakage there that we'll have to be careful about.

Specify how/why one would chaff the unused positions in the field, if you wanted to further prevent any aggregate observations of the flow of status activity in the field.

Yep, agreed.

Can we change the name to Status List, given this could be used to express any status 'topic'?

Yes, we should probably make that change... it's really just a status list... however, we may want to have types to define the sort of status. For example, "revocation" is one type of status list... but "active" might be another. That is, some credentials could be issued, but their activation may go on and off based on some schedule... think of a VC that can only be used during business hours in the EU... that's a use case that's not supported by the issuance, expiration, or revocation information. Food for thought.

ntn-x2 commented 3 years ago

What about tracking by the verifier? If I present the same credential more than once, my credential will probably have the same index in all presentations, meaning that the verifier will know it is the same entity across all the interactions.
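The concern can be illustrated with a short sketch. The log format and names below are hypothetical. The point is only that a stable (list, index) pair behaves like a persistent identifier, even for a single verifier.

```python
from collections import defaultdict

# Hypothetical log kept by one verifier: (timestamp, status list URL, list index).
presentations = [
    ("2021-02-01T09:00", "https://example.com/credentials/status/3", "94567"),
    ("2021-02-02T09:05", "https://example.com/credentials/status/3", "94567"),
    ("2021-02-02T12:30", "https://example.com/credentials/status/3", "1203"),
    ("2021-02-03T08:58", "https://example.com/credentials/status/3", "94567"),
]


def group_by_status_entry(log):
    """Group visits by the (list, index) pair carried in credentialStatus."""
    visits = defaultdict(list)
    for timestamp, list_url, index in log:
        visits[(list_url, index)].append(timestamp)
    return dict(visits)


for entry, times in group_by_status_entry(presentations).items():
    print(entry, "seen", len(times), "time(s)")
```

No other field of the credential is needed for this grouping, so hiding the subject's identifier alone would not prevent it.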

msporny commented 3 years ago

What about tracking by the verifier? If I present the same credential more than once, my credential will probably have the same index in all presentations, meaning that the verifier will know it is the same entity across all the interactions.

Yes, this is a concern. It requires collusion among multiple verifiers, and a better tracking mechanism would be to just use the digital signature (for non-pseudonymous digital signature schemes). The goal with this scheme is to prevent issuer-based tracking. Verifiers can still use any unique identifier to track you if they so desire... in those cases, you need to ensure that the entire presentation and all VCs in that presentation provide enough randomness or herd immunity to prevent that sort of tracking (which is a very difficult problem and one that has, arguably, not been solved yet). There is some work in BBS+ that might be applicable here.

ntn-x2 commented 3 years ago

@msporny thanks for your answer! I would like to make the point that this type of tracking does not necessarily need multiple verifiers to collude. For instance, if I am buying snacks from the same vending machine every day, the vending machine knows that it's always me, as it can correlate the same index used for proof-of-non-revocation. Then, of course, collusion among multiple verifiers would be even worse, but correlation should not be so easy in the single-verifier case at least. As of today, the only truly privacy-preserving solutions to credential revocation are cryptographic accumulators, even though they have other downsides, like constantly updating the delta etc. So I was just curious to know why this problem has not been considered in the analysis in this thread, as I see issuer-based attacks and verifier-based attacks as equally bad.

msporny commented 3 years ago

So I was just curious to know why this problem has not been considered in the analysis in this thread,

It's not that this problem hasn't been considered before... it's been considered for decades and complex cryptographic and other security schemes have been devised to combat the attack vector you describe. It is possible to get to a situation where you're pseudonymous, but then a verifier asks an individual for payment, or an email address and their privacy is blown out of the water.

This is one of the reasons that the Verifiable Credentials specification doesn't assert a position on "one true revocation/status mechanism"... there are a variety of ways to address the issue and each mechanism has benefits and drawbacks. When looking at a broad set of use cases, there is no consensus wrt. the proper revocation mechanism, which tracking risk is more dangerous than the other, or what solution would work in all situations.

What this specification does is provide ONE simple solution that people concerned about issuer tracking (like governments that have strong privacy regulations) can use and compel their vendors to use. It can't be all things to all use cases. Hopefully a technology will come along that is both simple and applicable to a broader range of use cases, and the VC spec purposefully leaves the door open for that to happen.

dlongley commented 3 years ago

@Diiaablo95,

The ability for a user to commit fraud increases considerably if the user gets to decide whether the verifier knows if one of their credentials is being reused.

Checking for reuse needs to be handled by a witness the verifier trusts, even if the verifier doesn't get to know which credential was reused. A verifier should be able to know (and trust) whether a credential was used in a previous interaction yet the user is declaring a different identity. This enables the verifier to decide if that is acceptable for their use case; sometimes it will be, other times not.

Regardless, this spec isn't designed to directly handle that case. It could be used in a layering fashion, however, to address the particular problem you highlight. For example, the aforementioned trusted witness could perform status checks at the same time that they are checking for reuse and then both pieces of information could be forwarded onto the verifier as new credential(s) asserted by the witness.

kimdhamilton commented 3 years ago

+1 to @csuwildcat's request, and to avoid making him cry inside.

Questions:

I have permissions to update repo-level things if needed.

csuwildcat commented 3 years ago

Oh, and I did think of a way to do the extra privacy stuff without any difficult rounds of hashing, in just a single pass that doesn't make the resulting encoded string much larger, if we want to discuss that at some point.

OR13 commented 3 years ago

Can we get a clear proposal for what changes need to be made and where they need to be made?

Here is my attempt:

csuwildcat commented 3 years ago

The change set would be:

  1. Change the name so that it's just Status Lists in general
  2. Let the type field reflect the new general name of the spec
  3. Change the revocationListIndex field to be statusListIndex
  4. Change the revocationListCredential field to be statusListCredential
  5. Change the value description of that field to reflect a new, more generic credential type: StatusList2020Credential
  6. Add a field named topic, and let the definition state that the value is to be a string that describes the topic of the list (e.g. revocation)
  7. Add the topic field to the StatusList2020Credential, so it is present there too.

I think that's about it, right?
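A rough sketch of how a verifier might use the renamed fields together with the proposed topic field. Placing topic inside credentialSubject, the zlib-plus-base64 encoding, the bit ordering, and the helper names are assumptions made for illustration. Proof verification of both credentials is omitted.

```python
import base64
import zlib


def encode_list(bits: bytes) -> str:
    # Assumption: the published list is compressed and then base64-encoded.
    return base64.b64encode(zlib.compress(bits)).decode("ascii")


def check_status(credential_status: dict, list_credential: dict) -> bool:
    """Return True if the bit at statusListIndex is set in the referenced list."""
    subject = list_credential["credentialSubject"]
    # A matching topic on both sides tells the verifier what a set bit means.
    if credential_status["topic"] != subject["topic"]:
        raise ValueError("topic mismatch between credential and status list")
    if credential_status["statusListCredential"] != list_credential["id"]:
        raise ValueError("status entry points at a different list")
    bits = zlib.decompress(base64.b64decode(subject["encodedList"]))
    index = int(credential_status["statusListIndex"])
    # Assumption: index 0 is the left-most bit of the first byte.
    return bool((bits[index // 8] >> (7 - index % 8)) & 1)


# Build a small list in which only position 5 is set.
raw = bytearray(16)
raw[0] = 0b00000100
list_credential = {
    "id": "https://example.com/credentials/status/3",
    "type": ["VerifiableCredential", "StatusList2020Credential"],
    "credentialSubject": {"topic": "revocation", "encodedList": encode_list(bytes(raw))},
}
credential_status = {
    "type": "StatusList2020Status",
    "topic": "revocation",
    "statusListIndex": "5",
    "statusListCredential": "https://example.com/credentials/status/3",
}
print(check_status(credential_status, list_credential))
```

Carrying the topic in both places lets the verifier reject a status entry that points at a list published for a different purpose.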

msporny commented 3 years ago

Alternate, but highly aligned, proposal here: https://github.com/w3c-ccg/vc-http-api/pull/92#discussion_r566857048

I will note that there is a large cohort of organizations implementing this specification now for an interop fest in March. I doubt any of them would be happy with the changes being made right now. We'll want to get their input here: @tplooker @OR13 @peacekeeper @mavarley

OR13 commented 3 years ago

I would be happy to pin the version for interop, and still fix the issue.

msporny commented 3 years ago

I would be happy to pin the version for interop, and still fix the issue.

I suggest we keep this spec as-is, mark it as deprecated (at the top of the spec), and do a new 2021 specification. The current interop cohort would only be expected to implement the old version.

OR13 commented 3 years ago

I suggest we keep this spec as-is, mark it as deprecated (at the top of the spec), and do a new 2021 specification. The current interop cohort would only be expected to implement the old version.

that works, @msporny are you ok with forking the spec and implementing the changes proposed for 2021? I am happy to do that work.

csuwildcat commented 3 years ago

Yeah, just doing a new one for the new year-revision would work too, right @msporny?

tplooker commented 3 years ago

Agree that in general the mechanism for status expression that DB have defined here can be generalized beyond just a binary expression of a particular type of status (e.g. revocation). However, if we generalize at that layer, how do we communicate the intent of the credential status?

Take for instance

{
  "@context": [
    "https://www.w3.org/2018/credentials/v1",
    "https://w3id.org/vc-revocation-list-2020/v1"
  ],
  "id": "https://example.com/credentials/23894672394",
  "type": ["VerifiableCredential"],
  "issuer": "did:example:12345",
  "issued": "2020-04-05T14:27:42Z",
  "credentialStatus": {
    "id": "https://dmv.example.gov/credentials/status/3#94567",
    "type": "StatusList2020Status",
    "listIndex": "94567",
    "listCredential": "https://example.com/credentials/status/3"
  },
  "credentialSubject": {
    "id": "did:example:6789",
    "type": "Person"
  },
  "proof": { ... }
}

What is the credential status that I would resolve from this credential representing? I.e., if I get a 1 back for this credential, is that expressing revoked, non-revoked, active, inactive, suspended? The semantics of the list must be captured somewhere so that the verifier can understand the status's intent.

I will note that there is a large cohort of organizations implementing this specification now for an interop fest in March. I doubt any of them would be happy with the changes being made right now. We'll want to get their input here: @tplooker @OR13 @peacekeeper @mavarley

Yeah, -1 to a breaking change that will affect existing implementations. I'm not against the generalization being added as a new revised suite.

msporny commented 3 years ago

that works, @msporny are you ok with forking the spec and implementing the changes proposed for 2021?

Great... done.

https://w3c-ccg.github.io/vc-status-list-2021/

Everyone, please review the PR:

https://github.com/w3c-ccg/vc-status-list-2021/pull/1

@csuwildcat -- it would be really great if DIF could move at the speed that the W3C CCG moves on these things (43 minutes from your request to a fully formed specification). When do you guys think you're going to be able to move at that speed? :P

tplooker commented 3 years ago

Also if we are generalizing is it appropriate to consider beyond just binary status representation, for example could a credential occupy more than 1 bit in a status list therefore giving you an enumeration greater than two states for more advanced status expressions?

tplooker commented 3 years ago

A concrete example is we have had requests for a tri-state credential status expression whereby a credential needs to be active, inactive or suspended. Therefore, if each credential occupied two bits, you would have 4 expressible states for the credential.

kimdhamilton commented 3 years ago

@jchartrand, see @tplooker's comment about using this approach beyond binary states, which we were discussing.

OR13 commented 3 years ago

I would propose NOT tackling states beyond binary in status-2021, and instead making YET ANOTHER spec for that...

could be state-list-2021, etc... status should remain binary.

msporny commented 3 years ago

I would propose NOT tackling states beyond binary in status-2021

I agree with @OR13 -- we don't want the spec to turn into a swiss army knife.

That said, we might be able to accomplish this with a "bitWidth" entry that defaults to '1', but could be any arbitrary number... 3, 4, 8, etc. The only thing that really changes is the calculation of where you want to look in the bit string. Run length compression might suffer with a bunch of 01001101s, but it would probably just be a function of the bit width... you'd just have to make sure your default would result in long binary strings of 0s, 1s, 01s, or 10s... something that repeats at a regular basis.
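A small sketch of the index calculation with a wider entry. The name bit_width mirrors the suggested "bitWidth" entry, and the bit ordering (index 0 is the left-most bit) is an assumption. The second part compares a regularly repeating default against random data, with zlib standing in for whatever compression the list actually uses.

```python
import os
import zlib


def read_entry(bits: bytes, index: int, bit_width: int = 1) -> int:
    """Read the `bit_width`-bit value stored for credential `index`."""
    start = index * bit_width  # the only change from the 1-bit case
    value = 0
    for offset in range(start, start + bit_width):
        bit = (bits[offset // 8] >> (7 - offset % 8)) & 1
        value = (value << 1) | bit
    return value


# With bit_width=2 the byte 01001101 holds four entries: 01, 00, 11, 01.
packed = bytes([0b01001101])
print([read_entry(packed, i, bit_width=2) for i in range(4)])

# A default that repeats regularly compresses far better than arbitrary values.
repeating = bytes([0b01010101]) * 16384
noisy = os.urandom(16384)
print(len(zlib.compress(repeating)), len(zlib.compress(noisy)))
```

With bit_width left at 1 the function behaves exactly like the existing single-bit lookup, which is why the default matters for backwards compatibility.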

One challenge with non-binary states is that you then have to know what each state means (because the verifier needs to know)... and then you probably have to communicate yet another mapping of bitstring to logical state. Seems tenuous.

Alternatively, you could just use another status list... you want multiple states? Use two different status lists... there's nothing preventing you from doing this:

  "credentialStatus": [{
    "id": "https://dmv.example.gov/credentials/status/3#94567",
    "type": "SuspensionList2020",
    "statusListIndex": "94567",
    "statusListCredential": "https://example.com/credentials/status/3"
  }, {
    "id": "https://dmv.example.gov/credentials/status/4#94567",
    "type": "RevocationList2020",
    "statusListIndex": "94567",
    "statusListCredential": "https://example.com/credentials/status/4"
  }],

msporny commented 3 years ago

@tplooker wrote:

how do we communicate the intent of the credential status?

Yep, that's a problem... we'll have to expose the type of status list it is in the VC... not a big issue, we just need to create a few new types.

@OR13, sounds like a great use case for a registry -- I hear you love and are expert at creating and maintaining those things. We could have a whole registry for credential status types, and a governance process around it, and a council of elders to weigh in on things like copyright violations and moral dilemmas created by the types of lists the registry maintains. :P

I was joking above... until I realized that we already have a VC Extension Registry, and that the types of status lists we're talking about are going to have to end up there. :((((

tplooker commented 3 years ago

Yep, that's a problem... we'll have to expose the type of status list it is in the VC... not a big issue, we just need to create a few new types.

Ok great, we are aligned on this. Just to be clear: if we do extend it so the bit width can be greater than 1, then so long as these semantic expressions extend to define the mapping of what the possible states mean, I think we have a workable solution.

csuwildcat commented 3 years ago

that works, @msporny are you ok with forking the spec and implementing the changes proposed for 2021?

Great... done.

https://w3c-ccg.github.io/vc-status-list-2021/

Everyone, please review the PR:

w3c-ccg/vc-status-list-2021#1

@csuwildcat -- it would be really great if DIF could move at the speed that the W3C CCG moves on these things (43 minutes from your request to a fully formed specification). When do you guys think you're going to be able to move at that speed? :P

People literally asked me not to just go do this in an hour months ago when I asked, so careful what you wish for ;)

peacekeeper commented 3 years ago

+1 to this. I like both suggested approaches (multiple state lists vs. single list with multiple bits per state).

I guess one difference is that with multiple lists you have to answer the question what to do if multiple states are set (e.g. "revoked" AND "suspended" are set to 1), whereas with multiple bits per state you can give each combination its own meaning (e.g. 00=active, 01=suspended, 10=revoked, 11=disputed, etc.)
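The difference can be sketched side by side. The precedence rule in the first function is a hypothetical policy chosen only to make the example complete, since the thread leaves that question open. The state names in the second follow the example mapping above.

```python
# Approach 1: one list per status. The verifier needs a rule for overlapping bits.
def combine_lists(revoked_bit: int, suspended_bit: int) -> str:
    if revoked_bit:
        # Hypothetical policy: revocation takes precedence over suspension.
        return "revoked"
    if suspended_bit:
        return "suspended"
    return "active"


# Approach 2: two bits per credential. Every combination has its own meaning.
STATE_BY_VALUE = {0b00: "active", 0b01: "suspended", 0b10: "revoked", 0b11: "disputed"}


def decode_state(value: int) -> str:
    return STATE_BY_VALUE[value]


print(combine_lists(1, 1), decode_state(0b11))
```

In the first approach the combination rule lives in verifier logic. In the second it lives in a published mapping, which is the extra mapping the verifier would have to be told about.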