w3cping / privacy-request

tracking privacy reviews of W3C specifications

Bitstring Status List v1.0 2024-01-21 > 2024-02-13 #127

Closed msporny closed 4 months ago

msporny commented 7 months ago

Other comments:

Our apologies, we thought we had submitted this review when we submitted the TAG review request, and when we did the security and privacy self-review, but we failed to do so.

We were hoping to enter CR next month, but are gated by this review (the WG believes it is done enough to enter CR).

kdenhartog commented 7 months ago

I'll assist on this one with @npdoty

msporny commented 7 months ago

@kdenhartog wrote:

I'll assist on this one with @npdoty

Great, thank you @kdenhartog ! :)

npdoty commented 7 months ago

PING discussed this on our February 1 call.

Summary of issues: (@kdenhartog, @jyasskin and others can review to see if I've got the right content here. VCWG, I can make separate issues with these once someone has reviewed my notes, but if you want to look at them ASAP, feel free and just recognize that this is one reviewer's imperfect notes.)

The status list index appears to be a cross-origin persistent unique identifier. The same status list index provided on a credential used on two different websites could let those websites link the user between them. This will be abused as a very effective cross-context tracking vector.

An issuer could provide unique status list URLs per credential, in order to effectively track when a holder's credential is used at each verifier. (This threat is already noted in the privacy considerations.) We should consider mitigations for this threat, including consistency checks. This draft on key consistency and discovery may be a useful starting point.

Caching and CDNs are suggested as potential mitigations for hiding when a holder visited a verifier. Caching recommendations to verifiers could use normative guidance. It's not clear that CDNs provide a meaningful privacy protection; rather, anonymizing proxies could provide a more explicit design for that purpose. Oblivious HTTP might be a better fit.

Checking for revocation or status updates in general also reveals information about the holder to the verifier, potentially indefinitely afterwards. This is a significant risk, especially for open-ended status messages, or where status could reveal something sensitive (revocation of a privilege,

Larger issues:

Revocation status is often not important to the use case for credentials. For example, my driver's license may have expired or my driving privileges may have been revoked, but my age won't have changed in either case. It seems guidance is necessary for when status list information should be included at all in the response to a credential request, and when it isn't appropriate. Or when it is appropriate, the spec needs to highlight the privacy issues in doing so, and verifiers and holder software will need to communicate that to the user.

There are alternative designs for revocation that may have different privacy properties, and might be preferable. Considering these would be a longer work item, but, for example, a stapled validity proof would be a way for a holder to prove that a credential is recently valid. Short-lived credentials could provide assurance to the verifier when necessary -- requiring the holder to more frequently interact with the issuer, but under the holder's control and without involving the particular verifier. Cryptographic accumulators could be used to efficiently communicate revocation/non-revocation status without requiring the verifier to contact the issuer.

https://ieeexplore.ieee.org/document/9376112

https://medium.com/@alexeysamoshkin/how-ssl-certificate-revocation-is-broken-in-practice-af3b63b9cb3

Smaller issues:

Could dummy entries in the status list help to protect against an attacker who just regularly scans the list to try to reidentify users or their status?

The spec requires that the status list index SHOULD be randomized. I can see that there may be alternatives, but there could be a MUST requirement for the unpredictability/intelligence-free nature of the index: it must not be something that a recipient could infer, or where the index reveals something about the credential (like its recency, etc.).

martinthomson commented 6 months ago

In addition to the comments from @npdoty, I will note that this design depends on indices of issued credentials being sequential. That has secondary concerns from the perspective of privacy. If I can observe when someone is issued a credential, I can use this information to determine whether their credential is revoked. Though this concern is just a variant of the "persistent unique identifier" concern above, the point is that the design depends on the value for other people being highly predictable.

On the design front, did the working group consider designs like CRLite? If the universe of possible values is known, then a probabilistic data structure can be used to compress even more. And - though I haven't verified this - the privacy issues that are expressed above might not apply in that context.

(Also, from a pure information theory perspective, zlib might be as effective as a far simpler run-length encoding, but I'm not sure that it would be in practice if you have decent coding for the lengths.)

msporny commented 6 months ago

@npdoty wrote:

Summary of issues: (@kdenhartog, @jyasskin and others can review to see if I've got the right content here. VCWG, I can make separate issues with these once someone has reviewed my notes, but if you want to look at them ASAP, feel free and just recognize that this is one reviewer's imperfect notes.)

Great, thanks for the summary, @npdoty, much appreciated. The response below is just my response as an Editor of the specification, it's not an official WG response. We'll handle the official WG response to issues raised by PING via the issue tracker.

The status list index appears to be a cross-origin persistent unique identifier. The same status list index provided on a credential used on two different websites could let those websites link the user between them. This will be abused as a very effective cross-context tracking vector.

Yes, that is true; only a solution that is built on newer (potentially non-NIST approved) cryptography would be able to provide a proof of non-revocation without correlation. The WG hopes that it will be re-chartered to put that work in scope with the understanding that the use of ZKPs to provide unlinkable revocation status is still an experimental field with questionable applicability to use cases mandating NIST-approved cryptography. One of our restrictions in coming up with a solution was to use traditional cryptography that is approved by national standards setting bodies such as NIST and ETSI.

Given that traditional cryptography is required in many use cases that require a revocation list, the digital signature on a VC is a more useful correlation vector than the status list index (the signature is more generally available). As such, we expect the Bitstring Status List approach is no more correlating than using a traditional digital signature on a VC.

That is not a counter-argument to the correlatability concern raised by PING; the VCWG has the same concern. It merely elaborates on the discussions the CCG and VCWG have had around the types of correlating information that exist in a VC (or any other modern digital credential format, such as ISO mDL).

One of the main design goals was the elimination of "phone home" to the issuer in order to prevent issuer tracking/correlation. Without the use of more advanced (and non-NIST/ETSI-approved) cryptography, we cannot eliminate the correlative effects of a status list among verifiers. There was a presentation to the CCG on this topic today; see slides 5-7 in the attached slide deck here:

https://lists.w3.org/Archives/Public/public-credentials/2024Feb/0034.html
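To make the correlation concern concrete, here is a minimal sketch of how two colluding verifiers could link a holder by comparing status entries. The `statusListCredential` and `statusListIndex` property names follow the specification as I understand it; the verifier names, URLs, and index values are hypothetical.

```python
from collections import defaultdict

def link_presentations(presentations):
    """Group verifiers that saw the same (status list URL, index) pair."""
    groups = defaultdict(set)
    for verifier, credential in presentations:
        status = credential["credentialStatus"]
        key = (status["statusListCredential"], status["statusListIndex"])
        groups[key].add(verifier)
    # Only pairs seen by more than one verifier allow cross-site linking.
    return {key: sorted(seen) for key, seen in groups.items() if len(seen) > 1}

# Hypothetical credentials: cred_a is shown at two sites, cred_b at one.
cred_a = {"credentialStatus": {
    "statusListCredential": "https://issuer.example/status/3",
    "statusListIndex": "94567"}}
cred_b = {"credentialStatus": {
    "statusListCredential": "https://issuer.example/status/3",
    "statusListIndex": "12"}}

presentations = [
    ("shop.example", cred_a),
    ("bar.example", cred_a),
    ("bank.example", cred_b),
]
linked = link_presentations(presentations)
```

The same grouping would work on the digital signature value instead of the status entry, which is the point made above about signatures being an equally available correlation vector.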

An issuer could provide unique status list URLs per credential, in order to effectively track when a holder's credential is used at each verifier. (This threat is already noted in the privacy considerations.) We should consider mitigations for this threat, including consistency checks. This draft on key consistency and discovery may be a useful starting point.

Yes, the concept of ensuring that issuers are not abusing the status list to track individuals has been a topic of discussion in the group, and as you mentioned is highlighted in the privacy considerations section today:

https://www.w3.org/TR/vc-bitstring-status-list/#malicious-issuers-and-verifiers

It has been proposed that digital wallet providers track the "uniqueness" of status lists and flag status lists that seem unique to the holders that might be affected.

Seeing a message that has something to this effect might be useful: "It seems like your Electrician's license contains a tracker, when you show it to someone, Electricians Incorporated will be notified that you're showing it. Find out more by reading about how we try to protect your privacy."

That said, that is beyond the purpose of the specification -- it's way up the stack in application space. We might want to speak more directly to the concern in the privacy considerations section (perhaps in the section linked to above), because it's not obvious that we have thought about this and how to potentially mitigate the worst of it (though implementing that at scale continues to be a thought exercise).
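As a rough illustration of the wallet-side check, the sketch below counts how many credentials reference each status list URL and flags URLs that fall under a threshold. It assumes a wallet provider can pool these counts across many holders, which is a deployment assumption and not something the specification defines; the function name and the `min_holders` threshold are hypothetical.

```python
from collections import Counter

def flag_suspicious_status_lists(status_list_urls, min_holders=2):
    """Return status list URLs referenced by fewer than min_holders credentials."""
    counts = Counter(status_list_urls)
    return sorted(url for url, seen in counts.items() if seen < min_holders)

# Hypothetical pooled observations, one URL per credential seen.
observed = [
    "https://issuer.example/status/3",
    "https://issuer.example/status/3",
    "https://issuer.example/status/3",
    "https://tracker.example/status/holder-8841",
]
flagged = flag_suspicious_status_lists(observed)
```

A per-credential URL shows up with a count of one, which is the signal a wallet could surface to the holder before the credential is presented.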

Caching and CDNs are suggested as potential mitigations for hiding when a holder visited a verifier. Caching recommendations to verifiers could use normative guidance. It's not clear that CDNs provide a meaningful privacy protection; rather, anonymizing proxies could provide a more explicit design for that purpose. Oblivious HTTP might be a better fit.

We do mention OHTTP here:

https://www.w3.org/TR/vc-bitstring-status-list/#malicious-issuers-and-verifiers

... but perhaps we should move that language out into the Algorithms section so it's easier to find?

Checking for revocation or status updates in general also reveals information about the holder to the verifier, potentially indefinitely afterwards. This is a significant risk, especially for open-ended status messages, or where status could reveal something sensitive (revocation of a privilege,

Yes, agreed. One mechanism to combat this that the group discussed was the shortening of timeframes for a particular VC, but then there is an argument around ensuring that the holder has regular contact w/ the issuer, which is not the case in physical credential use cases today (such as driver's licenses). Another mitigation discussed was providing the verifier with a "consent token" that only allows them to check the status if they are authorized (for a limited timeframe), but that creates a privacy harm that reduces the group privacy characteristics that you get with a large list (you'd have to make the list smaller for the "consent token" to be used appropriately). Yet another approach was to require the verifier to phone home and ask about a specific subject, but then we're back to phoning home, which is also harmful to privacy.

At present, we think the right balance is what we have (due to the privacy drawbacks outlined above and the need to meet regulatory burden around revocation/suspension in a variety of use cases).

Revocation status is often not important to the use case for credentials. For example, my driver's license may have expired or my driving privileges may have been revoked, but my age won't have changed in either case. It seems guidance is necessary for when status list information should be included at all in the response to a credential request, and when it isn't appropriate. Or when it is appropriate, the spec needs to highlight the privacy issues in doing so, and verifiers and holder software will need to communicate that to the user.

Yes, agreed. The specification would benefit from a section that speaks to when status information is appropriate and when it isn't.

There are alternative designs for revocation that may have different privacy properties, and might be preferable.

Yes, agreed, and the WG desires to be re-chartered to work on more privacy-preserving status list mechanisms. One such mechanism (ALLOSAUR) is contemplated towards the end of the slide deck linked to above.

Considering these would be a longer work item, but, for example, a stapled validity proof would be a way for a holder to prove that a credential is recently valid.

The current specification does support "stapled validity proofs". The holder can deliver the VC and the associated status list at the same time without the verifier needing to retrieve the list from the issuer at all. We should highlight this more in the specification as it probably is not obvious that this is a possible mode of operation.

The challenge with this mode of operation is that the wallet ecosystem and the verifier ecosystem have to support it. There are use cases where one does not have the protocol bandwidth to deliver the stapled revocation list (e.g., delivering the VC over QR Code, which requires the payload to stay under 400 bytes in most use cases). The verifier could also not accept stapled revocation lists because of internal business logic that states to always fetch the revocation list from the URL provided in the VC. It is not clear if the ecosystem will support revocation list stapling, but to be clear, the specification was explicitly designed to support it (this is why the revocation lists are VCs that are signed).
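A minimal sketch of the stapled mode from the verifier's side: the holder hands over the encoded list together with the VC, and the verifier decodes it locally instead of fetching it. This assumes the list is a GZIP-compressed bitstring carried as base64url with a multibase `u` prefix and that index 0 is the left-most bit, which is my reading of the specification. Verifying the proof and freshness of the stapled status list credential is required in practice and is not shown here.

```python
import base64
import gzip

def encode_list(bits: bytes) -> str:
    """Issuer side: GZIP the bitstring, then base64url without padding."""
    compressed = gzip.compress(bytes(bits))
    return "u" + base64.urlsafe_b64encode(compressed).rstrip(b"=").decode("ascii")

def decode_list(encoded: str) -> bytes:
    """Verifier side: reverse of encode_list, with no network access."""
    if not encoded.startswith("u"):
        raise ValueError("expected a multibase base64url ('u') prefix")
    b64 = encoded[1:]
    b64 += "=" * (-len(b64) % 4)  # restore the padding base64 expects
    return gzip.decompress(base64.urlsafe_b64decode(b64))

def set_bit(bits: bytearray, index: int) -> None:
    bits[index // 8] |= 0x80 >> (index % 8)  # index 0 is the left-most bit

def get_bit(bits: bytes, index: int) -> bool:
    return bool(bits[index // 8] & (0x80 >> (index % 8)))

# Hypothetical issuer list of 16 KB with one revoked credential.
issuer_bits = bytearray(16 * 1024)
set_bit(issuer_bits, 94567)
stapled = encode_list(issuer_bits)

# The verifier checks the status from the stapled copy only.
decoded = decode_list(stapled)
revoked = get_bit(decoded, 94567)
```

Because a mostly-zero list compresses very well, `stapled` is short for a sparse list, though it can still exceed what a QR code payload allows once the surrounding signed credential is included.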

Short-lived credentials could provide assurance to the verifier when necessary -- requiring the holder to more frequently interact with the issuer, but under the holder's control and without involving the particular verifier.

Yes, the group has contemplated short-lived VCs as well and has picked that solution (instead of depending on status lists) in a variety of production deployments today. The drawback, of course, is that the holder needs frequent interaction with the issuer to refresh, and if the issuer infrastructure goes down for an extended period, then the holder loses the validity on the credential completely.

Think of national and state ID card use cases. It could be considered a privacy harm to require an individual to repeatedly get an updated national or state ID card every day, week, or month.

There are trade-offs here and perhaps the specification should highlight these trade-offs a bit more in the privacy considerations section.

Cryptographic accumulators could be used to efficiently communicate revocation/non-revocation status without requiring the verifier to contact the issuer.

We have contemplated the use of cryptographic accumulators and note that the current versions require frequent re-distribution of the proof (every time a credential is revoked in a population of N, you have to re-distribute the accumulator to the entire population of N, leading to an N^2 network load for the accumulator -- which is a non-trivial amount of network traffic). There are other mechanisms that might not have this property, but I wanted to convey that we have studied and contemplated the use of accumulators and have found that there are significant drawbacks when you have large populations.
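The N^2 figure can be checked with back-of-the-envelope arithmetic. The sketch below assumes one update message per holder per revocation and a revocation count that grows in proportion to the population; the 1% rate and the population size are hypothetical numbers chosen only for illustration.

```python
def accumulator_update_messages(population: int, revocations: int) -> int:
    """Each revocation triggers one update message to every holder."""
    return population * revocations

population = 1_000_000           # hypothetical number of credential holders
revocations = population // 100  # assume 1% of credentials get revoked
total_messages = accumulator_update_messages(population, revocations)
```

With these numbers, ten thousand revocations produce ten billion update messages, and doubling the population quadruples the total as long as the revocation rate stays fixed.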

Could dummy entries in the status list help to protect against an attacker who just regularly scans the list to try to reidentify users or their status?

Yes, and it would be worth saying something about that in the specification. We went into what one can do to avoid statistical attacks against group privacy during the presentation on bitstring status list yesterday:

https://meet.w3c-ccg.org/archives/w3c-ccg-weekly-2024-02-20.mp4

We should speak to decoy values and how one can avoid statistical attacks on the list by not only using decoy values, but flipping those bits randomly (when you flip other bits in the list).
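One way to picture the decoy idea is sketched below: the issuer shuffles all indices with a cryptographic random source, reserves some as decoys that are never assigned to a credential, and flips a few decoy bits whenever a real bit changes. The class name, the decoy count, and the "zero to two decoy flips per revocation" policy are all hypothetical choices for illustration, not something the specification prescribes.

```python
import secrets

class DecoyStatusList:
    def __init__(self, size_bits: int, decoy_count: int):
        order = list(range(size_bits))
        secrets.SystemRandom().shuffle(order)  # unpredictable allocation order
        self.decoys = order[:decoy_count]      # never assigned to a credential
        self._free = order[decoy_count:]       # handed out to real credentials
        self.bits = bytearray(size_bits // 8)

    def allocate(self) -> int:
        """Return a random, previously unused index for a new credential."""
        return self._free.pop()

    def is_set(self, index: int) -> bool:
        return bool(self.bits[index // 8] & (0x80 >> (index % 8)))

    def _flip(self, index: int) -> None:
        self.bits[index // 8] ^= 0x80 >> (index % 8)

    def revoke(self, index: int) -> None:
        if not self.is_set(index):
            self._flip(index)
        # Flip zero to two decoy bits so an observer diffing the list
        # cannot tell which changed bits belong to real credentials.
        for _ in range(secrets.randbelow(3)):
            self._flip(secrets.choice(self.decoys))

status_list = DecoyStatusList(size_bits=131072, decoy_count=1000)
index = status_list.allocate()
status_list.revoke(index)
```

Since decoy indices and allocated indices are disjoint, decoy flips never change the status of a real credential.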

The spec requires that the status list index SHOULD be randomized. I can see that there may be alternatives, but there could be a MUST requirement for the unpredictability/intelligence-free nature of the index: it must not be something that a recipient could infer, or where the index reveals something about the credential (like its recency, etc.).

We went with a SHOULD because there are some populations where randomization might not matter. For example, corporations in a particular locality -- it's public information, you know exactly how big the set size should be, and whether or not a certain license for a corporation is revoked is often a matter of public knowledge.

The argument was made that we should just ratchet up the privacy characteristics with a MUST, but the implementers (at the time) felt like that was too overbearing. Implementers are either going to do a good job with ensuring their allocations happen in a privacy preserving manner, or they're not, and a MUST would require us to write a conformance test, which would require us to issue a VC for every item in the list and then determine if the allocation was "random enough". As you can imagine, testing this would take a long time across the 15+ issuance implementations we have today (and the implementers complained about eating that much compute/network time just to prove conformance with a W3C test suite).

In short, it's not a MUST today, because if we make it a MUST, we have to test it, and the only way to really test this is to exhaust a statistically significant portion of a status list, which would create a burden on implementers during testing that they were not willing to bear.

@martinthomson wrote:

In addition to the comments from @npdoty, I will note that this design depends on indices of issued credentials being sequential. That has secondary concerns from the perspective of privacy. If I can observe when someone is issued a credential, I can use this information to determine whether their credential is revoked. Though this concern is just a variant of the "persistent unique identifier" concern above, the point is that the design depends on the value for other people being highly predictable.

Hmm, no, we say that the indices SHOULD be randomly allocated. Sequential allocation does create privacy problems, which is why we strongly suggest against it. Note the commentary above, where we probably want to say more about this, but the notion that we're suggesting indices should be linear is not correct.

On the design front, did the working group consider designs like CRLite? If the universe of possible values is known, then a probabilistic data structure can be used to compress even more. And - though I haven't verified this - the privacy issues that are expressed above might not apply in that context.

Yes, we looked at CRLite (specifically) and using Bloom filters w/ updates (generally). We couldn't use CRLite directly because it was a bit of a weird map onto VCs due to its focus on X.509 certificates and how they work. That said, the general approach (which I believe is what you're really speaking to) was analyzed and not picked due to the following reasons:

IOW, Bloom filters were an optimization that added complexity w/o much benefit from a storage or privacy standpoint. That said, we do expect other types of status lists to be created in the future with better privacy characteristics through the potential use of accumulators fixed to smaller population sizes, Bloom filters into hash lists of post-quantum signatures (where you do get a storage benefit), and other storage/privacy optimizations that we aren't able to standardize today.

(Also, from a pure information theory perspective, zlib might be as effective as a far simpler run-length encoding, but I'm not sure that it would be in practice if you have decent coding for the lengths.)

The specification requires using GZIP to compress the stream, which uses LZ77 (which ZLIB also uses). We started with ZLIB and found out that a vocal group of implementers wanted to use GZIP libraries instead "because it was more broadly supported!" (which is debatable). The debate went nowhere and consensus was to just use GZIP since no one objected to that (and because there were objections to using ZLIB). Also, the "GZIP has a checksum, zlib doesn't!" gave a slight advantage to GZIP over ZLIB among those debating.

We didn't want to use simpler (bespoke) RLE algorithms because implementers were afraid that other implementers would get the implementation wrong. This was further exacerbated by the addition of a statusPurpose of "message", which uses a packed bitstring to convey multiple status types. The repetitive nature of the packed bitstring would cause simple RLE compression to fail where the more sophisticated ZLIB/GZIP compression would probably do the right thing given the LZ77 sliding window.
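The difference can be seen with a small experiment. The sketch below compares a naive bit-level run-length view with GZIP on two 16 KB inputs: an all-zero list, and a hypothetical packed list in which every 2-bit entry holds the value `01`. The `bit_runs` helper is an illustrative stand-in for a simple RLE scheme, not any encoder the group actually evaluated.

```python
import gzip
from itertools import groupby

def bit_runs(data: bytes):
    """Return (bit, run_length) pairs, as a naive RLE encoder would see them."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return [(bit, sum(1 for _ in group)) for bit, group in groupby(bits)]

size = 16 * 1024
sparse = bytes(size)               # nothing revoked: one long run of zeros
packed = bytes([0b01010101]) * size  # hypothetical 2-bit entries, all "01"

sparse_runs = bit_runs(sparse)
packed_runs = bit_runs(packed)
sparse_gzip = len(gzip.compress(sparse))
packed_gzip = len(gzip.compress(packed))
```

The all-zero list is a single run, so RLE handles it well. The packed list alternates on every bit, so RLE needs one run per bit, while GZIP still reduces the repeating pattern to a small fraction of the input.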

We also discussed lz4, zstd, lzma, lzma2, and xz as alternatives.

In the end, the group picked a compression mechanism that 1) had an RFC, 2) was a library that most platforms had by default, 3) provided good compression for large runs of 1s and 0s, and 4) was widely available as a development library in a variety of languages.

Given the response above, please raise issues on vc-bitstring-status-list that PING would like us to address before transitioning into CR.

npdoty commented 6 months ago

I've opened issues on the vc-bitstring-status-list repo: https://github.com/w3c/vc-bitstring-status-list/issues/created_by/npdoty Apologies for the delay.

msporny commented 6 months ago

I've opened issues on the vc-bitstring-status-list repo: https://github.com/w3c/vc-bitstring-status-list/issues/created_by/npdoty Apologies for the delay.

Thank you! We will start processing these next week. We were hoping to move Bitstring Status List to CR by mid-to-end-April 2024; please try to help us achieve that by engaging as we have discussions and raise PRs. We are treating all issues PING raised as needing to be addressed by the WG before we enter CR.

msporny commented 4 months ago

All issues raised by PING have been addressed via PRs that were merged into the specification:

https://github.com/w3c/vc-bitstring-status-list/issues?q=is%3Aissue+is%3Aclosed+label%3Aprivacy-needs-resolution

We will be requesting transition to CR next week. Please be advised that W3M will ask for the status of this issue shortly if it isn't closed. If we missed anything, please let us know so that we can address any further comments.

npdoty commented 4 months ago

Review is complete, so closing this.

I am following up on the issue resolutions very shortly.