Describe decoy entries as a privacy mitigation

npdoty commented 6 months ago

Could dummy entries in the status list help to protect against an attacker who just regularly scans the list to try to reidentify users or their status?

Spec should document resistance to statistical analyses.

msporny commented 6 months ago

Yes, and it would be worth saying something about that in the specification. We went into what one can do to avoid statistical attacks against group privacy during the presentation on bitstring status list yesterday:

https://meet.w3c-ccg.org/archives/w3c-ccg-weekly-2024-02-20.mp4

We should speak to decoy values and how one can avoid statistical attacks on the list by not only using decoy values, but also flipping those decoy bits at random whenever other bits in the list are flipped.
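
A minimal sketch of that idea, assuming the issuer keeps a set of indices it has reserved as decoys. The function name `update_with_decoys`, the `decoy_flips` parameter, and the one-byte-per-entry layout are all hypothetical choices made for readability; none of them come from the specification or from PR #155.

```python
import secrets


def update_with_decoys(bits, real_index, new_value, decoy_indices, decoy_flips=3):
    """Set one real status entry and flip randomly chosen decoys in the same update.

    `bits` is a bytearray holding one entry (0 or 1) per byte, an
    illustrative layout only. `decoy_indices` is assumed not to
    contain `real_index`.
    """
    if new_value not in (0, 1):
        raise ValueError("new_value must be 0 or 1")
    bits[real_index] = new_value

    # Use the OS entropy source so the choice of decoys is not predictable.
    rng = secrets.SystemRandom()
    chosen = rng.sample(decoy_indices, min(decoy_flips, len(decoy_indices)))
    for i in chosen:
        bits[i] ^= 1  # flip the decoy entry
    return chosen


status = bytearray(16)
flipped = update_with_decoys(status, real_index=2, new_value=1,
                             decoy_indices=[5, 6, 7, 8], decoy_flips=2)
```

An observer comparing the list before and after this update sees three changed entries rather than one. Whether that is enough to resist statistical analysis depends on how the decoys behave over time, which this sketch does not address.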

msporny commented 5 months ago

PR #155 has been raised to address this issue. This issue will be closed once PR #155 has been merged.

msporny commented 5 months ago

PR #155 has been merged, closing.

npdoty commented 4 months ago

@dlongley and others have argued that there are privacy harms and no privacy benefits to decoy entries, and the group has suggested that it is unaware of any threat or use case where a decoy value could provide a privacy benefit.

To be clear, I don't think of this as an "obvious" solution at all and I'm not as familiar with these deployments. But I suspect there are threats where decoy values could provide some protection, especially when status value changes are rare, when the group of people with a particular status is small, or when other information is known about people whose status may be on the list.

For example, an issuer very rarely suspends licenses for a particular behavior. A case documenting that behavior is widely published in the press, and on the same day the issuer updates the status list to indicate a suspended status. The press then reports apparent confirmation that the suspect's license was suspended, and potentially other information about them (such as when it was last suspended or un-suspended), even if that information was intended to be kept confidential. If the status information was shared in cases of selective disclosure (the licensee had proved their license status in order to access sensitive content online), then the licensee's identity has also been disclosed, and the site learns the identity of the visitor who accessed that particular content.
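
To illustrate the scanning step in this scenario, here is a rough sketch of how an observer could compare two snapshots of the uncompressed bitstring and list the entries that changed. The function name `changed_indices` is hypothetical, and the sketch assumes index 0 is the left-most (most significant) bit of the first byte.

```python
def changed_indices(before: bytes, after: bytes) -> list[int]:
    """Return the entry indices whose bit differs between two snapshots."""
    if len(before) != len(after):
        raise ValueError("snapshots must have the same length")
    changed = []
    for byte_pos, (a, b) in enumerate(zip(before, after)):
        diff = a ^ b  # bits set here are the ones that changed
        for bit in range(8):
            # Assumption: bit 0 of each byte is the most significant bit.
            if diff & (0x80 >> bit):
                changed.append(byte_pos * 8 + bit)
    return changed


yesterday = bytes([0b00000000, 0b00000000])
today = bytes([0b00100000, 0b00000001])
print(changed_indices(yesterday, today))  # → [2, 15]
```

When status changes are rare, a diff like this may contain a single index on the day of a publicized event, which is what makes the correlation possible.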

@KDean-Dolphin raised a separate use case about business intelligence, where the size of the status list might reveal group behavior that the issuer wishes to conceal, such as how many licenses are being issued during a particular timeframe.
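
One way to picture the list-size concern is to round the published list length up to a fixed block, so that the length only reveals issuance at block granularity. This is a sketch under assumptions: the function name `padded_length` and the `block` parameter are hypothetical, and the default value is chosen purely for illustration.

```python
def padded_length(issued: int, block: int = 131072) -> int:
    """Round the number of issued entries up to a whole number of blocks.

    At least one full block is always published, so an empty or
    nearly empty list looks the same as a nearly full one.
    """
    if issued < 0 or block <= 0:
        raise ValueError("issued must be >= 0 and block must be > 0")
    blocks = max(1, -(-issued // block))  # ceiling division
    return blocks * block


print(padded_length(1))       # → 131072
print(padded_length(131073))  # → 262144
```

The unused entries in the final block are, in effect, entries that never correspond to a real credential. Whether they should ever change value is the open question in this thread.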

dlongley commented 4 months ago

@npdoty,

I'm, of course, perfectly happy for us to say something about how decoy values can be helpful (even when already doing random assignment) if we're able to determine how and come to consensus on it.

I think we need to have a longer discussion around what to say about decoy values -- and that we should add an at-risk issue marker to the spec that says the working group will develop text around them (with options to recommend for, against, or stay silent on the concept). We could strike the sentence about discouraging them for now along with adding that risk marker.

Do you think doing this would allow us to proceed to CR and then we can continue the discussion around what to say in more depth at that point?

Regarding that discussion, I have a number of things to say around how using decoys effectively requires that they be indistinguishable from real entries in how they behave, which I suspect will be quite challenging. Naive implementations that drive decoys in other ways (e.g., pseudo-randomly) would make them detectable as decoys, resulting in only net harm to privacy.

npdoty commented 4 months ago

Yes, I think noting it as an open question on how to do properly (or whether), with an at-risk marker, would make sense for CR.

msporny commented 4 months ago

PR #171 has been raised to note that the decoy guidance will be refined during the Candidate Recommendation process and that the group may, or may not, suggest that decoy values are good/bad/a mixed bag.

@npdoty, if that PR is merged, would it address your concerns enough to continue the transition into CR? If not, please suggest concrete changes on the PR such that we can determine the path forward. We'll most likely discuss this issue during our call this week, if you'd like to join us. /cc @brentzundel

w3c/vc-bitstring-status-list, issue #150