Closed by awoie 1 year ago
Noting that the term "first party" is not shared between the different VC data model specs. This isn't really an issue, but in general I think of the issuer as the first party, the holder as a UA, and the verifier as a third party, so I will refer to them within this context. This is the model that was used in the VC Data Model self-review as well, but it doesn't necessarily align with the specific definitions that are normally used within the browser and PING.
This spec acts as the core data model for sharing claims about a subject that have been asserted by the issuer. It works as the baseline layer that other specs extend with additional features, while keeping them all aligned with the general model: issuers assert claims and give them to the holder in a verifiable way so that the holder can present them to the verifier.
Correlation seemed to be the main problem in the previous closed issues carrying the privacy-tracker label: [https://github.com/w3c/vc-data-model/issues?q=label%3Aprivacy-tracker+sort%3Aupdated-desc+is%3Aclosed]
Chosen profiles of claims format, claims semantics, proof format, or various other metadata defined as extension points (like those in section 5) can act as a mechanism to fingerprint the holder in a way that reduces k-anonymity even when selective disclosure mechanisms are used. For example, suppose a signature scheme or the language used in claims is specific to a particular region, but no location information is provided. A verifier can then infer the location of the subject not only from who the issuer is, but also from the format chosen by the issuer and/or holder for verifiable presentation generation. The Verifiable Credentials specification should encourage the reuse of common claims and proof formats in order to reduce this metadata as a vector of correlation. This could be highlighted in section 7.13, but would likely be better highlighted as an additional subsection of section 7.
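To make the k-anonymity point concrete, here is a minimal sketch of how a verifier could shrink the anonymity set using format metadata alone. The records, attribute names (`issuer`, `proof_format`, `claim_language`) and values are all hypothetical and are not taken from the specification; the point is only that adding metadata attributes to the grouping key can drop k sharply.

```python
from collections import Counter

# Hypothetical metadata a verifier observes across six presentations.
# No location claim is disclosed in any of them.
presentations = [
    {"issuer": "state-dmv", "proof_format": "common-suite", "claim_language": "en"},
    {"issuer": "state-dmv", "proof_format": "common-suite", "claim_language": "en"},
    {"issuer": "state-dmv", "proof_format": "common-suite", "claim_language": "en"},
    {"issuer": "state-dmv", "proof_format": "common-suite", "claim_language": "en"},
    {"issuer": "state-dmv", "proof_format": "common-suite", "claim_language": "en"},
    {"issuer": "state-dmv", "proof_format": "regional-suite", "claim_language": "en"},
]


def k_anonymity(records, attributes):
    """Return the size of the smallest group of records that share
    the same values for the given attributes (the k in k-anonymity)."""
    keys = [tuple(record[a] for a in attributes) for record in records]
    return min(Counter(keys).values())


# Looking at the issuer alone, every holder hides among all six.
k_issuer_only = k_anonymity(presentations, ["issuer"])

# Adding the proof format singles out the holder with the regional suite.
k_with_format = k_anonymity(presentations, ["issuer", "proof_format"])
```

With this made-up data, grouping by issuer alone keeps all six holders in one set, while adding the proof format isolates one holder even though no claim about location was ever disclosed.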
On a related note, who the issuer of a credential is can act as a vector of correlation about the subject's location, region, or other PII, and should also be considered. Issuer-based correlation can be reduced through the use of common issuers rather than distributing across many issuers (e.g. think about issuing a driver's license credential at the state level rather than through a separate issuer per county) or via cryptographic means such as a ring signature. An additional point should be added to the privacy considerations section to highlight this further.
Holder-based tracking: Under the current design it's plausible that generic holder software, colloquially known as a wallet, can siphon PII, credential usage, and other sensitive information about the subject in the same way that a browser could with data like browsing history. This is semi-highlighted by section 7.11 from the angle of storage, but not from the angle of processing. There's an inherent trust boundary between the subject and the holder that may need to be further defined here to match the UA model established between a browser and its user, where the holder is neither a first party nor a third party. The recommendation is to highlight this inherent trust boundary between the holder and subject to make it known that the holder software should be acting on behalf of the subject's interests.
The specification should note the impact of a verifier's stored data being compromised in a way that could be harmful to the subject. As an example, several recently passed laws require adult content websites to gather proof of age via verifiable credentials (in some cases they're aiming at different formats such as ISO/IEC 18013-5 mobile driver's licenses, but the general concept remains), and that data, if compromised, could violate the user's right to privacy. The specification should discourage the storage of verifiable credentials by parties other than the holder so that this data cannot be easily compromised. This is semi-highlighted in section 7.11 for holders, but that section does not address the impact of legislation that could require verifiers to store this information as well. This should be further highlighted in section 7.11.
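As a rough illustration of what "not storing the credential" could look like on the verifier side, here is a minimal sketch in which only the outcome of an age check is retained. Everything here is an assumption for illustration: the `AgeCheckRecord` type, the `record_age_check` helper, and the shape of the presentation (a dict with a hypothetical `age` claim) are not defined by the specification, and cryptographic verification is assumed to have happened elsewhere.

```python
from dataclasses import dataclass, fields
from datetime import datetime, timezone


@dataclass(frozen=True)
class AgeCheckRecord:
    """What the verifier keeps: the outcome only, no credential or claims."""
    over_threshold: bool
    checked_at: datetime


def record_age_check(presentation: dict, threshold: int = 18) -> AgeCheckRecord:
    # Assumption: the presentation has already been cryptographically
    # verified, and it exposes a hypothetical "age" claim.
    age = presentation["claims"]["age"]
    # Only the boolean outcome and a timestamp survive this function;
    # the presentation itself is never written anywhere.
    return AgeCheckRecord(
        over_threshold=age >= threshold,
        checked_at=datetime.now(timezone.utc),
    )


# Usage: the name and exact age in the input never reach the stored record.
record = record_age_check({"claims": {"age": 30, "name": "Alice"}})
stored_field_names = [f.name for f in fields(record)]
```

If a store of such records were compromised, it would reveal that checks happened, but not the credentials or identities behind them. Whether legislation would permit a verifier to retain this little is exactly the open question raised above.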
@kdenhartog and PING, thank you for your thoughtful review of the VCDM v2.0 specification. I see no reason why we cannot address each of the items you raise above in the specification text using the suggestions that you provide above.
The next step is to create an issue per item that you raise above in the VCDM repository and process each one separately as a PR. We will cc you, @kdenhartog, on each issue and PR to ensure that you are given a chance to review and provide feedback on the text that ends up going in the final specification.
@kdenhartog Is this the final review? Do we have to expect additional feedback from PING?
We will work on @kdenhartog's review above and create issues accordingly.
> Is this the final review? Do we have to expect additional feedback from PING?
PING is meeting today, in about 40 minutes, to discuss whether there are any additional points of consideration or if this is it. I'll add some comments here if there's anything else that needs to be listed.
We reviewed these points today during the PING call, and there appeared to be consensus that these points should be added as privacy considerations sections to the specification and that they would be the only aspects necessary to address this review. One additional point, highlighted in #119, would be to make more use of OHTTP when resolving contexts, status lists, credential schemas, etc. from central servers, to reduce IP-based correlation.
We have conducted a self-review of our spec Verifiable Credentials Data Model v2.0, and the results can be found at https://github.com/w3c/vc-data-model/issues/1157.
Please check our findings.