w3c / json-ld-syntax

JSON-LD 1.1 Specification
https://w3c.github.io/json-ld-syntax/

Privacy Considerations: Additional guidance on handling of remote contexts when forwarding documents #430

Open tesaguri opened 3 months ago

tesaguri commented 3 months ago

The Security Considerations and Privacy Considerations sections of the JSON-LD 1.1 Recommendation contain the following guidance:

When processing JSON-LD documents, links to remote contexts and frames are typically followed automatically, resulting in the transfer of files without the explicit request of the user for each one. If remote contexts are served by third parties, it may allow them to gather usage patterns or similar information leading to privacy concerns. Specific implementations, such as the API defined in the JSON-LD 1.1 Processing Algorithms and API specification [JSON-LD11-API], may provide fine-grained mechanisms to control this behavior.

The retrieval of external contexts can expose the operation of a JSON-LD processor, allow intermediate nodes to fingerprint the client application through introspection of retrieved resources (see [fingerprinting-guidance]), and provide an opportunity for a man-in-the-middle attack. To protect against this, publishers should consider caching remote contexts for future use, or use the documentLoader to maintain a local version of such contexts.

However, caching would be ineffective against more aggressive tracking approaches that use different context URLs each time.

On the other hand, I think one common use case of JSON-LD processing is verifying signed documents that are forwarded by entities other than the original issuers. In fact, many ActivityPub implementations, for example, treat JSON-LD documents as plain JSON and run the JSON-LD processing algorithms only when handling Linked Data Signatures.

In such a case, I think it is advisable that the entity forwarding a document mitigate the risk of tracking by resolving "unknown" context URLs and replacing them with inline contexts or equivalent "well-known" context URLs before forwarding the document, so that recipients only need to trust the forwarding entity instead of both the original issuer and the forwarding entity with regard to tracking.
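To make the idea concrete, here is a minimal sketch of what a forwarding entity could do. The names `WELL_KNOWN`, `inline_unknown_contexts`, and `load_context` are hypothetical, and the allow-list contents are only an example. The loader is assumed to return the parsed remote context document (an object with a top-level `@context` key). The sketch only rewrites the top-level `@context`; it does not recurse into scoped contexts, `@import`, or URLs nested inside the inlined context, and it ignores base-IRI differences, so a real implementation would need more care.

```python
from typing import Any, Callable

# Hypothetical allow-list of context URLs that recipients are assumed to
# have cached locally already; the actual set is deployment-specific.
WELL_KNOWN = frozenset({
    "https://www.w3.org/ns/activitystreams",
    "https://w3id.org/security/v1",
})


def inline_unknown_contexts(
    doc: dict,
    load_context: Callable[[str], Any],
    well_known: frozenset = WELL_KNOWN,
) -> dict:
    """Return a copy of `doc` whose top-level @context has every
    non-allow-listed URL replaced by the context it resolves to."""
    ctx = doc.get("@context")
    if ctx is None:
        return doc

    entries = ctx if isinstance(ctx, list) else [ctx]
    rewritten = []
    for entry in entries:
        if isinstance(entry, str) and entry not in well_known:
            # The forwarder fetches the context once, so recipients
            # never have to contact the original issuer's server.
            remote = load_context(entry)["@context"]
            rewritten.extend(remote if isinstance(remote, list) else [remote])
        else:
            # Allow-listed URLs and already-inline contexts pass through.
            rewritten.append(entry)

    result = dict(doc)  # shallow copy; the input document is not mutated
    result["@context"] = rewritten[0] if len(rewritten) == 1 else rewritten
    return result
```

With a loader backed by the forwarder's own cache, a document whose `@context` is `["https://www.w3.org/ns/activitystreams", "https://tracker.example/ctx/abc123"]` would be forwarded with the second entry replaced by the inline term definitions.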

In ActivityPub's case specifically, this difference is significant because the forwarding entity is likely to know that you are processing the document anyway (either you have followed the forwarding entity or you have just made an HTTP request to the entity to fetch the document), while the original issuer isn't. I expect this is often the case for other use cases as well.

tesaguri commented 3 months ago

On second thought, this would have a big caveat for implementations that produce documents secured with plain-JSON canonicalization mechanisms like the JSON Canonicalization Scheme (JCS), because rewriting the context changes the JSON that was signed. Even though the Verifiable Credentials specs recommend RDF Dataset Canonicalization for users of JSON-LD (who might require selective disclosure of information in a secured document), that is not mandatory.

It might be possible for the forwarding entities to detect JCS and stop processing, but that feels quite ad-hoc and doesn't seem to be future-proof…
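For illustration, such a check might look like the sketch below. It assumes the document carries a Data Integrity style `proof` with a `cryptosuite` property and that JCS-based suites follow a naming pattern like `eddsa-jcs-2022`; the function name `proof_uses_jcs` is hypothetical. Any securing mechanism that does not match this pattern goes undetected, which is exactly the future-proofing concern.

```python
def proof_uses_jcs(doc: dict) -> bool:
    """Heuristic: report whether any proof on `doc` names a cryptosuite
    that looks JCS-based (e.g. "eddsa-jcs-2022")."""
    proofs = doc.get("proof", [])
    if isinstance(proofs, dict):
        proofs = [proofs]  # a single proof may appear as a bare object
    return any(
        isinstance(p, dict) and "-jcs-" in str(p.get("cryptosuite", ""))
        for p in proofs
    )
```

A forwarder could call this before rewriting contexts and leave the document untouched when it returns `True`.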