w3c / reporting

Reporting API
https://w3c.github.io/reporting/

Can we do anything mechanical about capability URLs? #221

Open clelland opened 3 years ago

clelland commented 3 years ago

At TPAC, the issue of capability URLs came up again, along with how problematic they are when combined with third-party reporting endpoints.

We should certainly resolve #155, state the problem in the spec clearly and unambiguously, and discourage the use of third-party reporting endpoints on such URLs. Beyond that, we may be able to provide more useful machinery to site owners, so that they can make use of third-party reporting while still ensuring that their users are protected.

@yoavweiss suggested truncating long URLs, and/or using a hash of the URL in the report body. I suspect that this will make it much harder for some kinds of sites to use reporting; those where URLs are both long and dynamic may find themselves having to build infrastructure to record the hash of every URL generated by their site.
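
To make the cost of the hashing suggestion concrete, here is a minimal sketch of what a site might have to maintain. Everything in it is an assumption for illustration: the choice of SHA-256, the 64-character cut-off, and the helper names (`url_digest`, `truncate_url`, `record_generated_url`, `resolve_reported_digest`) are hypothetical and not part of any proposal.

```python
import hashlib
from typing import Optional

MAX_URL_LENGTH = 64  # hypothetical cut-off; no specific length has been proposed


def url_digest(url: str) -> str:
    """Hex SHA-256 digest of the URL (the hash algorithm is an assumption)."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


def truncate_url(url: str, limit: int = MAX_URL_LENGTH) -> str:
    """Keep at most `limit` characters of the URL."""
    return url[:limit]


# The infrastructure burden: to interpret a hashed report, the site needs its
# own record of which URL produced which digest.
generated_urls = {}  # maps digest -> original URL


def record_generated_url(url: str) -> None:
    """Called by the site every time it generates a URL."""
    generated_urls[url_digest(url)] = url


def resolve_reported_digest(digest: str) -> Optional[str]:
    """Map a digest found in a report back to a URL, if the site recorded it."""
    return generated_urls.get(digest)
```

A site with long, dynamic URLs would need to call something like `record_generated_url` for every URL it emits, which is the overhead described above. Note also that truncation alone only helps when the sensitive part of the URL falls beyond the cut-off.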

Another suggestion was to try to auto-scrub anything that looks like a session identifier or other capability-granting PII; however, it was pointed out that any heuristics we use to sanitize URLs in this way are guaranteed to become a never-ending cat-and-mouse game.

However, the idea of sanitizing URLs may not be completely hopeless. Site owners are the only ones in a position to know what components of their URLs might be harmful to expose, so it might be possible to allow them to configure reporting such that either individual components or entire URLs are replaced in the report body with placeholders.

A couple of strawman ideas:

  1. We could add an option to the Reporting-Endpoints configuration that simply replaces the URL with a fixed token, and/or a hash of the actual URL. Something like

    Reporting-Endpoints: collector="https://report.example/";report-as="password-change-page"

    This would replace the actual URL in the report body with the string "password-change-page". If needed, we could spec that a placeholder like "%s" in the token would be replaced with a secure hash of the original URL.

  2. We could use something like https://github.com/WICG/urlpattern/blob/master/explainer.md to define named groups on the URL, and replace any groups named "redact*" with a placeholder. This might be overly complex to implement, and prone to misconfiguration, but could allow more flexibility, and the ability to report a more useful scrubbed URL in some cases.
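
To illustrate how the two strawmen might behave, here is a rough sketch. It is not a proposed algorithm: the function names are hypothetical, SHA-256 stands in for the unspecified "secure hash", a Python regular expression with named groups stands in for a URL pattern, and the `[redacted]` placeholder and the pass-through behaviour for non-matching URLs are assumptions.

```python
import hashlib
import re


def apply_report_as(url: str, report_as: str) -> str:
    """Strawman 1: report a fixed token instead of the URL.

    Any "%s" in the token is replaced with a hash of the original URL
    (SHA-256 here, as an assumption).
    """
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return report_as.replace("%s", digest)


def redact_named_groups(url: str, pattern: re.Pattern,
                        placeholder: str = "[redacted]") -> str:
    """Strawman 2: replace every matched group whose name starts with "redact".

    Assumes the redacted groups do not overlap or nest.
    """
    match = pattern.fullmatch(url)
    if match is None:
        return url  # assumption: URLs that do not match are reported unchanged
    spans = [
        match.span(name)
        for name in pattern.groupindex
        if name.startswith("redact") and match.group(name) is not None
    ]
    # Replace from right to left so earlier offsets stay valid.
    for start, end in sorted(spans, reverse=True):
        url = url[:start] + placeholder + url[end:]
    return url


# Hypothetical site pattern: the token segment is sensitive, the step is not.
RESET_PATTERN = re.compile(
    r"https://site\.example/reset/(?P<redact_token>[^/?]+)/(?P<step>\w+)"
)

scrubbed = redact_named_groups(
    "https://site.example/reset/abc123/confirm", RESET_PATTERN
)
# scrubbed == "https://site.example/reset/[redacted]/confirm"
```

The pass-through for non-matching URLs is exactly the kind of misconfiguration risk mentioned in the second strawman: a pattern that is slightly wrong would silently leak the full URL, so a real design might prefer to replace the whole URL when nothing matches.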