w3cping / font-anti-fingerprinting

A system for preventing font fingerprinting

Address issues around diversity of fonts on the web #3

Open asankah opened 4 years ago

asankah commented 4 years ago

This proposal is likely to elevate popular web fonts to the status of common system fonts. This is a good thing for the reasons laid out in the explainer. At the same time, we should strike a balance so that we don't create strong disincentives to introducing new fonts. This is implied in the explainer, but we should probably call it out explicitly, at least to the point where we monitor the diversity of web fonts via telemetry and act if we notice a significant decline.

hsivonen commented 4 years ago

More than just elevating particular fonts, it seems to elevate particular fonts as centrally hosted. Elevating centrally-hosted fonts relative to font self-hosting by each site may run counter to general anti-tracking goals of this proposal.

tabatkins commented 4 years ago

Can you elaborate on that, Henri? I'm not sure I understand.

hsivonen commented 4 years ago

The aggressive caching heuristic proposed appears to be by URL (as opposed to using telemetry to populate some form of storage that CSS local() would match against). Realistically, outside perhaps massively popular sites like Facebook, the URLs that will be cached by the heuristic will be URLs that are common because they point to centralized font hosting systems. If centralized font hosting services have special caching privilege, Web authors might choose to use a service that enjoys such privilege instead of self-hosting their fonts.

Self-hosting fonts doesn't expose third-party tracking surface: it's the first party serving the font, and the first party already knows that the document itself was accessed. Centralized font hosting services are opportunities for third-party tracking. Trying to defeat font fingerprinting is an attempt to defeat tracking. In that sense, an anti-tracking measure that gives sites an incentive to hand a third party a tracking opportunity seems counterproductive.

(I'm not trying to debate whether a specific font hosting service promises not to track.)

tabatkins commented 4 years ago

Ah, k, yes. The idea is to intercept real URLs and give the aggressively local-cached version, so real pages "just work better". So your concern about this privileging centralized font hosting seems reasonable!

There is a mitigation that, once such a font does get promoted into the aggressive cache, the centralized service won't be able to track anymore - each user will only hit the service a single time. But in the meantime, when you're trying to get a font popular enough to enter the cache, there's definitely tracking potential there.

I think the only way around this is to link this into SRI as well, so that fonts get cached by URL and SRI hash, and you can access it with either. So you can self-host, and if you use SRI, you'll get the aggressive cache anyway. And if we count SRI hits as being equivalent to URL hits in the backend counting (proactively fetching the fonts at those URLs and computing their hash), you can get something into the cache even if it's only being locally hosted, if enough people are using it that way.

(We have to add SRI to url() still, I know, I know, that's on me probably.)

That still encourages centralizing, but doesn't require it. I don't think we can get around encouraging it if we want anything like this, tho.
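
To make the "cached by URL and SRI hash" idea concrete, here is a minimal sketch of a cache in which one stored font can be reached either by its promoted URL or by a matching integrity value. It is an illustration only, not anything from the explainer: the class name `AggressiveFontCache`, its methods, and the example URLs are all hypothetical, and SHA-384 is simply assumed as the hash.

```python
import base64
import hashlib
from typing import Dict, Optional


def sri_sha384(data: bytes) -> str:
    """Return an SRI-style integrity string ("sha384-<base64 digest>")."""
    digest = hashlib.sha384(data).digest()
    return "sha384-" + base64.b64encode(digest).decode("ascii")


class AggressiveFontCache:
    """Hypothetical cache: a promoted font is reachable by URL or by hash."""

    def __init__(self) -> None:
        self._by_url: Dict[str, str] = {}     # promoted URL -> integrity string
        self._by_hash: Dict[str, bytes] = {}  # integrity string -> font bytes

    def promote(self, url: str, data: bytes) -> str:
        """Store a font under both its URL and its content hash."""
        integrity = sri_sha384(data)
        self._by_url[url] = integrity
        self._by_hash[integrity] = data
        return integrity

    def lookup(self, url: str, integrity: Optional[str] = None) -> Optional[bytes]:
        """Return cached bytes, or None if the request must go to the network."""
        if integrity is not None:
            # A declared hash is authoritative: match on content, ignore the URL.
            return self._by_hash.get(integrity)
        known = self._by_url.get(url)
        return self._by_hash.get(known) if known is not None else None


cache = AggressiveFontCache()
font_bytes = b"stand-in for the bytes of a popular font file"
integrity = cache.promote("https://fonts.example/popular.woff2", font_bytes)

# A self-hosting site that declares the same hash gets the cached copy.
self_hosted = cache.lookup("https://site.example/fonts/popular.woff2", integrity)
```

In this sketch a self-hosted URL without an integrity value misses the cache, and so does any file whose bytes differ from the promoted copy.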

hsivonen commented 4 years ago

"So you can self-host, and if you use SRI, you'll get the aggressive cache anyway."

This would give a strong incentive against sites performing their own font subsetting. To benefit from SRI matching, sites would have to self-host exact copies of subset files as generated by a popular centralized service.

It seems to me that if one considers eager cross-site caching of the most popular Web fonts as a feature that browsers should have, the approach with the least problematic incentives would be the browser downloading fonts to a cache from which they would be matched by CSS local() even though they wouldn't be system-wide fonts but browser-managed browser-private cached fonts. (E.g. Google Fonts already includes local() in the CSS it generates, so this wouldn't require changes to Google Fonts to work.)

As for what incentive issues this has, this would probably create incentives against proprietary fonts, since proprietors of proprietary fonts would probably be unhappy about browsers caching fonts and allowing unlicensed sites to get fonts applied from the cache. (The fonts would have to be downloaded from a vetted repository of fonts trusted by the browser, both to avoid that licensing issue and to avoid malicious poisoning of the cache with bogus fonts.)

There's also the problem of how to generate the list of popular fonts in a privacy-preserving way. However, arguably tallying font names is less problematic than tallying specific URLs.
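
The local()-based alternative described above could be sketched roughly as follows: a `src` list is walked in order, and a `local()` entry is satisfied either by a system font or by a font the browser has pre-downloaded into its own private cache. Everything here is an assumption for illustration: the `Source` type, the `resolve_src` function, and the font sets are hypothetical and do not describe any browser's actual font matching.

```python
from typing import List, NamedTuple, Tuple


class Source(NamedTuple):
    kind: str   # "local" for local(), "url" for url()
    value: str  # font name for local(), URL for url()


# Hypothetical data: fonts installed system-wide, and fonts the browser
# fetched from a vetted repository into a browser-private cache.
SYSTEM_FONTS = {"Arial", "Times New Roman"}
BROWSER_CACHED_FONTS = {"Roboto", "Open Sans"}


def resolve_src(sources: List[Source]) -> Tuple[str, str]:
    """Return (origin, value) for the first usable entry in a src list."""
    for src in sources:
        if src.kind == "local":
            if src.value in SYSTEM_FONTS:
                return ("system", src.value)
            if src.value in BROWSER_CACHED_FONTS:
                # Matched by name only; no URL or file hash is involved.
                return ("browser-cache", src.value)
        elif src.kind == "url":
            # No earlier local() entry matched, so fetch from the network.
            return ("network", src.value)
    return ("fallback", "")


popular = [Source("local", "Roboto"),
           Source("url", "https://fonts.example/roboto.woff2")]
obscure = [Source("local", "Obscure Face"),
           Source("url", "https://site.example/obscure.woff2")]

popular_result = resolve_src(popular)
obscure_result = resolve_src(obscure)
```

Because matching is by font name, a site that self-hosts its own subset as the `url()` fallback would still benefit from the cached copy whenever the name matches.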

annevk commented 4 years ago

Such SRI caching would defeat cache partitioning efforts (see https://github.com/whatwg/fetch/issues/904) and is therefore not workable.
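
The conflict can be illustrated with a small sketch, assuming that cache partitioning roughly means keying cache entries by the top-level site as well as the resource URL. Both classes below are hypothetical and exist only to contrast the two keying schemes.

```python
from typing import Dict, Optional, Tuple


class PartitionedCache:
    """Cache keyed by (top-level site, resource URL)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], bytes] = {}

    def put(self, top_level_site: str, url: str, body: bytes) -> None:
        self._entries[(top_level_site, url)] = body

    def get(self, top_level_site: str, url: str) -> Optional[bytes]:
        return self._entries.get((top_level_site, url))


class HashKeyedCache:
    """Hypothetical cross-site cache keyed only by an integrity hash."""

    def __init__(self) -> None:
        self._entries: Dict[str, bytes] = {}

    def put(self, integrity: str, body: bytes) -> None:
        self._entries[integrity] = body

    def get(self, integrity: str) -> Optional[bytes]:
        return self._entries.get(integrity)


font_url = "https://fonts.example/popular.woff2"
font_body = b"stand-in font bytes"

partitioned = PartitionedCache()
partitioned.put("https://a.example", font_url, font_body)
# Site B cannot observe that site A already loaded the font.
seen_by_b_partitioned = partitioned.get("https://b.example", font_url)

shared = HashKeyedCache()
shared.put("sha384-placeholder", font_body)
# With a hash-only key, site B gets a hit for content site A loaded.
seen_by_b_shared = shared.get("sha384-placeholder")
```

The hit in the second case is the cross-site signal that partitioning is meant to remove.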

tabatkins commented 4 years ago

It is extremely intentional that it violates the global cache partitioning, yes.

jyasskin commented 4 years ago

I've split the "how do we identify fonts" question into https://github.com/w3cping/font-anti-fingerprinting/issues/8, since it's independent of the diversity question that started this issue. And I appreciate @annevk for splitting the question of information leaks into #7.