pes10k opened 5 years ago
If the above approach is appealing, I would be happy to submit a PR to the existing level 3 standard, as well as the level 4 proposal.
If the above approach is appealing, I would be happy to submit a PR to the existing level 3 standard, as well as the level 4 proposal.
Appreciated, but please restrict that change to just CSS Fonts 4, which is the focus of current implementation. Errata can be gathered for Fonts 3, but there is no intention to backport all of Fonts 4 to Fonts 3. Instead, Fonts 4 is gradually replacing Fonts 3.
I see. Is there an expected timeline for Fonts 4? If it's a long way off, then it might be valuable to push out a 3.1 for security and privacy purposes (i.e. not all of Fonts 4, but the things where the current spec is being leveraged to harm users).
All browsers are implementing both the Variable fonts and the Color Fonts parts of Fonts 4, plus smaller changes (like font-weight being a number in the range 1 to 999 rather than being a set of number-like tokens 100, 200 etc).
So this is being used now.
Okie dokie, sounds good. Would a strict, enumerated set of font faces that can act as system fonts be preferred, or would a broader phrase like "fonts provided by the platform by default, and not installed by the platform's user" suffice?
I don't think we want to list the specific fonts in the spec. The general rule should be that the list of fonts shouldn't provide any more information than can be obtained by other means: e.g., by the combination of browser & OS & preferred language. (I don't know about Mac, but Windows has language specific fonts that are included in the OS but not installed by default unless the user actually uses that language in the OS.)
I would also hope that there would be some exclusion/option for supporting a wider set of fonts for trusted sites.
For the PR: Fonts Level 4 already has a section on Preinstalled Fonts vs User-Installed Fonts, which currently says:
User Agents may choose to ignore User-Installed Fonts for the purpose of the Font Matching Algorithm.
So, the request here is to upgrade that "may" into a "should".
This should probably affect `local()` references in `@font-face`, as well as `font-family` matching. Otherwise, the fingerprinting techniques could be changed to compare against a font face defined as `src: local(test-name), url(reference.woff);`, where the reference file has a characteristic size that will differ from the true font of that name. (Unfortunately, this means that periodically downloading & installing the most popular Google Fonts will no longer save me on data!)
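To make that bypass concrete, here is a minimal sketch of the kind of probe being described (the font name, family alias, and reference file are hypothetical): if the user has "Some Uncommon Font" installed, `local()` wins and the text renders with it; otherwise the reference webfont, crafted with distinctive glyph widths, is used, and a script can tell the two cases apart by measuring a test element.

```css
/* Hypothetical probe: local() is used if the font is installed, otherwise the
   reference webfont (with deliberately unusual glyph widths) is downloaded.
   Measuring the width of an element styled with "probe-font" then reveals
   whether "Some Uncommon Font" exists on the user's system. */
@font-face {
  font-family: "probe-font";
  src: local("Some Uncommon Font"), url("reference.woff");
}
.probe {
  font-family: "probe-font";
}
```

This is why ignoring user-installed fonts only for `font-family` matching, while still honoring them in `local()`, would leave the enumeration vector open.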
Next step: work on an API for full access to all installed fonts as a list! (With an explicit permission prompt, of course, which would also allow those fonts to be used for rendering.) This is essential for document-editing web apps to replace their native versions. Some apps still use Flash just to get this data.
Hi @AmeliaBR, thanks for the comments. A couple of responses:
So, the request here is to upgrade that "may" into a "should".
I think must (i.e. "User Agents ~may choose to~ must ignore…") would be the right word. Correctly implementing the standard should make the kinds of privacy violations the current version enables impossible. Similarly, standards should strictly protect user privacy, at least until there is some signal (a permission, etc.) saying the user granted the site greater privileges.
Re: `local()`: those are all great points! I wouldn't have thought of that, but it all seems terrific. Thanks for catching my goof :)
Re: permissions: I don't have a strong sense about this (other than that permissions discussions often round down to "users don't like permissions, so just grant access by default". As long as things don't wind up there!). But for the use case you mentioned, maybe a better norm to push for would be a service worker + site-hosted fonts?
I don't think a must is viable here without a better solution for addressing language support.
Many languages aren't supported in the default fonts installed on a given operating system. In many cases users can then install fonts that support more languages by choosing to install support for those languages. Presumably the requirement being proposed here would allow web use of all of the default fonts for all languages -- which in turn still exposes a good bit of fingerprinting data (which languages the user has installed fonts for) -- but I think there are still significant languages that those defaults don't cover (with significant variation between operating systems). (It also wouldn't surprise me if the fonts installed on Android devices vary based on carrier/market and aren't consistent within a language, though I'd be happy to be wrong.)
So there's a tradeoff here between one of many active fingerprinting vectors and support for significant numbers of the world's languages. Without clear data that fixing just a part of this active fingerprinting vector (still allowing fingerprinting of which languages are supported by fonts on the system) would make a real dent in ability to do active fingerprinting on the web (which is much easier than passive fingerprinting) -- data that would probably require a project to gather a list of fingerprinting vectors available on the web (with entropy for each item) -- I don't think there's a very clear case for degrading the support for many minority languages on the Web.
1) Fingerprinting doesn't get solved until you start solving it :) Saying "this isn't the worst vector, so let's not fix it" seems like a surefire way to make sure fingerprinting never gets better.
2) Font-based fingerprinting actually is one of the worst fingerprinting methods, though! See the Panopticlick paper / project linked above, the "Beauty and the Beast: Diverting modern web browsers to build unique browser fingerprints" paper, and many others (happy to provide links if you like). They all find the same thing: fonts are hugely identifying if you have anything but the default configuration (put differently: if you allow non-system fonts to be used, it will be hugely identifying in the cases where it's useful, and not useful in the cases where it's not identifying).
3) You might consider the statement from PING regarding meta-standards for standards (e.g. ways to fix privacy in web standards). From the third section of the recent PING blog post, Privacy Anti-Patterns In Standards, there being bigger problems elsewhere doesn't obviate the need for standards to address the privacy harm they introduce. (note: I wrote it, but it states the position of the IG)
4) I think @AmeliaBR has exactly the right idea: fonts should give away no more information than "browser & OS & preferred language". So, no argument against making the system fonts "the non-user-installed fonts for the current system language", i.e. not "all fonts for all languages", but something narrower than that. Would that address the concern?
1) Many people see passive fingerprinting as not solvable given the web's API surface. Refuting that requires gathering the data from the various sources to see what the state of things is (see below), not just hoping. Agreeing to solve it needs to be a wide consensus, not a bunch of ad hoc and inconsistent decisions made in different working groups to different standards.
2) Many of the papers used Flash-based font data, which is much more identifying since it's an ordered list of fonts, not just a set. That's why I'm suggesting that the convincing thing is to maintain a common repository of the state of fingerprinting rather than point to a bunch of papers, all/most of which are seriously out of date in various ways, and all of which are incomplete.
3) (Sorry, need a (3) here for consistent numbering.)
4) Many users use an OS and browser whose UI doesn't match their preferred language.
Oh, I guess I should respond to 3, actually: The Privacy IG isn't the right forum for making tradeoffs between Privacy and other issues; it's going to have an obvious bias. You'd probably get a very different result on a privacy vs. internationalization tradeoff in the Internationalization WG.
Safari, too, has different fonts for different internationalizations. My strategy when implementing this fingerprinting mitigation in Safari wasn't to treat every user the same; that would have made many of our users' lives worse. Instead, my goal was to limit the number of equivalence classes a user could fall into. Before the mitigation, a user could be in a class of one, thereby being uniquely identified. After the mitigation, there are still multiple equivalence classes, but there are only a handful. Each equivalence class has many, many users, thereby significantly reducing the number of bits of entropy.
Next step: work on an API for full access to all installed fonts as a list!
I would formally object to such an API. It explicitly undoes all the font-based privacy mitigations we've done. Users don't want to see more dialog boxes, and trying to explain the privacy implications of using fonts to a user is difficult. If a website wants to use fancy fonts, it can serve them as web fonts.
Based on @litherum's comments, my two cents here is that we should instead do the following:
User Agents must limit the exposure of system fonts to protect user privacy. The exact mechanism through which this is done is left at the discretion of User Agents.
To achieve this, User Agents should collect telemetry about the fonts present on their users' systems. One way to prevent installed fonts from leaking information about the user would be to cross-reference this telemetry data with users' installed languages and operating system version, and not expose to the web the fonts which are not commonly present in any of the [ OS-Version x Installed Language ] buckets that the user is part of.
Agreeing to solve [passive fingerprinting] needs to be a wide consensus, not a bunch of ad hoc and inconsistent decisions made in different working groups to different standards.
David, I think I have the opposite expectations about what approach we should take. I believe that the only practical way to address passive fingerprinting is standard-by-standard and implementation-by-implementation doing the in-the-weeds work to ensure that passive fingerprinting surface isn't exposed.
But I definitely agree with you about the need for broad consensus. Good news, though: that one's already taken care of! People almost-universally agree that they don't want to be silently tracked across the web. It's not just consensus, it's basically unanimous. Now it's our job to implement that for everyone.
@dbaron wrote:
You'd probably get a very different result on a privacy vs. internationalization tradeoff in the Internationalization WG.
Similarly, people coming from a performance-optimization perspective (which has substantial, real-world implications, especially for those with slow network connections or with pricey, metered bandwidth) would take a different perspective if told "there is no need to download this font since you have it already, but we are going to forbid the browser to say so, and thus force you to download it every time, to enhance your privacy".
There is not consensus that active fingerprinting is solvable to the point that there won't still be large numbers of unique users; I've seen a number of Chrome implementers and tech leads take the position that it is not, in discussions on fingerprinting, including in this working group and elsewhere. I'm not convinced either way as to whether it's solvable, because I haven't seen anybody put together the data (an up-to-date list of fingerprinting vectors, with data on them and proposed mitigations) that would let me make that judgment.
(edit: fixed typo where I wrote passive when I meant active)
@dbaron I'm not sure what the suggestion is here. Freeze progress on CSS Fonts 4 until an "up-to-date list of fingerprinting vectors, with data on them and proposed mitigations" is built? It definitely does not seem user-serving to say "we know there is a problem, we know it's significant, but haven't had others propose mitigations for it, so we're going to ship the problem anyway".
Seems way better to fix a problem that we know exists now and is harming users today. This isn't hypothetical; the current CSS Fonts 3 spec enables users to be tracked without their consent.
As stated before, there are many, many research papers showing this is a problem, as well as many deployed examples in the wild. It is not the case that these papers find no problem in the absence of Flash; the findings are either "not having Flash degrades identifiability some, but it's still identifying" or "we measured without Flash, and found it's highly identifying." It's also apparently serious enough that Firefox and Safari have deployed mitigations.
@litherum can you say more about Safari's algorithm? How different is it from anonymity sets of "browser & OS & preferred language"? I'm not married to the specific mitigation in the issue text, as long as the standard includes a fix for the problem. Maybe Safari's approach is the way to go!
In terms of the efficacy of font fingerprinting / entropy it exposes, this paper from INRIA is fairly interesting/helpful. They conducted a real world study of fingerprinting, including Javascript based font probing, and found that fonts were one of the top contributors to fingerprint-ability.
One additional thought: a concern has been raised about the performance impact of downloading fonts; there's also an interesting performance impact of JS-based font fingerprinting -- it takes time/resources for a fingerprinting script to iterate through fonts to determine what a user has installed (for example, fingerprintjs2 has searching for an extended list of fonts as a defaulted-off option because of the performance impact of doing so).
@jasonanovak I don't think you can fairly compare performance impacts from malicious pages with performance impacts on normal usage. Blocking one fingerprinting script might just provoke the spyware to use another fingerprinting method with even worse performance impacts.
If the primary concern was the performance impact of the current methods for figuring out which fonts a user has, the solution would be to create a proper API for doing so.
Agreed — I'm not a fan of a calculus which concludes that fixing a common fingerprinting method has a performance cost on the basis that sites might decide to use a less-performant fingerprinting method instead.
The performance cost of the fix is that people would end up downloading web fonts that they don't actually need (because they already have the font installed on their system).
E.g., I have most common Google Fonts installed, and one of the reasons I did that was to cut down on web font downloads. If we prevent browsers from using those custom installed fonts, there will be a performance cost to me (more data usage and slower page loading) when visiting sites that use these fonts.
How many people this will affect, and to what degree, I can't say. Some browsers give users the option to turn off web font downloads altogether, which would negate the performance impact but increase the impact on user experience. E.g., turning off web fonts might not be a good solution for people whose pre-installed system fonts don't offer a lot of choice for the languages/scripts they use.
The performance impact of malicious scripts is a separate issue altogether. I was using the example of switching fingerprint methods to emphasize that we can't expect that fixing the fingerprinting vector will have a net performance benefit on malicious sites. Malicious sites generally don't care about user data plans.
If there is a plan to introduce a local-font permission with font-table-access (https://github.com/inexorabletash/font-table-access/#privacy-and-security-considerations), then there is no need to allow non-standard system fonts by default.
The CSS Working Group just discussed mitigations for font based fingerprinting.
The performance cost of the fix is that people would end up downloading web fonts that they don't actually need (because they already have the font installed on their system).
E.g., I have most common Google Fonts installed, and one of the reasons I did that was to cut down on web font downloads. If we prevent browsers from using those custom installed fonts, there will be a performance cost to me (more data usage and slower page loading) when visiting sites that use these fonts.
How many people this will affect, and to what degree, I can't say. Some browsers give users the option to turn off web font downloads altogether, which would negate the performance impact but increase the impact on user experience. E.g., turning off web fonts might not be a good solution for people whose pre-installed system fonts don't offer a lot of choice for the languages/scripts they use.
I think it would be useful to know how many people have separately installed many web fonts onto their systems and would get this bandwidth-reduction benefit. It looks like SkyFonts provides a service for that (including citing bandwidth benefits), but it's not really emphasized on the Google Fonts site itself, for example.
But couldn't browsers provide that performance benefit by caching web fonts? It doesn't have to be system-installed: a site can refer to a web font, and if the browser has it cached, then the user doesn't incur the bandwidth cost; Google Fonts are typically cached for one year. There are potential privacy implications regarding timing attacks on cached resources as well, but they're not nearly as easy or expansive as accessing the list of fonts, which (sorry to repeat the point) is one of the highest-entropy fingerprinting sources available (in the top 3 to 4, depending on some details like platform or the particular dataset).
But couldn't browsers provide that performance benefit by caching web fonts?
Some benefit, for repeat visits to the same website. For visits to different sites, browsers are switching to a model where the cache of 3rd party resources gets partitioned by the site making the request (to avoid security issues where sites could guess at your browsing history by timing how long it takes to download a resource from that domain). Even without that security enhancement, cross-site caching fails if the site has done anything unique re subsetting the font.
Glad to hear this was discussed at TPAC. However, I couldn't tell from the IRC notes above what the group decided on for next steps. PING's objection is still the same: the privacy harm enabled by the spec has demonstrated "in the wild" harm, and so needs some solution in the spec.
What I took away from the IRC conversation is that the group needs further data to decide the correct mitigation. Is this correct? If so, do y'all have a plan for getting that data? Happy to support that effort if possible.
I'm confused by the "Needs Design / Proposal" label though. https://github.com/w3c/csswg-drafts/issues/4055#issuecomment-505279789 is a concrete proposal, no?
This will break websites with user-generated content in minority scripts. Maybe browsers should be encouraged to ask users, upon first going to a site that requests a certain installed font, whether to permanently allow that site access to that font, to minimize the disruption.
@dscorbett can you explain more? These would be sites that expect the visitor to have a non-OS-provided font, don't have a useful / usable fallback, and don't include / web-font the font they want to use? Can you send some example links?
These are websites that break in all WebKit browsers currently, then? And break under the suggestion in https://github.com/w3c/csswg-drafts/issues/4055#issuecomment-505279789?
https://www.facebook.com/RohingyaLanguageAcademy/ includes some user-generated content in the Hanifi Rohingya script. Facebook doesn’t distribute a Hanifi Rohingya font, but I can see the text because I have Noto Sans Hanifi Rohingya installed. If the browser skipped that font because it is not a default system font, no one would be able to see the text.
That text is visible in Safari. Have I misunderstood this proposal?
*not all WebKit browsers. Only Safari blocks these user-installed fonts. Regular web views in 3rd party apps need to continue honoring these fonts because it’s common for apps themselves to “install” fonts for the current process and use web content as UI, which should get the font.
So I just had a discussion with @jschuh to figure out some further details beyond what I discussed during TPAC. Here's our (Chrome's) current thoughts on the matter:
For the explicit enumeration of local fonts API, we want to only expose system fonts by default.
To support the middle ground of non-trusted sites that still want font data from local fonts, we want an API (`<input type=font>`, or a DOM API call that pops up a font chooser) that lets the user explicitly choose a single font to expose. (This is akin to `<input type=file>`; enumerating the user's filesystem is clearly terrible, but letting the user affirmatively provide a single file is clearly totally OK.) We'll pursue this separately as another spec proposal.
For more general font usage, such as in 'font-family', we are not interested in locking down access to only webfonts and local system fonts; there are important usability and a11y concerns, as expressed in this thread and in the TPAC discussions, for allowing pages to use local fonts beyond the system ones.
However, to deal with bad actors using this access as a back-door for font enumeration/fingerprinting, we're actively working on a Privacy Budget system, in which "how many fonts is this page accessing" will be counted (among many other things). A page cycling thru a large number of fonts will burn thru their budget quickly, at which point further fingerprintable APIs will stop working (or become very noisy/generic) until the user gives explicit permission to continue.
Privacy Budget satisfies my concerns, expressed during TPAC, that trying to solve privacy issues with one-off restrictions won't work, and even with coordinated efforts across the web platform you'd need harmfully-draconian restrictions to have even a chance of protecting privacy. We believe the Privacy Budget framework is the correct way to address fingerprinting concerns going forward.
@litherum
If a website wants to use fancy fonts, it can serve them as web fonts.
For CJK-like fonts, there is currently no better way to make them a universal web font. Because these fonts are very large, a font in TTF format usually exceeds 10 MB. The current solution is font subsetting, but it has a lot of limitations.
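For context, a rough sketch of the subsetting workaround being referred to (the family name, file names, and ranges are illustrative): the face is split into many slices, and the browser downloads only the slices whose `unicode-range` covers characters actually used on the page.

```css
/* Illustrative subsetting: each @font-face rule covers one slice of the font,
   and the browser fetches a slice only if the page uses characters in its
   unicode-range. A full CJK face is typically split into dozens of slices. */
@font-face {
  font-family: "Example Han Serif";                        /* hypothetical family */
  src: url("example-han-serif.000.woff2") format("woff2");
  unicode-range: U+4E00-4FFF;                              /* part of the CJK ideographs block */
}
@font-face {
  font-family: "Example Han Serif";
  src: url("example-han-serif.001.woff2") format("woff2");
  unicode-range: U+5000-51FF;
}
```

Even so, pages with unpredictable or user-generated text can end up pulling many slices, which is part of why subsetting is described here as having a lot of limitations.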
re @tabatkins and privacy budget
I haven't seen a standard for it, any specifics on thresholds, or empirical observations of it being a useful privacy protection strategy. Further, since a unique font generally puts someone in an extremely small equivalence class by itself (without needing to be combined with other inputs), it's unclear how a privacy budget approach would be useful here.
Put differently, users are being harmed today by this flaw in the font standard. It seems inappropriate to hinge the solution to that problem on something that isn't anywhere close to standardization (i.e. privacy budget).
Re @yisibl: could the privacy harm be addressed by solving the problems related to font subsetting?
Re @litherum @dscorbett Would be interested to know how Safari handles these cases, as it seems the best (only?) proposal on the table currently is to do what Safari does.
In general, again, neither I nor (I think) anyone on PING is wedded to any particular mitigation, only to the point that there is a deep privacy-harming flaw in the current spec that needs fixing. Would be very happy to work with the WG to come up with other options if the Safari option doesn't work. But some solution needs to be found (keeping in mind that privacy budget does not seem to be a solution to this problem).
For CJK-like fonts, there is currently no better way to make them a universal web font. Because these fonts are very large,
This seems not particularly relevant to whether it's practical for browsers to block the visibility of user-installed fonts to the Web, because CJK fonts are included in the default install of all popular operating systems these days.
@hsivonen Not all OSes have good-quality CJK fonts installed; currently only HeiTi faces are broadly available in good quality across all major OSes. So if you need support for other high-quality CJK typefaces (like Song, Fangsong, Kai, etc.), web page authors may rely on user-installed fonts, for example the fonts that come with an MS Office installation.
There are also some business web apps (e.g. tax software) in China that require special fonts to be installed and used in their pages.
@hax would it suffice to have a browser setting (defaulting to off) to enable this (distinct from a per-page permission, for the reasons mentioned in https://github.com/w3c/csswg-drafts/issues/4055#issuecomment-505281057)? This would be similar to the do-not-track setting defined in that standard, but defaulting to off instead of on.
@hax also, can you clarify what happens on these sites when you visit them in Safari, on a local install of OSX? Do they work correctly in Safari b/c OSX installs a category of fonts that (for example) Windows doesn't? Or that these sites don't support Safari / users w/ default fonts?
Would another option be to just have Microsoft systems include the common Office fonts in the set of system fonts they expose (since the number of Office users is likely large enough to preserve useful equivalence classes)?
I am not against reducing the impact of fonts on fingerprints, but Safari's approach is arbitrary. Even if a user installs a high-quality CJK font, a page cannot specify it via CSS. As a result, we can only use web fonts, but CJK web fonts face many problems.
The Chrome team has offered a lot of good suggestions, and we should go in that direction instead of killing the Web's creativity.
In Safari 12.1.1, whether I use `local()` to specify a font or directly set `font-family` to `SourceHanSerifCN-Light`, I can't use my own installed CJK font.
I tried two ways to enable the locally installed font Source Han Serif CN (思源宋体): setting `font-family: "Source Han Serif CN";` directly, and referencing it via `local()` in an `@font-face` rule. Either way, this font cannot be enabled.
@font-face {
  font-family: "$";
  /* reference the locally installed face by its PostScript name */
  src: local("SourceHanSerifCN-Light");
}
.test1 {
  /* name the installed family directly */
  font-family: "Source Han Serif CN", "PingFangSC-Regular";
}
.test2 {
  /* go through the @font-face local() alias */
  font-family: "$", "PingFangSC-Regular";
}
Demo: https://codepen.io/yisi/pen/OJLGoxj
@snyderp
Because CJK fonts have been a big issue since the first day of the internet, some front-end developers may specify complex font settings to utilize the best-quality fonts that may be available on the user's computer. For example, they may use `font-family: Source Han Sans, Source Han Sans SC, Source Han Sans CN, Noto Sans CJK, Noto Sans CJK SC, Hiragino Sans GB, Lantinghei SC, Microsoft Yahei, HYQihei, PingFang SC, STXihei, WenQuanYi Micro Hei`. This list includes many HeiTi fonts from different sources --- open-source fonts, additional fonts from widely used software (like Office), popular commercial fonts, etc. A simple assumption behind such a strategy is: if a user buys/installs a font, it's very likely they want to use this font as the default font.
If you ask what will happen if users upgrade to Mojave... luckily, CJK HeiTi fonts in OSX/iOS have been good since 2015, so falling back to PingFang SC seems not too bad.
But there may still be many cases that will face problems. For example, I have seen some mobile devices / apps that provide many CJK fonts for downloading/installing as an important feature. It would be unacceptable if users could not use these fonts in their browsers/web apps.
Or that these sites don't support Safari / users w/ default fonts?
In the past, many sites/web apps in the China market were only tested on Windows. As developers, we try our best to make web pages/apps compatible with all platforms, but there are things which are out of our control. For example, tax software needs special fonts installed for legal compliance. We really need a solution for such cases. And the solution should be good enough, or we will eventually end up with instructions like "don't use Safari, or new Chrome, pls use XXX, YYY, ZZZ browsers (which are based on old versions of Chromium)" 😂
Currently I cannot tell whether a browser setting (or any other method) is OK or not for each use case. I just want to provide some background which I believe needs to be taken into consideration.
@hax Thank you!
Not all OSes have good-quality CJK fonts installed; currently only HeiTi faces are broadly available in good quality across all major OSes. So if you need support for other high-quality CJK typefaces (like Song, Fangsong, Kai, etc.), web page authors may rely on user-installed fonts, for example the fonts that come with an MS Office installation.
From the privacy perspective, it's problematic that for some systems there isn't a single font bundle. E.g. an en-US install of Windows 10 does have fonts for Chinese, though not the ones you mention, and you don't need to install Office to get more: AFAICT, adding the Simplified Chinese IME to the available text input methods adds the fonts DengXian, FangSong, KaiTi, and SimHei.
Adding the Japanese and Traditional Chinese IMEs similarly expands the set of fonts even though the en-US base install already has coverage. (And indeed, for Japanese, the base set is gothic-only with no mincho!)
Sites that involve text input can pretty easily figure out what IME a user is using, so in that sense having the font list correlate with IME doesn't give away more information, but when there's no text input on a site or when the user has added IMEs to the menu but isn't currently using them, being able to detect the full set of IMEs the user keeps available is bad. I don't know how to solve this unless Microsoft changes its disk space vs. privacy considerations when deciding how this stuff works, but as long as browsers expose whatever user-installed fonts to the Web, Microsoft has no incentive to change the privacy properties of the system font configurations.
Some privacy can be traded away for typographic quality by not blocking any font that is bundled with Windows even if the font isn't guaranteed to be present in all configurations of Windows.
(I'd expect a Korean font subsetted to the KS X 1001 set of modern-use syllables to be of reasonable size as WOFF2, so I expect the constraints for site-provided fonts for Korean to be different from Chinese and Japanese.)
A simple assumption behind such a strategy is: if a user buys/installs a font, it's very likely they want to use this font as the default font.
Surely that bit of user intent can be seen from the user actually taking action to change the browser font prefs in addition to just installing the font.
Surely that bit of user intent can be seen from the user actually taking action to change the browser font prefs in addition to just installing the font.
If I understand correctly, if browsers allow users to set installed fonts as default fonts, that could still be utilized for fingerprinting 😂
Essentially, users may want to install and use fonts on the web platform for various reasons, like a11y, business requirements, legal compliance, political position, aesthetics, or just expressing personality. How to make the tradeoff between privacy and users' right of choice is a hard problem.
If I understand correctly, if browsers allow users to set installed fonts as default fonts, that could still be utilized for fingerprinting
Of course.
Essentially, users may want to install and use fonts on the web platform for various reasons, like a11y, business requirements, legal compliance, political position, aesthetics, or just expressing personality. How to make the tradeoff between privacy and users' right of choice is a hard problem.
If users are given choice, we can't protect users who use the ability to make choices from being fingerprinted on those choices.
However, at present there's the problem that people who make no browser configuration changes still get fingerprinted on their non-Web uses of their computer. I think it's worthwhile to protect users who don't change browser font prefs from being fingerprinted on what fonts they've installed for other things that they do on their computer.
If I understand correctly, if browsers allow users to set installed fonts as default fonts, that could still be utilized for fingerprinting 😂
This is correct, and why this issue exists. It would be great to have more suggestions for how to solve the problem, instead of privacy ¯\_(ツ)_/¯
(I don't mean you specifically, in any way, but the above thread is heavy on attacking a single suggestion, and light on the WG suggesting solutions to a problem in the WG's spec.)
The "browser font prefs" suggestion is not appealing to me (I mean it mostly as a straw man), but the feedback i'm hearing from the WG is that for some subset of users, in some locals, fingerprinting is unavoidable. I'm not ready to throw the towel in yet, but its just there to say "here is at least one option for having the standard be privacy preserving by default, instead of privacy harming by default".
@snyderp We all agree privacy is very important. I don't think anyone wants to "be heavy on attacking a single suggestion" in this thread. But as @yisibl and I point out, the current Safari solution has a bigger impact on CJK users than on others, because the alternative workaround (using web fonts) has much bigger costs/difficulties for CJK fonts (and considering other factors like cache partitioning, that cost will be even bigger).
@hsivonen
If users are given choice, we can't protect users who use the ability to make choices from being fingerprinted on those choices.
I believe that's the problem the working groups (CSS WG + privacy WG + other related WGs like i18n, etc.) need to work together to figure out.
I haven't seen a standard for it, any specifics on thresholds, or empirical observations of it being a useful privacy protection strategy.
I assume you read the explainer I linked? Note that this is also something we're actively working on and developing; it's far from complete so far.
Further, since a unique font generally puts someone in an extremely small equivalence class by itself (without needing to be combined with other inputs), it's unclear how a privacy budget approach would be useful here.
A single unique font probably does, yeah. How do you expect a website to find that single unique font that the user has? If it's highly identifying, that means only a small number of people have it. So either the website is only targeting those handful of people and is thus testing only for that font (interesting case...), or they're testing lots of "unique" fonts to see which small bucket the user falls in. The latter is exactly what the Privacy Budget approach is intended to detect: spamming hundreds or thousands of local font requests, looking for the one that highly identifies the user.
Put differently, users are being harmed today by this flaw in the font standard. It seems inappropriate to hinge the solution to that problem on something that isn't anywhere close to standardization (i.e. privacy budget).
And as others have argued in this thread, users will be harmed by the suggestion to restrict local font access to solely system fonts. (And aren't currently harmed by Safari's actions due to the differences in user demographics between browsers.) We need to think about the balance of benefits, harms, and costs of mitigating those harms.
As I argued in TPAC, and Chris Wilson and others at Google argued in their response to PING's charter discussion, the web is chock full of data that can be used for fingerprinting. Any attempt to reduce that, particularly any attempt with significant user-harmful side effects, needs to show that it'll actually reduce the fingerprinting surface to a usefully low level; going from 400 bits to 40 bits of identifying information achieves precisely nothing, since you only need 33 bits to uniquely identify every person on Earth. (And you really want to allow less than 20 bits, to ensure that people are "bucketed" together with at least several thousand others.)
If the PING can show that the sum of their suggested mitigations will reduce fingerprinting surface to 20 bits or less, or at least that there's a believable path to getting under that limit, and that performing all of those mitigations will not harm the web to such an extent that the attack surface just moves elsewhere (such as sites moving to native apps...), then great! That would be an ideal solution, because reducing information wholesale is typically far easier than trying to be clever!
So far, the PING hasn't attempted to show that it's possible to do that. And so far, Chrome's security engineers don't believe it's possible to reasonably do an absolute fingerprinting reduction, either. Thus Privacy Budget, our attempt to dynamically enforce a pay-as-you-go budget that, hopefully, will let us prevent attacks (like scanning the user's local fonts) without harming legitimate uses (like using a handful of local fonts to actually render text).
I think you should do more than dismiss Privacy Budget out-of-hand; it's a serious effort to actually solve fingerprinting across the entire web platform, not an attempt to deflect attention. The math is clear here: this isn't a problem that can be solved with band-aids, and even knowing if your efforts will achieve anything at all requires a serious analysis of the whole attack surface; standard defense-in-depth security intuitions don't apply, at least not with the current state of things.
So, as Chris Wilson said, without a formal model showing that this change is part of a combined effort that will achieve a useful result, Chrome will continue to be against it, and will instead pursue methods like I described to achieve useful fingerprinting reduction. Harming users and webdevs for what is currently just a fig-leaf is not something we're interested in.
If users are given choice, we can't protect users who use the ability to make choices from being fingerprinted on those choices.
I believe that's the problem the working groups (CSS WG + privacy WG + other related WGs like i18n, etc.) need to work together to figure out.
For clarity: I'm not suggesting taking away choice from users. That exercising choice makes a user fingerprintable is just a fact and there's nothing for WGs to work out about it. (I think most of this issue is probably not a standard-setting one but a browser product decision one.)
What I see as a problem is that what I believe to be substantial populations of users who don't need to exercise such choice, and who could be protected, are not. It's not particularly nice to know that there are users who cannot be protected, but that's not a good reason not to protect the substantial user population who could be protected.
Consider the following types of users:
(The taxonomy is simplified: the most notable complication is the one seen upthread on Windows 10 with Chinese and Japanese, namely that there are fonts that are bundled with the system and that are conditionally enabled. For example, for someone in Japan, having the conditionally-enabled Japanese fonts enumerable probably isn't a substantial fingerprinting vector. For someone in Europe who has a Japanese IME in the text input menu, they are. For the purpose of the paragraphs below, I'm hand-waving conditionally-enabled system fonts into group 1.)
Users in group 1 need no protection mechanisms compared to status quo. Evidently users in group 4 can change browser prefs and could uncheck whatever "don't expose user-installed fonts to the Web" checkbox to opt out of protection.
Browsers cannot protect users in group 6 without developing all-encompassing font download mechanisms as part of the browser.
Groups 2 and 3 could be protected but aren't. As a user in group 3 (previously in group 4), I'm unhappy that I'm not made indistinguishable from group 1. It should be within technical feasibility to do so without breaking use cases for groups 5, 6, and 7, but the details do need careful thought. In particular, it would be good to know what language communities are in group 5 and with what details (e.g. in the context of particular operating systems only or out of habit despite operating system font repertoire having improved). Group 4, as noted, will manage.
Font-based fingerprinting is a common, privacy-violating pattern in which websites build semi-identifiers based on the uncommon fonts a user has installed. This semi-identifier is then combined with other semi-unique identifiers (hardware configuration, user configuration, viewport size, etc.) to build highly identifying values used for tracking users.
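As a rough illustration of how such scripts probe for installed fonts (the class name and font name below are hypothetical): the page styles an off-screen element with a candidate font plus a generic fallback, and a script compares its measured width against a fallback-only baseline; a difference means the candidate font is installed, and the check is repeated over a long list of candidates.

```css
/* Hypothetical probe element used by a fingerprinting script. If "Some Uncommon
   Font" is installed the text renders with it; otherwise the monospace fallback
   is used. Comparing the element's measured width against a monospace-only
   baseline reveals whether the font is present. */
.font-probe {
  position: absolute;
  left: -9999px;        /* keep the probe out of view */
  font-size: 72px;      /* a large size amplifies width differences */
  font-family: "Some Uncommon Font", monospace;
}
```

The mitigations discussed in this issue aim to make such probes uninformative for anything beyond the default platform fonts.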
Examples
Panopticlick includes a well-known demonstration of how this can be done: https://panopticlick.eff.org
Fingerprint2.js is a popular library that uses font-based fingerprinting (among other signals) to identify users
Some browsers provide some defenses against this privacy violation. Safari, for example, only reports the default system fonts, and will not use other, uncommon fonts, even if they're installed on the OS. Firefox provides a similar option.
The standard should be modified to protect against / not allow font-based fingerprinting by default, instead of relying on non-standardized, vendor specific mitigations.
Suggested Mitigation
I suggest having the standard follow Safari's approach and requiring browsers to only treat the default fonts on the platform as system fonts. A simple (though maybe not the best / most elegant) way of doing this would be to modify section 5.2 in "CSS Fonts Module Level 3" so that the system font fallback procedure only returns the default platform fonts. Those might be specified per platform, or just as this list: http://www.ampsoft.net/webdesign-l/WindowsMacFonts.html