Closed by ben-allen 7 months ago
@dminor @hsivonen Would love to hear if this seems like a reasonable strategy to you!
I've made substantial revisions to this proposal, which are reflected in the new explainer. I'd love to hear your feedback! @dminor @hsivonen
Hi Ben, we're discussing this internally and we hope to get you feedback soon.
Here's the slide set from a talk at TG2 on the proposal as it stands.
Sorry about the delay. Here's my review:
Proposed position: negative.
The use cases have legitimacy, but it's not clear that their importance overrides other concerns. The primary concern is fingerprintability. The secondary concern is reconciling, in implementation, the relative roles of the browser and the operating system, given that the browser language may not be coupled with the OS language and that operating systems do not consistently provide UI surface for these settings. Third, if we were to expose this information, we should consider whether an HTML+CSS-based declarative solution makes more sense, particularly for numbering systems, hour cycle, dates rendered according to a calendar, and amounts with units.
Additional Notes:
It's a welcome improvement that the README now documents the estimated population for default combinations of fw, hc, and mu. We see that the population for some of the combinations is very small. However, we still don't see anonymity set estimates for the cases of concern: where the fw/hc/mu combination overrides the locale default.
The relative importance and implementability of the various aspects of the proposal would be easier to assess if the proposal documented how operating systems currently deal with these settings. The README says: "In the native environment these problems are easily solved, since users can specify their preferences in their system settings.", but this does not appear to apply across all mainstream operating systems. As far as I can tell, the h12 vs. h23 hour cycle setting is the only one of these that's consistently available across mainstream operating systems (Windows 10/11, macOS, GNOME (Ubuntu), Android, iOS/iPadOS; I don't have a sufficiently recent Chrome OS at hand). Furthermore, it's non-trivial to check what's supported where: whether UI for choosing a numbering system is shown depends on the system language on Apple platforms, and calendar choices seem to depend on the system language on Windows 10. On Apple platforms, the system-wide calendar setting doesn't support all CLDR calendars. The system-wide, language-independent setting on Apple platforms supports only three calendars, which differ only by year number and era designation (i.e. they share Gregorian/ISO month and day): Gregorian; Japanese, which is not the CLDR-primary calendar for Japan; and Buddhist, which is the CLDR-primary calendar for Thailand. Do some language/region choices unlock more calendars system-wide (as opposed to, reportedly, within the Calendar app)?
Both Microsoft and Apple have redesigned their system preference UIs for this area in recent years (post-Windows 7). Is it known what decisions Microsoft and Apple took based on experience from previous designs? Have they shared characterizations of how, and how much, users change these preferences (if there is telemetry)?
The idea of bundling fw, hc, and mu makes a lot of sense from the fingerprinting perspective. However, combinations other than applying European settings to en-US don’t seem to work nicely in a fingerprinting-resistant but “Just Do What I Mean” way. Some settings are more confusing than others if set to unexpected values. In particular, fw set to an unexpected value can cause bad mistakes with e.g. travel booking. European users have encountered the issue of English-language sites showing Sunday-starting weeks, but U.S. users may not be on guard for the opposite failure mode. If a U.S. user simply wants to opt into a 24-hour clock and this choice comes bundled with making fw Monday, this might hurt more than the 24-hour preference helps. (Ideally, the travel booking problem would be avoided by sites using browser-supplied date pickers. Unfortunately, sites really like to make their own.)
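To make the travel-booking failure mode concrete, here is a minimal Python sketch. The helper name `column_in_week_grid` is hypothetical and not part of the proposal. It shows that the same date lands in a different column of a site-drawn calendar grid depending on which day starts the week, which is how a user who reads the grid by position can pick the wrong day.

```python
from datetime import date

# Python's date.weekday() numbers days Monday=0 ... Sunday=6.
MONDAY, SUNDAY = 0, 6

def column_in_week_grid(day: date, first_weekday: int) -> int:
    """Return the 0-based column of `day` in a week row starting on `first_weekday`."""
    return (day.weekday() - first_weekday) % 7

travel_day = date(2024, 1, 7)  # a Sunday
print(column_in_week_grid(travel_day, MONDAY))  # 6: last column of a Monday-first grid
print(column_in_week_grid(travel_day, SUNDAY))  # 0: first column of a Sunday-first grid
```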
If fw is dropped as too complicated, it’s relevant to ask whether it’s really necessary to broadcast the temperature unit. People can have a per-site cookie-persisted setting for the weather site they routinely use, and removing the annoyance of having to use a site-supplied unit switcher while traveling doesn’t seem like enough of a problem to justify broadcasting a fingerprinting bit to the Web. If both fw and mu are dropped, we get the one setting that’s consistently available across operating systems: h12 vs. h23. But while people do have preferences, people who read out-of-locale content tend to be able to read the non-preferred option.
The "Motivation" section claims that in locales with multiple numbering systems in use (in practice Western Arabic, a.k.a. latn, and script-native, in either order of priority) the other numbering system would not be "immediately intelligible". This claim could use data/references to substantiate the severity level of the issue. (Users may have a preference, but to what extent does the issue rise to the level of not immediately intelligible? Previously, the primary example given has been opting into script-native Devanagari digits for Hindi, which by default uses latn digits. Without first-hand experience, the “not immediately intelligible” level of seriousness looks odd in the light of licence plates on cars in India using latn digits.)
In the light of CLDR data about primary calendars by region as well as calendar comprehension presumably correlating with language comprehension, the calendar system aspect could use some data/references to characterize the usability importance of sites (not calendar apps but sites displaying dates) dynamically adapting to a user preference.
That the numbering system should bind to an advertised language seems like the right conclusion.
Likewise for calendar systems (though bound to the region or likely-subtags-implied region of the language tag).
If we were to go forward with the general idea of exposing a non-CLDR-default numbering system preference, with the additional observation that it should go together with a language and the assumption that consumers of existing things like Accept-Language might not deal with extensions, how can the problem of sites applying the numbering system preference to a non-primary language for which it doesn’t make sense be avoided? (That is, avoiding the failure mode suggests saying hi-u-nu-deva somewhere instead of having hi and nu-deva at a distance from each other.)
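As a rough illustration of why keeping the preference inside the tag helps, the following Python sketch splits a language tag into its base and its -u- keywords, so the numbering system can only ever be read together with the language it was attached to. The function `parse_u_extension` is a hypothetical, simplified helper: it assumes two-letter keys followed by value subtags and ignores attributes, other extensions, and private-use subtags. It is not a conforming BCP 47 parser.

```python
from typing import Dict, Tuple

def parse_u_extension(tag: str) -> Tuple[str, Dict[str, str]]:
    """Split a language tag into (base tag, -u- keywords).

    Simplified sketch: tags are lowercased (they are case-insensitive),
    and only two-letter keys with their value subtags are handled.
    """
    subtags = tag.lower().split("-")
    if "u" not in subtags:
        return "-".join(subtags), {}
    u_index = subtags.index("u")
    base = "-".join(subtags[:u_index])
    keywords: Dict[str, str] = {}
    key = None
    for subtag in subtags[u_index + 1:]:
        if len(subtag) == 1:
            break  # start of another singleton extension; not handled here
        if len(subtag) == 2:
            key = subtag  # a new keyword key such as "nu" or "hc"
            keywords[key] = ""
        elif key is not None:
            # append value subtags to the current key
            keywords[key] = subtag if not keywords[key] else keywords[key] + "-" + subtag
    return base, keywords

for tag in ["hi-u-nu-deva", "en"]:
    print(parse_u_extension(tag))
# ('hi', {'nu': 'deva'})
# ('en', {})
```

With this shape, a consumer that walks the preference list never sees nu-deva without hi next to it, so there is no opportunity to apply it to en by accident.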
There seem to be complications with script-sharing languages that plausibly appear together in a preference order but have different numbering system defaults in CLDR. For example, per CLDR, Marathi not only defaults to Devanagari digits but does not even offer latn digits as an alternative. If one specifies Marathi first, Hindi second as a language preference order, should it imply anything about digits for Hindi (considering that even existing preference UIs that allow for numbering system don’t seem to allow specifying them for languages other than the one highest on the user’s priority list)?
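One possible answer to this question, sketched in Python under stated assumptions, is that a numbering-system request applies only to the language it is attached to, and only if that language offers the requested system. The `NUMBERING` table below is hand-written for illustration and encodes only what this comment says about CLDR (Marathi: Devanagari default with no latn alternative; Hindi: latn default with Devanagari as an opt-in). It is not loaded from CLDR, and `resolve_digits` is a hypothetical helper, not something the proposal specifies.

```python
from typing import Dict, List, Optional, Tuple

# Illustrative table; hand-written from the statements in this review,
# not read from CLDR data.
NUMBERING = {
    "mr": {"default": "deva", "allowed": {"deva"}},
    "hi": {"default": "latn", "allowed": {"latn", "deva"}},
}

def resolve_digits(preferences: List[Tuple[str, Optional[str]]]) -> Dict[str, str]:
    """Map each preferred language to a numbering system.

    `preferences` is an ordered list of (language, requested system or None).
    A request is honored only for its own language and only if that language
    allows it; otherwise the language default is used.
    """
    resolved: Dict[str, str] = {}
    for language, requested in preferences:
        entry = NUMBERING[language]
        if requested in entry["allowed"]:
            resolved[language] = requested
        else:
            resolved[language] = entry["default"]
    return resolved

print(resolve_digits([("mr", None), ("hi", None)]))
# {'mr': 'deva', 'hi': 'latn'}
```

Under this rule, listing Marathi first implies nothing about Hindi: Hindi keeps latn digits unless the Hindi entry itself asks for deva.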
If we were to expose the numbering system preference, should there be a CSS property to transform digits according to the user preference (assuming appropriate surrounding context)? That is, allowing sites to say digit-transform: auto; instead of using the Locale Extensions mechanism proposed? (This wouldn't mitigate fingerprinting, as the distinction could be measured from layout box metrics.)
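For a sense of what such a transform would do to text, here is a minimal Python sketch of a latn-to-Devanagari digit substitution. The function name `transform_digits` and its `numbering_system` parameter are assumptions made for illustration; no such CSS property or API exists today.

```python
# Devanagari digits occupy U+0966 (zero) through U+096F (nine).
LATN_TO_DEVA = str.maketrans(
    "0123456789",
    "".join(chr(0x0966 + i) for i in range(10)),
)

def transform_digits(text: str, numbering_system: str) -> str:
    """Replace ASCII digits with Devanagari digits when 'deva' is requested."""
    if numbering_system == "deva":
        return text.translate(LATN_TO_DEVA)
    return text  # any other value leaves the text unchanged in this sketch

print(transform_digits("2024", "deva"))  # २०२४
print(transform_digits("2024", "latn"))  # 2024
```

The substituted glyphs generally have different widths from the originals, which is why the rendered result would remain measurable from layout.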
If we were to accommodate the temperature unit use case in the Web platform, would it make more sense to do so via an HTML element that marks up the default (from site point of view) temperature so that the browser can convert the temperature rendering in place in layout if the user’s preference disagrees or, to avoid fingerprinting, if the user interacts with the amount to reveal a conversion (hover, click/tap, context menu, or similar)? (We already have the time element in HTML for datetimes.)
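The in-place conversion such an element would perform is simple arithmetic. The Python sketch below assumes the site marks up the value in Celsius and the browser knows the user's preferred unit; `render_temperature` and the unit strings are hypothetical names chosen for illustration.

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Standard conversion: F = C * 9/5 + 32."""
    return celsius * 9 / 5 + 32

def render_temperature(value_celsius: float, preferred_unit: str) -> str:
    """Render a site-supplied Celsius value in the user's preferred unit."""
    if preferred_unit == "fahrenheit":
        return f"{round(celsius_to_fahrenheit(value_celsius))} °F"
    return f"{round(value_celsius)} °C"

print(render_temperature(20, "fahrenheit"))  # 68 °F
print(render_temperature(20, "celsius"))     # 20 °C
```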
Request for Mozilla Position on an Emerging Web Specification
Other information
On the Web platform, content is localized based only on a user's language or region. However, this behavior can result in annoyance, frustration, offense, or even unintelligibility for some users.
Some example situations:
In the native environment these problems do not occur, since users can specify these desired customizations in their system settings. However, the full amount of flexibility allowed for in the native environment is not possible in the potentially hostile web environment. This proposal defines a mechanism for making a limited subset of the Unicode Extensions for BCP 47 available for content negotiation, providing options that address some of the worst problems with incomplete localization while only exposing coarse-grained data about the users who take advantage of these improvements.
Read the complete Explainer. Slide deck about Locale Extensions.
Feedback
I welcome feedback in this thread, but encourage you to file bugs against the Explainer.