Closed wareid closed 2 years ago
The issue was discussed in a meeting on 2021-10-26
The issue was discussed in a meeting on 2021-10-28
The issue was discussed in a meeting on 2022-04-08
List of resolutions:
I don't believe this issue is resolved yet. Providing an additional User-Agent string adds substantially to fingerprintability at a time that we are trying to reduce User-Agent entropy. At a minimum, we should note the fingerprinting risk in the rs spec. Normatively, we should be precise about its severity, recommendations to minimize unnecessary entropy, and clarify whether it's necessary in addition to the existing User-Agent string. (The core spec notes the risk and suggests non-normatively to content authors not to use it for tracking purposes.)
Hello Nick,
recommendations to minimize unnecessary entropy
Unless I am mistaken, in security / cryptography parlance "low entropy" is synonymous with increased predictability (order), while conversely "high entropy" indicates randomness. Isn't the latter a desirable quality with respect to minimizing fingerprintability / trackability?
"Entropy" was a bit of technical shorthand here. Researchers in this area refer to entropy as the level of variability of the characteristics about a user or device. To the extent that there is high variability, the characteristics will represent a more unique and stably identifiable fingerprint, which has privacy implications in enabling tracking without transparency or control.
This section of the Mitigating Browser Fingerprinting draft describes entropy and other characteristics of severity of fingerprintability: https://www.w3.org/TR/fingerprinting-guidance/#identifying-fingerprinting-surface-and-evaluating-severity
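To make the notion of entropy above concrete, here is a minimal sketch of how the identifying power of a characteristic can be expressed in bits. The population counts are hypothetical and only illustrate the arithmetic; they are not measurements of any reading system.

```javascript
// Surprisal (in bits) of one observed value: the rarer the value,
// the more precisely it singles out a user or device.
function surprisalBits(matchingUsers, totalUsers) {
  return -Math.log2(matchingUsers / totalUsers);
}

// Shannon entropy (in bits) of a characteristic, given how many
// users share each distinct value of that characteristic.
function entropyBits(counts) {
  const total = counts.reduce((sum, n) => sum + n, 0);
  return counts
    .filter((n) => n > 0)
    .reduce((h, n) => h - (n / total) * Math.log2(n / total), 0);
}

// Hypothetical population: 1024 readers spread over four
// name/version combinations (made-up numbers).
const hypotheticalCounts = [512, 256, 128, 128];

console.log(entropyBits(hypotheticalCounts)); // 1.75 bits on average
console.log(surprisalBits(128, 1024));        // 3 bits for the rarest value
```

In this reading, a characteristic that takes the same value for everyone contributes 0 bits, which is why a value that merely repeats information already available elsewhere adds little to a fingerprint.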
Thank you for the clarification, Nick.
I'm curious if publishers have found the epubReadingSystem object at all useful?
We've talked about deprecating it in the past.
If it's underimplemented in reading systems and underused by publishers, is there any great loss if we move on from it?
Per my first (but not exhaustive) test results, it is implemented in the sense of the CR requirement. Both Apple Books and Thorium implement it afaik.
I cannot answer the underused aspect. Isn't there a more general question about it in browsers, though? What the epubReadingSystem object adds is a minor addition to what browsers already reveal...
it is implemented in the sense of the CR requirement
Sure, I'm speculating on whether it's realistic to ever see it widely implemented when it's been required to support it for almost a decade already.
We could make it an optional support feature, too, which would at least make it more reasonable to warn about the potential privacy issues that come with it. As it is, requiring support while warning about its security implications sounds contradictory.
I have no information on whether it is implemented in general or not. Note that it is already optional, in the sense that scripting support is optional in the first place (and the question on whether it is implemented means whether it is present if a RS allows for scripting).
Allowing scripting opens up the flood gates for many potential issues, including fingerprinting through the facilities of the relevant WebView, and epubReadingSystem might just be a minor additional point in the overall picture. My feeling is that having an optional feature "within" an optional feature is slightly over the top...
(But this is not an issue I would lie down in the road for...)
Note that it is already optional, in the sense that scripting support is optional in the first place
That's not really making it optional, though, since it's only relevant if scripting is supported. If you support scripting, you must support the object.
The question in making any change is whether we tell reading systems to abandon it, which is what deprecating would do since no rendering ever depends on this, or whether we leave it as a feature of the specification but don't require its implementation anymore.
It'd be interesting to do a survey of publishers and see if any use it. That would maybe help shed some light on whether it's a dusty corner of the spec or not.
Note that it is already optional, in the sense that scripting support is optional in the first place
That's not really making it optional, though, since it's only relevant if scripting is supported. If you support scripting, you must support the object.
Yes. What we should find out (that is why testing is done...) is how frequent it is to have an RS that supports JavaScript for authors but does not support epubReadingSystem. We may find out that this number is actually very low, i.e., RSs that support JavaScript already support the object. If so, I do not think we should change the spec...
Per my first (but not exhaustive) test results, it is implemented in the sense of the CR requirement. Both Apple Books and Thorium implement it afaik.
Does the epubReadingSystem value in those cases duplicate or add to what's in the navigator.userAgent?
There might be less entropy added if it's largely the same entropy as what's in the existing UA string. But in that case, it's not clear what the use is for content authors.
The specification explicitly says:
This specification extends the Navigator object [html] as follows.
I.e., if I understand your question properly, it adds information.
@npdoty, on your separate comment: an RS is (usually) built on top of a webview system, e.g., a chromium core, and it does not implement a full browser. For those, I would expect that the navigator.userAgent is something like chromium (I am not an expert, so I may very well be wrong). Furthermore, the same webview might be shared among different reading systems. Hence the additional information.
@npdoty do you still believe we should do something about this issue? I am not sure where we are...
Is the question only whether we should enable targeting reading systems via their name/version given that it allows more specific profiling?
If the goal of the object is to allow content to adapt to the capabilities of a reading system, we should only need the feature detection part of the epubReadingSystem object.
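As a minimal sketch of what relying only on the feature detection part could look like for a content author: the helper name `supportsFeature` and the mock object are hypothetical, introduced here so the example runs outside a reading system. In real content the first argument would be `navigator`, and `hasFeature` is the query method defined on the epubReadingSystem object.

```javascript
// Returns true only if the reading system exposes the object
// and explicitly reports support for the named feature.
function supportsFeature(nav, feature) {
  const rs = nav && nav.epubReadingSystem;
  if (!rs || typeof rs.hasFeature !== "function") {
    return false; // no object or no query method: assume unsupported
  }
  return rs.hasFeature(feature) === true;
}

// Hypothetical stand-in for `navigator` (not a real reading system).
const mockNavigator = {
  epubReadingSystem: {
    hasFeature: (feature) => feature === "dom-manipulation",
  },
};

console.log(supportsFeature(mockNavigator, "dom-manipulation")); // true
console.log(supportsFeature(mockNavigator, "spine-scripting"));  // false
console.log(supportsFeature({}, "dom-manipulation"));            // false
```

Content written this way adapts to capabilities without ever reading the name or version, so it keeps working if those properties are absent.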
FYI, just running a quick test on Thorium I get:
navigator.userAgent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) EDRLab.ThoriumReader/2.0.0 Chrome/102.0.5005.61 Electron/19.0.1 Safari/537.36
navigator.epubReadingSystem.name: Thorium
navigator.epubReadingSystem.version: 2.0.0
In this case, the name/version aren't telling you much more than you could parse out of the userAgent string. (I don't know if that holds for other implementations.)
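The observation above can be checked mechanically. The following is a rough sketch, not a robust UA parser: the helper names `productTokens` and `isRedundantWithUserAgent` are hypothetical, and the matching rule (case-insensitive substring on the product name plus an exact version match) is an assumption chosen for illustration.

```javascript
// Extract "product/version" tokens from a User-Agent string,
// skipping the parenthesised comment sections.
function productTokens(userAgent) {
  const tokenPattern = /([^\s()\/]+)\/([^\s()]+)/g;
  return [...userAgent.matchAll(tokenPattern)].map((m) => ({
    product: m[1],
    version: m[2],
  }));
}

// True if some UA token already carries the reading system's
// name (as a substring) together with the same version.
function isRedundantWithUserAgent(userAgent, rs) {
  const name = rs.name.toLowerCase();
  return productTokens(userAgent).some(
    (t) => t.product.toLowerCase().includes(name) && t.version === rs.version
  );
}

// Values copied from the Thorium test reported above.
const thoriumUserAgent =
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 " +
  "(KHTML, like Gecko) EDRLab.ThoriumReader/2.0.0 Chrome/102.0.5005.61 " +
  "Electron/19.0.1 Safari/537.36";
const thoriumReadingSystem = { name: "Thorium", version: "2.0.0" };

console.log(isRedundantWithUserAgent(thoriumUserAgent, thoriumReadingSystem)); // true
```

A result of `false` for another implementation would suggest that its name/version expose information the UA string does not.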
Is the content of the userAgent value standardized? Is it expected that Thorium and its version would appear in it?
@bduga @danielweck
On the other hand... what it also tells me is that the EPUB extension does not add any more fingerprintability surface to what is already there, i.e., this issue may be moot...
I think there is an RFC for UA strings. Also see https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent
But they are usually spoofable - that is, users can often change their UA strings. There is no requirement that Thorium put that information in the UA string, so exposing the name and version COULD be additional information, but isn't necessarily. This is only an issue where scripting is allowed and scripted content is allowed access to the network, which is already pretty bad. That is, given such an RS, fingerprintability is the least of my concerns. That said, these values are terrible and I hope no one uses them, since they suffer from exactly the same issues that have made UA sniffing the nightmare it is. I would be happy to see them go away. What would break, I have no idea.
The issue was discussed in a meeting on 2022-07-21
List of resolutions:
epubReadingSystem object.
Is the content of the userAgent value standardized? Is it expected that Thorium and its version would appear in it?
In Thorium's case, the navigator.userAgent string is populated automatically by Electron (which is the Chromium-based cross-platform application framework used by Thorium). Thorium does not use the navigator.userAgent setter (https://www.electronjs.org/docs/latest/api/web-contents#contentssetuseragentuseragent).
As for epubReadingSystem.name|version, Thorium will of course continue to inject the now-deprecated / legacy properties, for backward compatibility.
I think that resolution makes sense, thanks. Feature detection is generally more useful and more future proof than encouraging more UA string parsing, but having these additional potentially duplicative settings may have discouraged future progress.
From the PING review: