editorial: Add reading quantization and threshold check algorithms.

rakuco commented 2 years ago

Related to #63, which says the granularity of the data exposed by Ambient Light Sensors should be specified normatively.

This commit goes a bit further and specifies the two anti-fingerprinting measures currently implemented by Chrome -- namely, not only are illuminance values rounded but there's also a threshold value check to avoid storing values that are too close to the latest reading.

w3c/sensors#429 defines the concepts of "reading quantization algorithm" and "threshold check algorithm" that concrete sensors can specify. We specify both here, along with some values used by them (based on the current Chromium values):

An "illuminance rounding multiple" of at least 50lx.
An "illuminance threshold value" of at least 25lx (half the illuminance roundig multiple, to be more precise).

These values are then used in the following algorithms:

The "threshold check algorithm" checks that the difference between new and current illuminance values is above the illuminance threshold value.
The "reading quantization algorithm" rounds up readings to the closest multiple of the illuminance rounding multiple.

Preview | Diff

rakuco commented 2 years ago

@anssiko @reillyeon @sandandsnow this is a companion to w3c/sensors#429

As mentioned there, AFAICS only specifying the granularity of the illuminance data is not enough, as the Chrome implementation also checks if the new reading differs from the latest one significantly enough, and IIRC @reillyeon mentioned only doing the rounding was not enough to avoid fingerprinting.

Opens I can see immediately:

It is not clear if "at least 50lx" makes sense or if we should mandate 50lx specifically in the values above.
It is also not clear if implementations should have leeway to specify different values for the "illuminance threshold value" and the "illuminance rounding multiple" (or if the values should depend on each other).
The changes to the illuminance getter are probably too hand-wavy. I did not know whether the actual algorithm (this is what Chrome does at the moment) must be specified or not.
Chrome does the rounding before notifying Sensor instances (e.g. the getter just calls "get value from latest readings") by storing raw readings but sharing rounded readings with instances, but I guess this is irrelevant from a spec perspective?
The names for the new concepts and values are not great :-)

anssiko commented 2 years ago

My initial impression is this looks good!

It is not clear if "at least 50lx" makes sense or if we should mandate 50lx specifically in the values above.

AFAICT, the "at least 50lx" threshold was informed by research conducted in this group with results collected using a setup described at https://github.com/w3c/ambient-light/issues/13#issuecomment-302393458.

Optimally we'd link to this data in the spec so that privacy researchers can review the test setup and data easily and we can adjust this mitigation if new information is brought to our attention. Revise as needed.

Rather than linking to a Google sheet, I'd prefer to see this data exported into an appendix in the spec, or alternatively convert the sheet into a markdown file stored in this repo.

Again, I'll lean on @sandandsnow and other PING participants for privacy experts' perspective.

anssiko commented 2 years ago

@sandandsnow, how could the DAS WG help PING review this proposed privacy mitigation?

This mitigation has already been implemented in Chromium.

As proposed in this PR, the DAS WG would like to now normatively specify this mitigation so that other implementers could benefit from this and we are seeking PING review to capture your perspective.

sandandsnow commented 2 years ago

Thank you for bringing this to my attention, and thank you for addressing this vulnerability. I do not have the specialist expertise to determine if at least 50lx threshold is a sufficient mitigation, but I will confer with others for their views and revert shortly.

anssiko commented 2 years ago

@sandandsnow, thanks for your swift response. I put a reminder to check back the status of this RFC in a week. Please let us know if PING has a meeting cadence we should align with.

We want to engage with PING as early as possible when there's a privacy-impacting concrete spec change proposal in review. Optimally, such proposals are not landed in the spec before PING has reviewed the proposed changes to minimise spec churn and to increase implementers' confidence.

lknik commented 2 years ago

Hello,

Thanks for not limiting to the frequency reduction which was not the central culprit of some past risks. I'm happy this gets formalised and I agree that this minimises the risks of such known attacks. Minimises, as it isn't clear if we're aware of the full risk potential. That said, this change helps, and likely fixes the most "reasonable" scenarios imaginable.

I agree that "50 lx" is quite a strong limitation, unless for really specific circumstances (can't be ruled out but probably atypical anyway). Another approach could involve further reduction and possibly going from quantitative lux readout to qualitative description such as "bright", "dark", "very dark", etc.

anssiko commented 2 years ago

@lknik, thanks for your review. Also thanks for the earlier contributions as a WG participant that also helped improve the privacy properties of this API. You’re in acknowledgements.

@sandandsnow, should we consider this to be PING’s official review or are we expecting more feedback?

lknik commented 2 years ago

Always happy to help, @anssiko!

Feel free to name the threshold check algorithm "Janc's algorithm" (of @arturjanc) :-) (j/k)

sandandsnow commented 2 years ago

@lknik thanks for weighing in @anssiko I am happy to defer to @lknik's expertise in this area. I don't imagine there will be further feedback, but I'll put this on the agenda for the next PING call on 17 February 2022 in case anyone wants to raise any further (and final) input. Thanks for your patience.

sandandsnow commented 2 years ago

@anssiko, we discussed this at our PING meeting on Thursday. As a consequence, there are a couple of follow-up questions. I was hoping to share them with you last week, but I'm waiting on colleagues to clarify those.

sandandsnow commented 2 years ago

Thank you. We discussed the proposed mitigations in the PING call today. As a result of that conversation we have a couple of follow-up questions:

Could you clarify for us why the WG chose a 50lx threshold?
Also, notwithstanding the mitigations, is there still a fingerprinting risk (albeit a reduced risk)? More specifically, to what extent does reducing to a 50lx threshold (or any threshold) prevent fingerprinting on the basis of opening up the capacity to track a user through typical behavior patterns?

And, a more general privacy question (not related to reducing granularity), how does the specification prevent or protect against cross-device tracking (e.g. the light equivalent of ultrasonic beacons)?

More specifically, we have received these observations and comments:

Even bucketing by 50lux still seems to expose a lot of fingerprinting surface (>=4bits given the range here), which doesn’t seem acceptable
Bucketing doesn’t seem to address the “ephemeral fingerprinting” concern
This API seems like an extremely infrequently needed feature (as evidenced by most browsers not being interested in implementing); so, why not put it behind a permission prompt?
This seems to be easily exploitable as a covert channel (write to the channel by changing the brightness of the content on the page, read from the channel through the brightness sensor). The spec needs to address this (e.g. through permission prompt)

lknik commented 2 years ago

@sandandsnow May I please ask why do you consider the 50lx thing through the lens of fingerprinting risks? The point was to minimise data leak risks (here, which also sums up your observations, too -- in other words, we know about this :)). The "more general question" about cross-device... I'd say it greatly lowers the risk, but it nonetheless remains (out of bands).

rakuco commented 2 years ago

(apologies in advance for the wall of text ahead)

Hi, @sandandsnow and @lknik.

Thank you very much for all the time spent reviewing this PR (and special thanks to @lknik for being around and watching this API for years now). My apologies for the time it took me to get back to this change. At least I did spend some time working and documenting the Generic Sensor implementation in Chromium and have a better understanding of the mitigations I am trying to "upstream" here.

I've updated this PR as well as w3c/sensors#429 to address some of the feedback received here as well as to make the prose and algorithms better match what we have in Chromium. I strongly suggest looking at w3c/sensors#429 first and then reading this PR's diff. I'll go over the current solution and then try to address the concerns @sandandsnow has brought from PING.

Current change

Compared to the previous version from the end of 2021:

The illuminance getter here no longer does anything special. Instead, I've moved the (optional) rounding of values to the newly-added "reading quantization algorithm" called by "get value from latest reading" in the main Generic Sensor spec. Specifications such as this one then define a "reading quantization algorithm" to perform the actual rounding, but the mechanism is generic enough that other specs (such as Accelerometer) could do something similar without having to do everything from scratch.
The value of the "illuminance threshold value" has changed from 50lx to "half of the illuminance rounding multiple" (25lx by default then). This matches the current Chromium implementation, although I am not sure this is the best option (more on that below).
I've added informative references to works analyzing potential attacks using Ambient Light Sensors, including one of @lknik's blog posts, and how the proposed changed help mitigate some of the attacks. I've also mentioned issue #13 where the current proposed rounding value of 50lx was first derived, and I've linked to the spreadsheet with the measurements made back in 2017 that we've been using as a basis for the decisions (in the future we could follow @anssiko's idea and add the table to this repository).
I've also followed https://w3c.github.io/fingerprinting-guidance/#mark-fingerprinting and added a tracking-vector mark to the security and privacy section (not sure if it should be added to a different location instead though).

Things I'd like to discuss

[ ] I am not entirely sure about https://arxiv.org/abs/1405.3760 because the variation in lux in figures 3 and 5 differs quite a lot. The variation in figure 5 matches my informal tests using my own phone more closely, and the mitigations suggested here would address them more effectively than the variation in figure 3.
[ ] The "threshold check algorithm" idea
- [ ] I am still not 100% sure the whole idea behind the it counts as a fingerprinting/data leak mitigation or if it's just a sort of high-pass filter for changes in illuminance. Some operating systems and hardware platforms even take care of this kind of check themselves, where new readings are made available only if they differ from the latest one by some percentage or absolute value. In Chromium, the idea was to avoid reporting too many changes when illuminance values hovered around the "edges" between rounding multiples, so that moving from 24lx to 26lx and back to 24lx wouldn't generate 3 "reading" events and update the readings, for example, and let an attacker know that the actual value is probably in that region. I can see how this helps avoid leaking data, but is it effective enough to count as a mitigation?
- [ ] The actual "illuminance threshold value" suggested here (25lx) came up during code review years ago, but I don't think there's a strong motivation for it to be 25lx and not the same as the "illuminance rounding multiple" (50lx). Using 25lx means the API might end up reporting a new reading even if the rounded value has not changed. For example, going from 26lx to 53lx in raw readings would generate a "reading" event even though the illuminance attribute would return 50 in both cases. Cases like this could allow an attacker to know that the actual illuminance value is in the [25,75) half-open interval and that the two readings differ by at least 25lx and at most 49lx. If we the "illuminance rounding multiple", this case would not exist.

PING's concerns

Thank you. We discussed the proposed mitigations in the PING call today. As a result of that conversation we have a couple of follow-up questions:

Could you clarify for us why the WG chose a 50lx threshold?

Done in the spec and also above, hopefully.

[other questions]

Please correct me if I'm wrong, but I'm under the impression that some of those concerns came up by looking at this spec in isolation without looking at the main Generic Sensor spec. https://w3c.github.io/sensors/#concepts-can-expose-sensor-readings and https://w3c.github.io/sensors/#abstract-operations mandate, for example, that:

Sensor readings are only exposed under the following conditions:
- The environment is a secure context.
- Sensor usage is allowed via the Permissions Policy API.
- The document's visibility is "visible".
- The currently focused area belongs to a document whose origin is same origin-domain with document’s origin.
- The call(s) to request permission to use in Sensor.start() have passed.

It is up to each UA to implement "request permission to use", and it might involve prompting users, for example. At the moment, Chromium does not prompt users for access to motion sensors (e.g. accelerometer and gyroscope) but lets them allow or block access by default. We are also working on making this better by moving to prompting by default (and removing the "allow by default" option) as part of the working on implementing Device Orientation's requestPermission() method. Additionally, in https://www.w3.org/2021/10/29-dap-minutes.html#t07 we also decided to also add a camera permission requirement to the spec to make the permission requirements stricter (I still have to address that one).

With the above in mind, let me try to get to the specific questions:

Also, notwithstanding the mitigations, is there still a fingerprinting risk (albeit a reduced risk)? More specifically, to what extent does reducing to a 50lx threshold (or any threshold) prevent fingerprinting on the basis of opening up the capacity to track a user through typical behavior patterns?

I believe the fingerprinting risk remains. Even though we reduce the granularity of the data exposed to API users, an attacker could still know that a user is e.g. at an office environment between certain hours (320-500lx per https://en.wikipedia.org/wiki/Lux#Illuminance), and walks under full daylight at a certain time of the day (1000 to 10000lx). The mitigations listed above help prevent that websites (including third-parties) have undetected and unprompted access to the data.

And, a more general privacy question (not related to reducing granularity), how does the specification prevent or protect against cross-device tracking (e.g. the light equivalent of ultrasonic beacons)?

The idea with the set of mitigations proposed here and in the Generic Sensor spec is to make the readings coarse enough to help prevent cross-device tracking while at the same time only making readings available to pages that fulfill the requirements above and which the user has authorized to gather data.

More specifically, we have received these observations and comments:

Even bucketing by 50lux still seems to expose a lot of fingerprinting surface (>=4bits given the range here), which doesn’t seem acceptable

Do the mitigations above help make it more acceptable? I'm asking because this is also the case even for specs such as https://w3c.github.io/deviceorientation that are implemented by multiple engines: a DeviceMotionEvent can include the output of two 3-axis accelerometers, a gyroscope and a double (interval), and isn't that a lot more bits? I'm not asking this rhetorically, as I'm basing my calculations on https://www.eff.org/deeplinks/2010/01/primer-information-theory-and-privacy and am not sure if this is right.

Bucketing doesn’t seem to address the “ephemeral fingerprinting” concern

Are you referring to https://github.com/asankah/ephemeral-fingerprinting or is there another resource I could look at? That page lists several possible mitigations and we implement many of them, so I'm wondering if the Generic Sensor + ALS mitigations do address the concern at least partially?

This API seems like an extremely infrequently needed feature (as evidenced by most browsers not being interested in implementing); so, why not put it behind a permission prompt?

This seems to be easily exploitable as a covert channel (write to the channel by changing the brightness of the content on the page, read from the channel through the brightness sensor). The spec needs to address this (e.g. through permission prompt)

Answered above: the permission side of things is handled in the main Generic Sensor spec, UAs are free to handle the permission request implementation, we want to add a prompt to the Chromium implementation. Additionally, when it comes to the covert channel attack, the bucketing idea also helps make it more difficult -- the idea looks similar to https://arturjanc.com/ls/ after all, which the bucketing idea is supposed to help address.

rakuco commented 2 years ago

Approved with a fix for a reference to an unknown definition.

@reillyeon while I have you here, could you take a look at the "threshold check algorithm" idea part of https://github.com/w3c/ambient-light/pull/77#issuecomment-1145912333? I'd like to double-check those items with you since you were around when this was discussed when reviewing the initial version of these mitigations in Chromium.

reillyeon commented 2 years ago

@reillyeon while I have you here, could you take a look at the "threshold check algorithm" idea part of #77 (comment)? I'd like to double-check those items with you since you were around when this was discussed when reviewing the initial version of these mitigations in Chromium.

It may be possible to defer the threshold checking to the hardware or operating system as long as the threshold is implemented as a delta from the previously reported value. It is important that this works correctly as it is critical to making rounding an effective mitigation when rounding to a value significantly higher than the noise in the system, as is the case with the ambient light sensor.

rakuco commented 2 years ago

I've pushed a new version of this PR with a few changes:

It refers to https://chromium-review.googlesource.com/c/chromium/src/+/3666917 and https://bugs.chromium.org/p/chromium/issues/detail?id=1332536 and how the use of the threshold checks is based on implementation experience although we haven't done any mathematical analysis of its efficacy as a mitigation strategy auxiliary to rounding.
It makes the "threshold check algorithm" a bit more complicated by also checking that the rounded values differ, as discussed in https://bugs.chromium.org/p/chromium/issues/detail?id=1332536#c2.

rakuco commented 2 years ago

@sandandsnow @lknik friendly ping, just wondering if any of you had time to take a look at the changes pushed to this PR as well as w3c/sensors#429

lknik commented 2 years ago

In my opinion, the threshold method helps mitigating the risk. Of course, some potential remains but it would be much more difficult to abuse in practice.

The reason Fig 3/5 in the referenced PDFs vary so much may be due to the tested environment. In my tests, 50lx differences were also recorded routinely. However, in my view it is less likely (if mitigations are deployed) to abuse it in practice to e.g. exfiltrate data, as then the environmental changes would contribute less to a reliable abuse.

So let's move forward. There's still a risk that some academic team will want to validate the boundary issues, but such is life :)

rakuco commented 2 years ago

Thanks, @lknik, I really appreciate the review and that you've stuck around for so many years.

For the record, you might also be interested in #79 where we discuss the permissions/permission prompt situation (and requiring the camera permission for this API), which is also part of the analysis you've written about ALS.

anssiko commented 2 years ago

Thanks @lknik for your privacy-focused suggestions and review throughout the years (plural).

@sandandsnow we'd still be happy to hear PING's feedback for this proposed editorial improvement before we merge this PR.

sandandsnow commented 2 years ago

I'm happy to be guided by @lknik, but I have drawn this to the attention of my PING co-chairs in case they have anything further they wish to raise.

anssiko commented 2 years ago

@sandandsnow, it seems no concerns from the other PING co-chairs have been raised, could we merge this? If so, we’d appreciate if you could your submit approval with the usual GH facility (Files changes > Review changes).

sandandsnow commented 2 years ago

@anssiko We're happy for you to close the issue.

anssiko commented 2 years ago

Thanks @sandandsnow!

w3c / ambient-light