w3c / ambient-light

Ambient Light Sensor
https://www.w3.org/TR/ambient-light/

editorial: Add reading quantization and threshold check algorithms. #77

Closed rakuco closed 2 years ago

rakuco commented 2 years ago

Related to #63, which says the granularity of the data exposed by Ambient Light Sensors should be specified normatively.

This commit goes a bit further and specifies the two anti-fingerprinting measures currently implemented by Chrome -- namely, not only are illuminance values rounded but there's also a threshold value check to avoid storing values that are too close to the latest reading.
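As a rough illustration of the two measures described above, the sketch below rounds illuminance to a multiple and ignores readings that are too close to the latest accepted one. It is not the Chromium code or the spec algorithms: the names `ROUNDING_MULTIPLE`, `THRESHOLD`, `quantize`, `passes_threshold` and `LightReadingFilter` are hypothetical, both constants are set to 50 lx only because that figure is discussed in this thread, and comparing against the last accepted raw value is an assumption.

```python
import math

ROUNDING_MULTIPLE = 50.0  # lx; assumed value for illustration
THRESHOLD = 50.0          # lx; assumed value for illustration


def quantize(lux, multiple=ROUNDING_MULTIPLE):
    """Round an illuminance value to the nearest multiple (halves round up)."""
    return math.floor(lux / multiple + 0.5) * multiple


def passes_threshold(new_lux, latest_lux, threshold=THRESHOLD):
    """Accept a reading only if it differs enough from the latest accepted one."""
    if latest_lux is None:
        return True  # nothing to compare against yet
    return abs(new_lux - latest_lux) >= threshold


class LightReadingFilter:
    """Hypothetical helper combining the threshold check and the rounding."""

    def __init__(self):
        self._latest_raw = None  # last raw reading that passed the check
        self.exposed = None      # value that would be visible to script

    def update(self, raw_lux):
        # Readings that fail the check are dropped; the exposed value is kept.
        if passes_threshold(raw_lux, self._latest_raw):
            self._latest_raw = raw_lux
            self.exposed = quantize(raw_lux)
        return self.exposed
```

For example, feeding raw readings of 120, 140 and 180 lx through `LightReadingFilter.update` exposes 100, 100 and 200 lx under these assumed constants: the 140 lx reading is dropped because it is only 20 lx away from the accepted 120 lx reading.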

w3c/sensors#429 defines the concepts of "reading quantization algorithm" and "threshold check algorithm" that concrete sensors can specify. We specify both here, along with some values used by them (based on the current Chromium values):

These values are then used in the following algorithms:



rakuco commented 2 years ago

@anssiko @reillyeon @sandandsnow this is a companion to w3c/sensors#429

As mentioned there, AFAICS only specifying the granularity of the illuminance data is not enough, as the Chrome implementation also checks if the new reading differs from the latest one significantly enough, and IIRC @reillyeon mentioned only doing the rounding was not enough to avoid fingerprinting.

Opens I can see immediately:

anssiko commented 2 years ago

My initial impression is this looks good!

It is not clear if "at least 50lx" makes sense or if we should mandate 50lx specifically in the values above.

AFAICT, the "at least 50lx" threshold was informed by research conducted in this group with results collected using a setup described at https://github.com/w3c/ambient-light/issues/13#issuecomment-302393458.

Optimally we'd link to this data in the spec so that privacy researchers can review the test setup and data easily and we can adjust this mitigation if new information is brought to our attention. Revise as needed.

Rather than linking to a Google sheet, I'd prefer to see this data exported into an appendix in the spec, or alternatively convert the sheet into a markdown file stored in this repo.

Again, I'll lean on @sandandsnow and other PING participants for privacy experts' perspective.

anssiko commented 2 years ago

@sandandsnow, how could the DAS WG help PING review this proposed privacy mitigation?

This mitigation has already been implemented in Chromium.

As proposed in this PR, the DAS WG would like to now normatively specify this mitigation so that other implementers could benefit from this and we are seeking PING review to capture your perspective.

sandandsnow commented 2 years ago

Thank you for bringing this to my attention, and thank you for addressing this vulnerability. I do not have the specialist expertise to determine if at least 50lx threshold is a sufficient mitigation, but I will confer with others for their views and revert shortly.

anssiko commented 2 years ago

@sandandsnow, thanks for your swift response. I've set a reminder to check back on the status of this RFC in a week. Please let us know if PING has a meeting cadence we should align with.

We want to engage with PING as early as possible when there's a privacy-impacting concrete spec change proposal in review. Optimally, such proposals are not landed in the spec before PING has reviewed the proposed changes to minimise spec churn and to increase implementers' confidence.

lknik commented 2 years ago

Hello,

Thanks for not limiting this to frequency reduction, which was not the central culprit of some past risks. I'm happy this gets formalised and I agree that this minimises the risks of such known attacks. Minimises, as it isn't clear if we're aware of the full risk potential. That said, this change helps, and likely fixes the most "reasonable" scenarios imaginable.

I agree that "50 lx" is quite a strong limitation, except in really specific circumstances (which can't be ruled out but are probably atypical anyway). Another approach could involve further reduction and possibly going from a quantitative lux readout to a qualitative description such as "bright", "dark", "very dark", etc.
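A qualitative readout of the kind suggested here could look roughly like the following sketch. The function name `describe_illuminance` and the cut-off values are purely illustrative assumptions; nothing in this thread or in the spec defines them.

```python
import bisect

# Hypothetical cut-offs in lx; chosen only to illustrate the idea.
CUTOFFS = [10.0, 100.0]
LABELS = ["very dark", "dark", "bright"]


def describe_illuminance(lux):
    """Map a raw lux value to a coarse, qualitative label."""
    # bisect_right returns how many cut-offs are <= lux, i.e. the label index.
    return LABELS[bisect.bisect_right(CUTOFFS, lux)]
```

With only a handful of labels, a page would learn far less than from a lux value rounded to a multiple of 50, at the cost of being less useful to applications that need a numeric reading.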

anssiko commented 2 years ago

@lknik, thanks for your review. Also thanks for the earlier contributions as a WG participant that also helped improve the privacy properties of this API. You’re in acknowledgements.

@sandandsnow, should we consider this to be PING’s official review or are we expecting more feedback?

lknik commented 2 years ago

Always happy to help, @anssiko!

Feel free to name the threshold check algorithm "Janc's algorithm" (of @arturjanc) :-) (j/k)

sandandsnow commented 2 years ago

@lknik, thanks for weighing in. @anssiko, I am happy to defer to @lknik's expertise in this area. I don't imagine there will be further feedback, but I'll put this on the agenda for the next PING call on 17 February 2022 in case anyone wants to raise any further (and final) input. Thanks for your patience.

sandandsnow commented 2 years ago

@anssiko, we discussed this at our PING meeting on Thursday. As a consequence, there are a couple of follow-up questions. I was hoping to share them with you last week, but I'm waiting on colleagues to clarify those.

sandandsnow commented 2 years ago

Thank you. We discussed the proposed mitigations in the PING call today. As a result of that conversation we have a couple of follow-up questions:

And, a more general privacy question (not related to reducing granularity), how does the specification prevent or protect against cross-device tracking (e.g. the light equivalent of ultrasonic beacons)?

More specifically, we have received these observations and comments:

lknik commented 2 years ago

@sandandsnow May I please ask why you consider the 50lx thing through the lens of fingerprinting risks? The point was to minimise data leak risks (here, which also sums up your observations, too -- in other words, we know about this :)). The "more general question" about cross-device... I'd say it greatly lowers the risk, but it nonetheless remains (out of bands).

rakuco commented 2 years ago

(apologies in advance for the wall of text ahead)

Hi, @sandandsnow and @lknik.

Thank you very much for all the time spent reviewing this PR (and special thanks to @lknik for being around and watching this API for years now). My apologies for the time it took me to get back to this change. At least I did spend some time working on and documenting the Generic Sensor implementation in Chromium and have a better understanding of the mitigations I am trying to "upstream" here.

I've updated this PR as well as w3c/sensors#429 to address some of the feedback received here as well as to make the prose and algorithms better match what we have in Chromium. I strongly suggest looking at w3c/sensors#429 first and then reading this PR's diff. I'll go over the current solution and then try to address the concerns @sandandsnow has brought from PING.

Current change

Compared to the previous version from the end of 2021:

Things I'd like to discuss

PING's concerns

Thank you. We discussed the proposed mitigations in the PING call today. As a result of that conversation we have a couple of follow-up questions:

  • Could you clarify for us why the WG chose a 50lx threshold?

Done in the spec and also above, hopefully.

[other questions]

Please correct me if I'm wrong, but I'm under the impression that some of those concerns came up by looking at this spec in isolation without looking at the main Generic Sensor spec. https://w3c.github.io/sensors/#concepts-can-expose-sensor-readings and https://w3c.github.io/sensors/#abstract-operations mandate, for example, that:

It is up to each UA to implement "request permission to use", and it might involve prompting users, for example. At the moment, Chromium does not prompt users for access to motion sensors (e.g. accelerometer and gyroscope) but lets them allow or block access by default. We are also working on making this better by moving to prompting by default (and removing the "allow by default" option) as part of the work on implementing Device Orientation's requestPermission() method. Additionally, in https://www.w3.org/2021/10/29-dap-minutes.html#t07 we decided to add a camera permission requirement to the spec to make the permission requirements stricter (I still have to address that one).

With the above in mind, let me try to get to the specific questions:

  • Also, notwithstanding the mitigations, is there still a fingerprinting risk (albeit a reduced risk)? More specifically, to what extent does reducing to a 50lx threshold (or any threshold) prevent fingerprinting on the basis of opening up the capacity to track a user through typical behavior patterns?

I believe the fingerprinting risk remains. Even though we reduce the granularity of the data exposed to API users, an attacker could still learn that a user is e.g. in an office environment between certain hours (320-500lx per https://en.wikipedia.org/wiki/Lux#Illuminance), and walks under full daylight at a certain time of the day (1000 to 10000lx). The mitigations listed above help prevent websites (including third parties) from having undetected and unprompted access to the data.

And, a more general privacy question (not related to reducing granularity), how does the specification prevent or protect against cross-device tracking (e.g. the light equivalent of ultrasonic beacons)?

The idea with the set of mitigations proposed here and in the Generic Sensor spec is to make the readings coarse enough to help prevent cross-device tracking while at the same time only making readings available to pages that fulfill the requirements above and which the user has authorized to gather data.

More specifically, we have received these observations and comments:

  • Even bucketing by 50lux still seems to expose a lot of fingerprinting surface (>=4bits given the range here), which doesn’t seem acceptable

Do the mitigations above help make it more acceptable? I'm asking because this is also the case even for specs such as https://w3c.github.io/deviceorientation that are implemented by multiple engines: a DeviceMotionEvent can include the output of two 3-axis accelerometers, a gyroscope and a double (interval), and isn't that a lot more bits? I'm not asking this rhetorically, as I'm basing my calculations on https://www.eff.org/deeplinks/2010/01/primer-information-theory-and-privacy and am not sure if this is right.
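One way to sanity-check such estimates, in the spirit of the EFF primer linked above, is to note that a value with N equally likely outcomes carries log2(N) bits. The sketch below counts 50 lx buckets over a range of readings. The function name `max_bits` and the example ranges are assumptions for illustration, and since real readings are not uniformly distributed the results are upper bounds rather than measurements.

```python
import math


def max_bits(min_lux, max_lux, bucket=50.0):
    """Upper bound on bits exposed by a reading bucketed to `bucket` lx.

    Assumes every bucket in [min_lux, max_lux] is equally likely, which
    overestimates the information in real-world readings.
    """
    buckets = math.floor((max_lux - min_lux) / bucket) + 1
    return math.log2(buckets)


indoor_bits = max_bits(0, 750)       # 16 buckets -> 4.0 bits
daylight_bits = max_bits(0, 10000)   # 201 buckets -> roughly 7.65 bits
```

Under these assumptions, a range of 0 to 750 lx already yields the 4 bits mentioned in the observation, and including the daylight range pushes the bound higher. The same calculation could be repeated per axis for motion sensors to compare the two APIs on equal terms.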

  • Bucketing doesn’t seem to address the “ephemeral fingerprinting” concern

Are you referring to https://github.com/asankah/ephemeral-fingerprinting or is there another resource I could look at? That page lists several possible mitigations and we implement many of them, so I'm wondering if the Generic Sensor + ALS mitigations do address the concern at least partially?

  • This API seems like an extremely infrequently needed feature (as evidenced by most browsers not being interested in implementing); so, why not put it behind a permission prompt?
  • This seems to be easily exploitable as a covert channel (write to the channel by changing the brightness of the content on the page, read from the channel through the brightness sensor). The spec needs to address this (e.g. through permission prompt)

Answered above: the permission side of things is handled in the main Generic Sensor spec, UAs are free to handle the permission request implementation, and we want to add a prompt to the Chromium implementation. Additionally, when it comes to the covert channel attack, the bucketing idea also helps make it more difficult -- the idea looks similar to https://arturjanc.com/ls/ after all, which the bucketing idea is supposed to help address.

rakuco commented 2 years ago

Approved with a fix for a reference to an unknown definition.

@reillyeon while I have you here, could you take a look at the "threshold check algorithm" idea part of https://github.com/w3c/ambient-light/pull/77#issuecomment-1145912333? I'd like to double-check those items with you since you were around when this was discussed when reviewing the initial version of these mitigations in Chromium.

reillyeon commented 2 years ago

@reillyeon while I have you here, could you take a look at the "threshold check algorithm" idea part of #77 (comment)? I'd like to double-check those items with you since you were around when this was discussed when reviewing the initial version of these mitigations in Chromium.

It may be possible to defer the threshold checking to the hardware or operating system as long as the threshold is implemented as a delta from the previously reported value. It is important that this works correctly as it is critical to making rounding an effective mitigation when rounding to a value significantly higher than the noise in the system, as is the case with the ambient light sensor.
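To illustrate why the delta check matters, consider a light level hovering around a rounding boundary. The sketch below is a toy under stated assumptions (a 50 lx rounding multiple, a 50 lx threshold and hypothetical function names), not the Chromium implementation. With rounding alone, noise of about one lux flips the exposed value between two buckets, which reveals that the raw value sits near the boundary; comparing each sample against the previously accepted raw value suppresses those flips.

```python
import math

MULTIPLE = 50.0   # lx; assumed rounding multiple
THRESHOLD = 50.0  # lx; assumed threshold


def quantize(lux):
    """Round to the nearest multiple of MULTIPLE (halves round up)."""
    return math.floor(lux / MULTIPLE + 0.5) * MULTIPLE


def rounding_only(samples):
    """Expose every sample after rounding, with no threshold check."""
    return [quantize(s) for s in samples]


def rounding_with_threshold(samples):
    """Expose a new value only when the raw delta reaches THRESHOLD."""
    exposed = []
    latest_raw = None
    for s in samples:
        if latest_raw is None or abs(s - latest_raw) >= THRESHOLD:
            latest_raw = s  # accept the sample as the new reference
        exposed.append(quantize(latest_raw))
    return exposed


# Raw readings hovering around the 75 lx boundary between 50 and 100.
noisy = [74.0, 76.0, 74.5, 75.5, 73.9]
```

Here `rounding_only(noisy)` alternates between 50 and 100 lx, while `rounding_with_threshold(noisy)` stays at 50 lx for every sample, because no sample differs from the first accepted reading by 50 lx or more.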

rakuco commented 2 years ago

I've pushed a new version of this PR with a few changes:

rakuco commented 2 years ago

@sandandsnow @lknik friendly ping, just wondering if any of you had time to take a look at the changes pushed to this PR as well as w3c/sensors#429

lknik commented 2 years ago

In my opinion, the threshold method helps mitigate the risk. Of course, some potential remains but it would be much more difficult to abuse in practice.

The reason Fig 3/5 in the referenced PDFs vary so much may be due to the tested environment. In my tests, 50lx differences were also recorded routinely. However, in my view it is less likely (if mitigations are deployed) to abuse it in practice to e.g. exfiltrate data, as then the environmental changes would contribute less to a reliable abuse.

So let's move forward. There's still a risk that some academic team will want to validate the boundary issues, but such is life :)

rakuco commented 2 years ago

Thanks, @lknik, I really appreciate the review and that you've stuck around for so many years.

For the record, you might also be interested in #79 where we discuss the permissions/permission prompt situation (and requiring the camera permission for this API), which is also part of the analysis you've written about ALS.

anssiko commented 2 years ago

Thanks @lknik for your privacy-focused suggestions and review throughout the years (plural).

@sandandsnow we'd still be happy to hear PING's feedback for this proposed editorial improvement before we merge this PR.

sandandsnow commented 2 years ago

I'm happy to be guided by @lknik, but I have drawn this to the attention of my PING co-chairs in case they have anything further they wish to raise.

anssiko commented 2 years ago

@sandandsnow, it seems no concerns have been raised by the other PING co-chairs. Could we merge this? If so, we’d appreciate it if you could submit your approval with the usual GH facility (Files changed > Review changes).

sandandsnow commented 2 years ago

@anssiko We're happy for you to close the issue.

anssiko commented 2 years ago

Thanks @sandandsnow!