Privacy of hints

geonnave commented 5 months ago

During the IETF 119 LAKE meeting (see minutes here), some comments were raised about the privacy aspects of hints (identifiers of V passed from U to W and vice-versa). This issue is intended as a place for discussion of the raised points.

Currently in version -01 of the draft, there are two kinds of hints:

u_hints, meaning radio identifiers of Vs that are discovered by U and communicated to W via an encrypted channel (carried within EAD_1 and then VREQ).
v_hints, meaning radio identifiers of Vs that are pre-populated in W and are communicated to U via an encrypted channel (carried within VRES and then EDHOC error Access Denied).

Below I explore whether or not there are privacy issues regarding hints, from the perspectives of U, V and W, as well as when data is in transit.

1. Privacy of `u_hints`

At U: gateways (Vs) normally advertise their radio identifiers in beacons -> no issue.
At V: the hint is encrypted -> no issue.
At W: the radio identifiers of any V will be read by W
- ⚠️ potential issue: if W shares the identifiers with 3rd parties
- mitigation: W should commit to not share the received identifiers.
In transit between U <-> V and V <-> W: the hint is encrypted -> no issue.

2. Privacy of `v_hints`

At W: the radio identifiers of Vs that are securely associated with W will be read by W
- ⚠️ potential issue: if W shares the identifiers with 3rd parties
- mitigation: W should commit to not share the received identifiers.
- ❓note: since Vs are already associated with W, the Vs should already trust W to not share any information with 3rd parties. Thus I think this is a non-issue.
At V: the hint is encrypted -> no issue.
At U: the device U will gain information about Vs that allow U to enroll.
- ⚠️ potential issue: if the device is malicious, it would be learning new information about the system
- mitigation: only return hints when U is known to W.
- example 1: device u1 is listed in W's ACL, but the V through which it tried to join is not in the ACL (leading to error "Access Denied"). Since W knows u1, it MAY return v_hints.
- example 2: device u2 is not listed in W's ACL (leading to error "Access Denied"). Since W does not know u2, it MUST NOT return v_hints.
- ❓ note: is there an issue when U is known by W (as in example 1 above)? It would be learning about radio identifiers of Vs that might or might not be in its radio range and are expected to be authorized to enroll U. Personally I think this is a non-issue.
In transit between W <-> V and V <-> U: the hint is encrypted -> no issue.
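
The mitigation in examples 1 and 2 can be sketched as a small decision function at W. This is only an illustration under stated assumptions: the ACL shape (ID_U mapped to a set of V identifiers), the `VoucherResponse` type, and the function name are hypothetical and not taken from the draft.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class VoucherResponse:
    """Hypothetical result type: granted, or Access Denied with optional hints."""
    granted: bool
    v_hints: List[str] = field(default_factory=list)


# Hypothetical ACL at W: ID_U -> identifiers of the Vs through which U may enroll.
ACL: Dict[str, Set[str]] = {"u1": {"v3"}}


def handle_voucher_request(acl: Dict[str, Set[str]], id_u: str, id_v: str) -> VoucherResponse:
    allowed_vs = acl.get(id_u)
    if allowed_vs is None:
        # Example 2: U is unknown to W -> Access Denied, MUST NOT return v_hints.
        return VoucherResponse(granted=False)
    if id_v in allowed_vs:
        # U is known and came through an authorized V.
        return VoucherResponse(granted=True)
    # Example 1: U is known but V is not authorized -> Access Denied, MAY return v_hints.
    return VoucherResponse(granted=False, v_hints=sorted(allowed_vs))
```

With this shape, the hints returned to a known U are always limited to the Vs already mapped to that U, so nothing is revealed to a device that W has never heard of.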

Discussion

I think that the only issues that we need to address in the document are 1. Privacy of u_hints at W and 2. Privacy of v_hints at U (when U is unknown to W).

Looking forward to receiving comments from those interested in this topic.

sftcd commented 5 months ago

Hi Geovane,

On 02/04/2024 16:23, Geovane Fedrecheski wrote:

> During the IETF 119 LAKE meeting (see minutes here), some comments were raised about the privacy aspects of hints (identifiers of V passed from U to W and vice-versa).

So one thing that'd help me a lot in understanding your analysis would be a concrete example that describes a possibly "bad" privacy situation. That'd not be text that'd need to be in the draft, but I think it'd help the discussion along, both here and maybe later on the list.

So... what's the worst privacy case we can think of here?

Ta, S.

geonnave commented 5 months ago

Great question.

1. Privacy of u_hints at W

Let's say that W receives MAC addresses of some Vs, and shares them with third parties. This is harmless in my opinion, as MAC addresses are freely broadcast in beacons, and do not identify a customer.

Now, if W were to share ID_U along with the list of MAC addresses, that information could be used to determine the approximate location of U, e.g. using services like https://developers.google.com/maps/documentation/geolocation/overview.

Since U assumes W to be trusted (e.g. W is the manufacturer), neither of the above should happen.

2. Privacy of v_hints at U (when U is unknown to W)

Let's say that u1 is a malicious U that has no real business relationship with W. u1 somehow obtained G_W and LOC_W, but u1 is not known by W: it is not present in the ACL, nor in any database entry.

Now, u1 prepares and sends a well formed Voucher Request, which W receives and processes normally. Since u1 is not in W's ACL, W would issue an "Access Denied" error, possibly with hints (MAC addresses of Vs). However, since u1 is not in the ACL in the first place, it is impossible to construct a hint that is tailored for u1. So, if no hint is added, there is no issue.

A worse case than this could be W picking random MAC addresses of associated Vs as hints to send to u1 (although this does not make much sense in my opinion). In this case, u1 could keep re-sending Voucher Requests and learning more and more Vs. Eventually, u1 would learn all Vs that are linked to a W. This would expose a business relationship, which could be sensitive. Again, I cannot think of a scenario where picking random Vs to serve as hints could make sense.
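
To make the enumeration risk concrete, here is a toy simulation of a naive W that answers every request with randomly picked hints. Everything in it is an assumption for illustration: the number of Vs, the hint count per response, the made-up locally administered MAC addresses, and the function names.

```python
import random
from typing import List, Set


def naive_pick_hints(associated_vs: Set[str], k: int, rng: random.Random) -> List[str]:
    """Naive W: return k associated Vs chosen at random, regardless of who asks."""
    return rng.sample(sorted(associated_vs), k)


def requests_until_full_enumeration(associated_vs: Set[str], k: int, rng: random.Random) -> int:
    """Count the Voucher Requests a malicious u1 needs before it has seen every V."""
    learned: Set[str] = set()
    requests = 0
    while learned != associated_vs:
        learned.update(naive_pick_hints(associated_vs, k, rng))
        requests += 1
    return requests


# Hypothetical setup: 25 Vs associated with W, 5 hint items per response.
vs = {f"02:00:00:00:00:{i:02x}" for i in range(25)}
n_requests = requests_until_full_enumeration(vs, 5, random.Random(0))
print(f"u1 learned all {len(vs)} Vs after {n_requests} requests")
```

The loop always terminates eventually, which is the point: with random hints, repeated requests are enough to recover the full list of Vs linked to W.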

mcr commented 5 months ago

Geovane Fedrecheski @.***> wrote:

> 2. Privacy of v_hints at U (when U is unknown to W)
>
> Let's say that u1 is a malicious U that has no real business relationship with W. u1 somehow obtained G_W and LOC_W, but u1 is not known by W: it is not present in the ACL, nor in any database entry.
>
> Now, u1 prepares and sends a well formed Voucher Request, which W receives and processes normally. Since u1 is not in W's ACL, W would issue an "Access Denied" error, possibly with hints (MAC addresses of Vs). However, since u1 is not in the ACL in the first place, it is impossible to construct a hint that is tailored for u1. So, if no hint is added, there is no issue.

I agree W should never provide hints in the case that u1 was not manufactured by W.

> A worse case than this could be W picking random MAC addresses of
> associated Vs as hints to send to u1 (although this does not make much
> sense in my opinion). In this case, u1 could keep re-sending Voucher
> Requests and learning more and more Vs. Eventually, u1 would learn all
> Vs that are linked to a W. This would expose a business relationship,
> which could be sensitive. Again, I cannot think of a scenario where
> picking random Vs to serve as hints could make sense.

Is this a u1 with a colluding W, who is now trying to learn the list of Vs? Such a W wouldn't be in V's trusted list of manufacturers, right?

geonnave commented 5 months ago

> Is this a u1 with a colluding W, who is now trying to learn the list of Vs?

No, I was thinking of a W that simply has a naive implementation of the hint-picking algorithm.

> Such a (colluding) W wouldn't be in V's trusted list of manufacturers, right?

Correct, therefore the collusion with an untrusted W should not be possible.

sftcd commented 5 months ago

Hiya,

On 03/04/2024 14:01, Geovane Fedrecheski wrote:

> Great question.
>
> 1. Privacy of u_hints at W
>
> Let's say that W receives MAC addresses of some Vs, and shares them with third parties. This is harmless in my opinion, as MAC addresses are freely broadcast in beacons, and do not identify a customer.

I was more after a scenario describing some specific type of device - it may seem much less threatening to consider issues for "U" than to consider issues for e.g. a health monitor or some other device carried by a human.

In such cases, GPS co-ords or the list of MACs seen for V's leaks location information to W which seems undesirable in general. (Esp. as the protocol uses opaque fields for carrying that, and could therefore be abused by device manufs.)

Consider also security issues (as opposed to privacy issues) in cases where the device is carried by a person in some security-sensitive location (e.g. a military installation or prison) - it seems pretty squirmy to me to even provide location information to the device manuf in such cases.

I'd generally wonder if the risks are worth the possible efficiency benefits.

Cheers, S.

geonnave commented 5 months ago

Hi @sftcd,

At this point, considering we proceed with #29, we would be backing off from sending opaque data in favor of sending only network identifiers. Also, as a result of the present discussion, I think we can entirely discard the idea of sending GPS coordinates.

Another consideration is that just by having U trying to enroll via V, this already leaks some location information to W. For example, in Bluetooth or IEEE 802.15.4 or Wi-Fi one can easily expect a geolocation accuracy of a few tens of meters, given the MAC address of just one gateway. Of course triangulation of more gateways can improve that, but I would say e.g. 50 meters is already quite sensitive.

Next, IIUC, your concerns regard only hints from U to W. Do you have concerns with the other way around? (where the manufacturer sends identifiers of gateways to a known device).

I am taking a look at what the performance improvements could look like, might come back later with data.

sftcd commented 5 months ago

TBH, I don't understand why MAC addresses are even involved here - doesn't W only need to know CRED_V? Re-using possibly long term static identifiers like MAC addresses in this way would seem like an anti-pattern to me.

And yes, my primary concern is leaking information about U (e.g. location) to W. But I've not analysed whether there are downsides to W "sharing" information with U - I guess there may be issues there that could enable a bad-actor U to more easily probe networks, but as I say, I've not analysed that. (I think mcr's comments above do though.)

geonnave commented 5 months ago

> why MAC addresses are even involved here (...) doesn't W only need to know CRED_V?

The gist here is: we are considering an optimization to accelerate enrollment time when there are several potential networks (Vs), and to have it working would require some sharing of network identifiers.

> Re-using possibly long term static identifiers like MAC addresses

The identifiers do not need to be long term, but then V would need to publish them at W periodically, which in turn requires additional coordination.

Continuing from my previous message, I did some simulations so we can discuss numbers.

We would be looking at faster join times of around 2x to 3x, when the number of available Vs is 10, with variations depending on the types of radio used. Specifically:

Considering 10 available Vs and 5 hint items per message:
- 3x faster join time, considering a 250 kbps network with a scheduling timeslot of 10 ms (typical TSCH configuration)
- 2x faster join time, considering a 1 Mbps network with no scheduling cost (typical in Bluetooth)

Hints are more effective the more Vs there are available (e.g. they may lead to a 10x improvement under certain conditions, provided there are > 25 available Vs). With exactly 5 available Vs, there is no difference. With fewer than 5 available Vs, using hints makes the join slower.

So, it is a considerable improvement, although likely not a 10x factor in the average case.

At this point not sure about next steps. Would be interested in hearing what others think on this topic (@gselander @mcr @malishav @chrysn).

sftcd commented 5 months ago

On 04/04/2024 17:27, Geovane Fedrecheski wrote:

> Re-using possibly long term static identifiers like MAC addresses
>
> The identifiers do not need to be long term, but then V would need to publish them at W periodically, which in turn requires additional coordination.

If that's possible, then oughtn't it be possible to use some other short-lived identifier that isn't a MAC address?

> At this point not sure about next steps.

I'd say summarise to the WG list is a good next step.

Cheers, S.

mcr commented 5 months ago

Geovane Fedrecheski @.***> wrote:

> The gist here is: we are considering an optimization to accelerate enrollment time when there are several potential networks (Vs), and to

I think that we could add this later. Right now, I think it's premature optimization.

gselander commented 5 months ago

About U providing information about MAC addresses to W:

Here is a product using hashes of MAC addresses: https://www.footfallcam.com/people-counting/knowledge-base/chapter-19-general-data-protection-regulation-gdpr/

We could even do better. Instead of sending a list of MAC addresses, U sends a list of H(G_X, MAC-address) in u_hint (encrypted for W), where

- H() is a truncated hash of size, say, 4 bytes (TBD)
- G_X works as a nonce, so the same MAC address detected by different Us, or in different sessions with the same U, would yield independent values

The lists are matched by W against the corresponding values calculated with the MAC addresses of relevant Vs, which are known to W by the setup of the system. Thus W learns which Vs are heard by U, and then deletes the list of hashes.

If U is a legitimate device etc. W returns information about preferred Vs' MAC addresses in v_hint (encrypted for U). (This could also be truncated hashes for compactness. Alternatively, U could enumerate the detected MAC addresses and W just sends back a list of integers, but that requires U to cache the MAC addresses ... which somehow works against the privacy ambition.)
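
A minimal sketch of the truncated-hash idea, to make the matching step at W concrete. Several things here are assumptions rather than part of the proposal: SHA-256 as the underlying hash, plain concatenation of G_X and the MAC address as the hash input, and the function names.

```python
import hashlib
from typing import Iterable, List

HINT_LEN = 4  # bytes; the size is "TBD" in the proposal, 4 is the value floated


def hint(g_x: bytes, mac: bytes) -> bytes:
    """H(G_X, MAC): truncated hash. SHA-256 and plain concatenation are assumptions;
    concatenation is unambiguous here only because both inputs have fixed lengths."""
    return hashlib.sha256(g_x + mac).digest()[:HINT_LEN]


def u_build_hints(g_x: bytes, heard_macs: Iterable[bytes]) -> List[bytes]:
    """At U: one truncated hash per V heard over the radio (sent encrypted to W)."""
    return [hint(g_x, mac) for mac in heard_macs]


def w_match_hints(g_x: bytes, u_hints: Iterable[bytes], known_v_macs: Iterable[bytes]) -> List[bytes]:
    """At W: recompute the hashes for the Vs it knows and keep those U reported.
    With 4-byte truncation, collisions between known Vs are possible in principle."""
    table = {hint(g_x, mac): mac for mac in known_v_macs}
    return [table[h] for h in u_hints if h in table]
```

Because G_X changes per session, the same MAC address produces unrelated hint values across sessions, and a V that W does not already know simply fails to match.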

Would this be sufficient to address this privacy aspect?

sftcd commented 5 months ago

> Would this be sufficient to address this privacy aspect?

If W ends up knowing which set of V's are close to U, then isn't it game over from the privacy perspective regardless of how W learns that?

geonnave commented 5 months ago

> If W ends up knowing which set of V's are close to U

In certain use cases, such as Example 2 in the draft, we consider that the authorization policies may contain mappings from U to V's:

> the access policy in W specifies, via a mapping of shape (ID_U; MAC1, MAC2, ...) that device u1 can only join via gateway v3, i.e., the mapping is: (14; 39-63-C9-D0-5C-62)

In this case, W already has the full graph connecting U's and V's -- it has to have it, to enforce some business logic.

It is worth emphasizing that the hints would only be used in this kind of scenario, where W already has a mapping of U and V's. Since hints are optional, other scenarios that have less coupled architectures (with U's being able to join via any V's), would still be supported by the protocol, and would not need or use hints.

(note: the example is using MAC addresses, but it could be any identifier of V, such as ID_CRED_V)

sftcd commented 5 months ago

On 08/04/2024 09:28, Geovane Fedrecheski wrote:

> In certain use cases, such as Example 2 in the draft, we consider that the authorization policies may contain mappings from U to V's:
>
> > the access policy in W specifies, via a mapping of shape (ID_U; MAC1, MAC2, ...) that device u1 can only join via gateway v3, i.e., the mapping is: (14; 39-63-C9-D0-5C-62)
>
> In this case, W already has the full graph connecting U's and V's -- it has to have it, to enforce some business logic.

For me, the above is too abstract to enable analysis of privacy issues, or business logic.

Cheers, S.

lake-wg / authz

Privacy of hints #27
