sftcd / tek_transparency

Some measurements of deployments of apps using the Google/Apple Exposure Notification system
MIT License

Padding in PL files to 1000+ entries per file #13

Open tomekziel opened 3 years ago

tomekziel commented 3 years ago

For a few days now, Polish TEK files have been artificially padded to 1000+ entries.

https://exp.safesafe.app//1599609600-1599652800-00001.zip - 1031 TEKs
https://exp.safesafe.app//1599825600-1599868800-00001.zip - 1011 TEKs
(do mind the double slash in the URLs)

Discussed here: https://github.com/ProteGO-Safe/specs/issues/232, where a production team member justifies it as "lowering the risk of deanonymisation" with no further clarification.
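For anyone wanting to reproduce TEK counts like the ones quoted above, here is a minimal sketch that counts keys in an export zip without needing generated protobuf classes. It assumes the usual Exposure Notification export layout: a zip containing `export.bin`, which starts with a 16-byte `EK Export v1` header followed by a protobuf message whose repeated `keys` field has field number 7. The function names are hypothetical, and the sketch only counts keys; it cannot tell real keys from padding.

```python
import io
import zipfile

# Assumed export layout: 16-byte space-padded header, then a protobuf message.
HEADER = b"EK Export v1".ljust(16)
KEYS_FIELD = 7  # assumed field number of the repeated `keys` field


def read_varint(buf, pos):
    """Decode a protobuf varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def count_keys(export_bin):
    """Count top-level length-delimited fields numbered KEYS_FIELD."""
    if not export_bin.startswith(HEADER):
        raise ValueError("missing export header")
    pos = len(HEADER)
    count = 0
    while pos < len(export_bin):
        tag, pos = read_varint(export_bin, pos)
        field, wire_type = tag >> 3, tag & 7
        if wire_type == 0:      # varint: decode and discard
            _, pos = read_varint(export_bin, pos)
        elif wire_type == 1:    # fixed 64-bit value
            pos += 8
        elif wire_type == 2:    # length-delimited: skip the payload
            length, pos = read_varint(export_bin, pos)
            pos += length
            if field == KEYS_FIELD:
                count += 1
        elif wire_type == 5:    # fixed 32-bit value
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
    return count


def count_keys_in_zip(zip_bytes):
    """Open an export zip held in memory and count the keys in export.bin."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return count_keys(zf.read("export.bin"))
```

To use it, download one of the zip files by whatever means you prefer and pass the raw bytes to `count_keys_in_zip`.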

sftcd commented 3 years ago

On 16/09/2020 09:38, Tomasz Zieliński wrote:

For a few days now, Polish TEK files have been artificially padded to 1000+ entries.

https://exp.safesafe.app//1599609600-1599652800-00001.zip - 1031 TEKs
https://exp.safesafe.app//1599825600-1599868800-00001.zip - 1011 TEKs
(do mind the double slash in the URLs)

Discussed here: https://github.com/ProteGO-Safe/specs/issues/232, where a production team member justifies it as "lowering the risk of deanonymisation" with no further clarification.

Thanks, was wondering about that. The Austrians also used to pad at a similar scale, but in a manner that allowed detecting the fakes. I guess we'll see over time if there's a way to count the fakes in this case.

Cheers, S.

qLb commented 3 years ago

@tomekziel I know the process can be hard to understand, but we clarified that it's a design decision implemented by Google, not us, for (as we believe) very good reasons. Interesting parts of logic can be found: here, here and there.

sftcd commented 3 years ago

On 27/10/2020 15:11, qLb wrote:

@tomekziel I know the process can be hard to understand, but we clarified that it's a design decision implemented by Google, not us, for (as we believe) very good reasons.

I've never seen those "reasons" myself. My starting point is that they are bogus tbh.

Interesting parts of logic can be found: here, here and there.

I clicked those links and found no reasons stated at all.

S.

tomekziel commented 3 years ago

I know the process can be hard to understand, but we clarified that it's a design decision implemented by Google, not us, for (as we believe) very good reasons

As no one is able to state any reasonable justification and all you have is a belief, how about changing EXPORT_FILE_MIN_RECORDS to 1 and EXPORT_FILE_PADDING_RANGE to 1? You would still run on the current codebase; just adjust the parameters and everyone will be happy.
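To make the effect of those two parameters concrete, here is a rough sketch of how such padding could work. It is a guess at the semantics, not a copy of the server code: it assumes EXPORT_FILE_MIN_RECORDS is a floor on the number of keys per file, that EXPORT_FILE_PADDING_RANGE is the exclusive upper bound of a random jitter added to that floor, and that empty batches are left empty. The function names and the 1000/100 values in the usage note are hypothetical.

```python
import secrets


def padded_target(real_count, min_records, padding_range):
    """Return the number of keys a file should contain after padding."""
    if real_count == 0:
        return 0  # assumption: an empty batch is not padded at all
    # Random jitter in [0, padding_range), so the final count is not constant.
    jitter = secrets.randbelow(padding_range) if padding_range > 0 else 0
    # Never drop real keys: if there are already enough, keep the real count.
    return max(real_count, min_records + jitter)


def pad_keys(real_keys, min_records, padding_range):
    """Mix random 16-byte fake keys into real_keys up to the padded target."""
    target = padded_target(len(real_keys), min_records, padding_range)
    fakes = [secrets.token_bytes(16) for _ in range(target - len(real_keys))]
    padded = list(real_keys) + fakes
    # Shuffle so that position in the file does not give the fakes away.
    secrets.SystemRandom().shuffle(padded)
    return padded
```

Under these assumptions, values such as `min_records=1000` and `padding_range=100` would yield between 1000 and 1099 keys for a small batch, which would be consistent with the 1031 and 1011 counts reported at the top of this issue. With both parameters set to 1, the jitter is always 0 and any non-empty batch is published with no fake keys.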

@sftcd the true numbers for Poland were published here: https://github.com/ProteGO-Safe/specs/issues/241#issuecomment-717137874

Column 2 is the number of upload sessions, column 3 is the number of TEKs in those sessions (with no info about the days affected).

qLb commented 3 years ago

In the case of spread within smaller communities, you could then determine the "origin" of the TEKs just by observing the increments on a public endpoint, which could be dangerous to those communities for many different reasons. It's not about those reasons, is it? @tomekziel @sftcd

pvieito commented 3 years ago

As no one is able to state any reasonable justification and all you have is a belief, how about changing

The reasoning behind padding is discussed at great length (😂) on this issue: https://github.com/eu-federation-gateway-service/efgs-federation-gateway/issues/209

In summary, this is done to enhance privacy in a rare edge case: when a chain of TEKs is uploaded alone in a batch, an attacker can assume they all come from the same device and link all the RPIs derived from those TEKs to that device. That assumption is harder to justify if the chain is published alongside padding TEKs.

sftcd commented 3 years ago

On 27/10/2020 17:02, Pedro José Pereira Vieito wrote:

As no one is able to state any reasonable justification and all you have is a belief, how about changing

The reasoning behind padding is discussed at great length (😂) on this issue: https://github.com/eu-federation-gateway-service/efgs-federation-gateway/issues/209

In summary, this is done to enhance privacy in a rare edge case: when a chain of TEKs is uploaded alone in a batch, an attacker can assume they all come from the same device and link all the RPIs derived from those TEKs to that device.

Yep, that's bogus all right. If a service is serving only TEKs from one real user, then that service is useless and there is no point in either the real or the fake TEKs.

The better approach in that situation would be to not bother serving any TEKs at all and just let the manual tracers handle that one person who uploaded. The same applies to any tiny number of uploads.

S.