google / exposure-notifications-server

Exposure Notification Reference Server | Covid-19 Exposure Notifications
https://www.google.com/covid19/exposurenotifications/
Apache License 2.0

Broadcasting a list of infected persons is not GDPR compliant #367

Closed Covid19Fighter closed 4 years ago

Covid19Fighter commented 4 years ago

Describe the bug

The whole concept of this server is to store a list of people infected with COVID-19 and send it to everyone. I know the list is supposed to be anonymous, but locally you can match the IDs to a person. Even a beginner software developer could modify the mobile app (open source and open APIs are a good thing, but they are transparent and can be modified) to store a GPS position and timestamp with each key, and because the modified app never has to be uploaded to the store, Google would not be able to check it. When the list of infected patients (or their IDs or keys, even if you encrypt everything 100 times) is sent from the server, the modified app can work out where and when it detected the infected person. If such a modified app is then distributed, you could even build a whole database of keys.

This huge data privacy leak was already mentioned in the DP-3T white paper: https://github.com/DP-3T/documents/blob/master/DP3T%20White%20Paper.pdf

"Infected individuals. The centralised and decentralised contact tracing systems share the inherent privacy limitation that they can be exploited by an eavesdropper to learn whether an individual user got infected and by a tech-savvy user to reveal which individuals in their contact list might be infected now. However, the centralised design does not allow proactive and retroactive linkage attacks by tech-savvy users to learn which contacts are infected because the server never reveals the EphID s of infected users."

The so-called "retroactive linkage attacks by tech-savvy users" are a huge problem!
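To make the modification described above concrete, here is a minimal, hypothetical sketch in Go (the language of this repository) of what such an "enriched" client log and linkage step could look like. Every type, field, and value below is illustrative only and is not taken from this codebase or from any real app.

```go
// Hypothetical sketch of the linkage concern described above: a modified
// client logs every Rolling Proximity Identifier (RPI) it hears over BLE
// together with a timestamp and GPS fix, then later checks those entries
// against identifiers derived from published diagnosis keys.
package main

import (
	"fmt"
	"time"
)

// Observation is what a modified client could record for every BLE sighting.
type Observation struct {
	RPI  [16]byte  // identifier heard over BLE
	Seen time.Time // when it was heard
	Lat  float64   // where it was heard
	Lon  float64
}

func main() {
	// Locally recorded sightings (normally only the OS would ever see these).
	observations := []Observation{ /* ... */ }

	// Identifiers re-derived from the published diagnosis keys.
	published := map[[16]byte]bool{ /* ... */ }

	// The linkage step: any hit reveals when and where the contact happened.
	for _, o := range observations {
		if published[o.RPI] {
			fmt.Printf("contact with a later-diagnosed person at %v (%.4f, %.4f)\n",
				o.Seen, o.Lat, o.Lon)
		}
	}
}
```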

Furthermore, the whole thing is not GDPR compliant and does not conform to the EU recommendations: https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32020H0518&from=EN

(16) With particular regard to the use of COVID-19 mobile warning and prevention applications, the following principles should be observed: (1) safeguards ensuring respect for fundamental rights and prevention of stigmatization, in particular applicable rules governing protection of personal data and confidentiality of communications; (4) effective cybersecurity requirements to protect the availability, authenticity, integrity, and confidentiality of data; (5) the expiration of measures taken and the deletion of personal data obtained through these measures when the pandemic is declared to be under control, at the latest; (6) uploading of proximity data in case of a confirmed infection and appropriate methods of warning persons who have been in close contact with the infected person, who shall remain anonymous; and (7) transparency requirements on the privacy settings to ensure trust in the applications.

To Reproduce
Steps to reproduce the behavior: broadcasting of the keys.

Expected behavior
No broadcasting of infected keys combined with open source and open APIs.

Desktop (please complete the following information): All OSs, all versions.

Smartphone (please complete the following information): All Smartphones.

Additional context
Major data privacy leak.

mikehelmick commented 4 years ago

/kind privacy

Please see the overall project page hosted at https://www.google.com/covid19/exposurenotifications/

In particular, the FAQ and the additional Terms of Service for utilizing the on-device APIs.

Covid19Fighter commented 4 years ago

If the app is open source, a person can change it and use the modified app; this (tech-savvy) person will not comply with your ToS. If the person does not use your store, you will not be able to notice that they are running a different app.

Covid19Fighter commented 4 years ago

And the EU recommendation says you have to take measures to prevent it. A ToS will not do the job.

sethvargo commented 4 years ago

Hi @Covid19Fighter - thank you for taking the time to open an issue and provide feedback. The server does not store information about a specific person or device. We've gone to extensive lengths (which you can audit in this codebase) to make the server unable to associate a particular exposure key with a device or person. In addition to the standard Apple App Store and Google Play Store review processes, apps must individually be added to an allowlist to use this API. The server is configured to validate uploads, mitigating the side-loaded-APK vector you've described.

You can read more about our architecture in Server Functional Requirements.

For legal questions, please see our FAQ and additional Terms of Service at https://www.google.com/covid19/exposurenotifications/, which specify which devices can utilize the on-device APIs, as well as which other APIs may be used in conjunction with them (Term 3.c). If you have additional questions, please feel free to contact us at the email at the bottom of the README. Thank you and have a great day!
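As a rough illustration of the server-side allowlist idea described above, the sketch below shows a publish endpoint that refuses key uploads from any app package an operator has not explicitly configured. The handler, request fields, and package name are assumptions for illustration only; they are not this repository's code, and the real server additionally verifies device attestation and health authority credentials.

```go
// Minimal sketch of a server-side allowlist check on a key-publish endpoint.
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// allowedApps would be populated from operator configuration.
var allowedApps = map[string]bool{
	"gov.example.health.exposurenotifications": true, // hypothetical package name
}

type publishRequest struct {
	AppPackageName string `json:"appPackageName"`
	// ... temporary exposure keys, regions, attestation, etc.
}

func publishHandler(w http.ResponseWriter, r *http.Request) {
	var req publishRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "malformed request", http.StatusBadRequest)
		return
	}
	if !allowedApps[req.AppPackageName] {
		// Uploads from unknown or modified apps are refused here.
		http.Error(w, "app not authorized", http.StatusUnauthorized)
		return
	}
	w.WriteHeader(http.StatusOK) // accept the keys (persistence omitted)
}

func main() {
	http.HandleFunc("/v1/publish", publishHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```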

Covid19Fighter commented 4 years ago

Thank you for the fast answer. I cannot find any info on the allowlist verification. How does the allowlist verification check that the app was not modified? The apps are open source, so you can change the code.

sethvargo commented 4 years ago

Those answers are available in our FAQ, linked at the bottom of https://www.google.com/covid19/exposurenotifications/.

Covid19Fighter commented 4 years ago

Sorry, I'm not finding the right sentence. Could you please quote it?

sethvargo commented 4 years ago

Section 10, "How will apps get approval to use this system?", in our FAQ linked at the bottom of https://www.google.com/covid19/exposurenotifications/

Covid19Fighter commented 4 years ago

Ok, the guys that create the app will sign it and be approved. But they will publish the app as open source in a Git repository, so anyone can modify this app and create a new one. How are you making sure this does not happen?

sethvargo commented 4 years ago

Hi @Covid19Fighter it's possible for people of any gender to create an app. Usually these apps will be developed by public health authorities (PHAs) or people working closely with PHAs. There's no requirement that implementers make their application open source. If someone were to modify the app and create a new one, that would be a different app that would need to go through an approval process to get added to the Apple App Store or Google Play Store.

Covid19Fighter commented 4 years ago

Ok, so if they publish it as open source (like in Germany) and someone creates a modification, does not publish it through the stores, and distributes it as an APK download, they could get the keys and combine them with additional information. Right?

sethvargo commented 4 years ago

The code for the client or server application being open source is inconsequential to whether someone can make a custom app. Getting a user to install such an app would require them to enable installation from unknown sources.

Covid19Fighter commented 4 years ago

Sorry, I do not understand the point. The user is the one that wants to decipher the keys (the hacker). He knows his own sources.

Covid19Fighter commented 4 years ago

I think you have a major attack vector here.

sethvargo commented 4 years ago

Can you be more specific about which keys you mean?

Covid19Fighter commented 4 years ago

The keys of the infected persons you are broadcasting.

sethvargo commented 4 years ago

Let me back up to add clarity that might help answer your query. The allowlist is both a client-side and server-side allowlist. In order for an app to access the client API (SDK), it needs to be added to an allowlist. In order for that same app to publish keys to a server, it needs to be in the allowlist for the server.

If someone were to take an existing app implementation and sideload it, they would not be able to access the client-side API (SDK) without being added to the allowlist.

Covid19Fighter commented 4 years ago

Ok, but if someone copies the app, it will have the same identity as the original app. Or not?

mikehelmick commented 4 years ago

no, it will not.

mikehelmick commented 4 years ago

The temporary exposure keys that are sent to devices are only ever made available in large batches over a sufficiently large geographic area. There isn't anything to decipher; the keys are available for distribution to all mobile devices participating in the protocol.

If someone were to try to tamper with those exports, the file's signature would fail to validate as signed by the server, and the file would be rejected by the operating system for import and matching.
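To make that signature check concrete: per the published export file format, key batches are signed by the server with an ECDSA P-256 key over a SHA-256 digest, and a device only imports a batch whose signature verifies against the public key registered for that app. The snippet below is a simplified sketch of the verification step, not the actual on-device code.

```go
// Sketch of verifying an export batch signature before accepting it.
package main

import (
	"crypto/ecdsa"
	"crypto/sha256"
	"fmt"
)

// verifyExport reports whether exportBin (the serialized key batch) carries a
// valid signature from the expected server key.
func verifyExport(pub *ecdsa.PublicKey, exportBin, sig []byte) bool {
	digest := sha256.Sum256(exportBin)
	return ecdsa.VerifyASN1(pub, digest[:], sig)
}

func main() {
	// In practice the public key is distributed out of band (registered with
	// Apple/Google), and exportBin/sig come from the downloaded archive.
	var pub *ecdsa.PublicKey
	var exportBin, sig []byte
	fmt.Println("batch accepted:", pub != nil && verifyExport(pub, exportBin, sig))
}
```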

Covid19Fighter commented 4 years ago

The user is not tampering with them; the user is only reading them and comparing them to the list of keys stored locally, as the original app does. The only change to the software would be to store additional information (e.g. a timestamp) alongside the information received over Bluetooth. If you know when you met the infected users, you can find out who they were.

sethvargo commented 4 years ago

The modified app cannot do this comparison because it has neither the root TEKs nor access to the derived keys.

Covid19Fighter commented 4 years ago

But the original app is also comparing the keys to find out who is infected, and gets a true or false result. So the modified app would do the same, only also recording when it received the key that is now broadcast as infected. If the original app can do this, the new one should also be able to. I mean, this is nothing new; it was mentioned in the DP-3T white paper by the professors. I thought you might have found a solution to this problem. "However, the centralised design does not allow proactive and retroactive linkage attacks by tech-savvy users to learn which contacts are infected"

mikehelmick commented 4 years ago

The app is not responsible for comparing the keys, nor does any app have access to the derived keys that have been collected over BLE.

I would encourage you to read more of the documents from https://www.google.com/covid19/exposurenotifications/

In particular, the items linked under that page. I'm not deep linking because those documents could have new versions published at any time.

Covid19Fighter commented 4 years ago

Ok, I already went through the documentation, but I am not finding any place that stops "proactive and retroactive linkage attacks by tech-savvy users". Maybe it is solved, but that only seems possible if most of the logic is done below the app, on the OS side (not open), and the app is only a kind of mask.

Covid19Fighter commented 4 years ago

The documentation is not extensive enough to be sure.

Covid19Fighter commented 4 years ago

By the way, measuring distance with Bluetooth doesn't sound physically possible. Do you have studies on this?

sethvargo commented 4 years ago

Hi @Covid19Fighter - this repository is for a sample server implementation. If you have specific questions about BLE, we're likely not the right group of people to answer them.

Are you able to describe the "proactive and retroactive linkage attacks by tech-savvy users" concretely? I believe we've responded to all your concerns in previous comments.

kbobrowski commented 4 years ago

Hi @sethvargo , thanks for your previous answers; they confirm my understanding of the security aspects of the Exposure Notification API. A further question on this topic: it seems that, based on the published cryptographic and Bluetooth specifications, it's possible to write an implementation of the Exposure Notification API from scratch, meaning it'd be possible to "enrich" its output with some additional data. We can imagine that this modified implementation would attach a timestamp and GPS position to each received RPI, and then, instead of processing RPIs and TEKs in batch (producing just an aggregate result), it would output which RPI matched which TEK (meaning we'd know when and where the contact with an infected person happened).

Do you see a risk in this kind of "enriched" API implementation being developed?
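For readers wondering what "processing RPIs and TEKs in batch" involves: under the published Exposure Notification cryptography specification, each TEK expands into the RPIs it covers (RPIK = HKDF-SHA256(TEK, info "EN-RPIK"), then each RPI is AES-128 of "EN-RPI" plus the 10-minute interval number), and matching means checking those derived RPIs against locally observed ones. The sketch below illustrates that derivation and lookup; it is an illustration only, not the platform implementation, which runs inside the OS rather than in any app.

```go
// Derive one TEK's RPIs per the published EN crypto spec and match them
// against locally observed identifiers.
package main

import (
	"crypto/aes"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"io"

	"golang.org/x/crypto/hkdf"
)

// deriveRPIs expands one TEK into the 144 RPIs it covers (one per 10-minute
// interval), starting at the TEK's rolling start interval number.
func deriveRPIs(tek []byte, startInterval uint32) ([][16]byte, error) {
	rpik := make([]byte, 16)
	if _, err := io.ReadFull(hkdf.New(sha256.New, tek, nil, []byte("EN-RPIK")), rpik); err != nil {
		return nil, err
	}
	block, err := aes.NewCipher(rpik)
	if err != nil {
		return nil, err
	}
	rpis := make([][16]byte, 0, 144)
	for j := uint32(0); j < 144; j++ {
		var padded [16]byte
		copy(padded[:6], "EN-RPI")
		binary.LittleEndian.PutUint32(padded[12:], startInterval+j)
		var rpi [16]byte
		block.Encrypt(rpi[:], padded[:])
		rpis = append(rpis, rpi)
	}
	return rpis, nil
}

func main() {
	tek := make([]byte, 16)          // a published diagnosis key (all zeros here)
	observed := map[[16]byte]bool{}  // identifiers previously heard over BLE

	rpis, err := deriveRPIs(tek, 2650000)
	if err != nil {
		panic(err)
	}
	matches := 0
	for _, rpi := range rpis {
		if observed[rpi] {
			matches++
		}
	}
	fmt.Println("matched identifiers:", matches)
}
```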

sethvargo commented 4 years ago

Hi @kbobrowski thanks for your question. It's unlikely such an app/server implementation would pass the allowlisting requirements from the Google/Apple app stores. Our security and privacy teams have gone to extensive lengths to remove the ability to associate a particular TEK with a device/person in our implementations. You can actually look at the closed-issue history of this repository to see their feedback. Part of the reason we're developing the server in the open is so folks can inspect the privacy-preserving aspects we've implemented.

kbobrowski commented 4 years ago

@sethvargo thanks, I'm sure that this kind of modified app (with a re-implemented Exposure Notification API) won't appear in the Play Store; I had in mind a situation where an adversary might distribute the APK directly. In this case I think safety would depend on whether the country-specific application properly implemented SafetyNet for fetching TEKs, and whether TEKs are handled properly on the device (not only in Google's layer, but also in the country-specific layer on top), such that they cannot be revealed. I don't doubt that it's impossible to associate a particular TEK with a person; I'm more worried about a situation where someone might acquire TEKs and timestamped / geolocated RPIs (since the latter are openly transmitted over Bluetooth) and would be able to infer when and where a contact with an infected person took place.
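On the SafetyNet point above, here is a rough, assumed sketch of what a server-side device attestation check involves: decoding the payload of a SafetyNet attestation JWS and inspecting a few of its documented fields (apkPackageName, basicIntegrity). A real check must also validate the JWS certificate chain against attest.android.com and bind the nonce to the uploaded contents; both are omitted here, and none of this is code from this repository.

```go
// Sketch of inspecting a SafetyNet attestation payload on the server side.
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"strings"
)

// safetyNetClaims holds a subset of the documented attestation payload fields.
type safetyNetClaims struct {
	Nonce           string `json:"nonce"`
	ApkPackageName  string `json:"apkPackageName"`
	BasicIntegrity  bool   `json:"basicIntegrity"`
	CtsProfileMatch bool   `json:"ctsProfileMatch"`
}

func checkAttestation(jws, wantPackage string) error {
	parts := strings.Split(jws, ".")
	if len(parts) != 3 {
		return errors.New("malformed JWS")
	}
	payload, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return err
	}
	var claims safetyNetClaims
	if err := json.Unmarshal(payload, &claims); err != nil {
		return err
	}
	if claims.ApkPackageName != wantPackage || !claims.BasicIntegrity {
		return errors.New("attestation does not match an authorized app")
	}
	return nil
}

func main() {
	// A sideloaded or re-signed app would fail this check (or fail to obtain
	// an attestation at all), which is what ties uploads to approved apps.
	_ = checkAttestation("<jws from device>", "gov.example.health.exposurenotifications")
}
```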

sethvargo commented 4 years ago

Hi @kbobrowski - we've done our best to think through the various attack vectors. If you're able to reproduce an attack (or have an idea of how one might work), feel free to reach out to us confidentially at the email address in the README of this repo.

mh- commented 4 years ago

@sethvargo The point @Covid19Fighter was making has been published e.g. in this paper as "Nerd attack" (see also https://github.com/corona-warn-app/cwa-documentation/issues/102#issuecomment-631045106)

Personally, I don't see an issue with this: The "Nerd" would need to be a close contact of the infected person to be able to retrospectively identify them from e.g. timestamped GPS data, and I believe that most users who are willing to warn others about their infection through an app also make their identity known to their close contacts, through other means than this server.

Covid19Fighter commented 4 years ago

@mh- I meet a lot of people with whom I am not a close contact. Most business people do, and once you have created such an app you can distribute it, and also the data you decipher, to other "hackers", and voila, you have a database of critical medical data.

@sethvargo - Again, the idea of broadcasting all this critical medical data to millions of devices goes against all data protection and security recommendations. Usually when you have one critical record you try to keep it in one place; here you copy millions of critical records to millions of devices. You can add 100 layers of encryption and rotate the keys, and you will still end up with not one but hundreds of attack vectors. I think you got the whole decentralized idea wrong: decentralized storage is good, but decentralized matching is bad because of the broadcasting of critical data. No one in their right mind would do this with credit card data. It is as if Google sent your credit card information, and that of every other person, to all shops worldwide so they could check whether someone used their card there. Yes, you can try to secure all the software and hardware at all the shops and add hundreds of security mechanisms. Would I sleep well being the one who wrote the security concept? For sure not.

Is this a GDPR problem? Yes, it is. Are open source and open APIs helping security here? Not if the concept is based on a huge data privacy leak. Will people trust Google with such data if the process is not transparent? Probably not even in the USA, and in Europe for sure not. Should you rethink the whole thing? Maybe. It would be much easier to secure a centralized server that does not store the requests. Furthermore, I am not a genuine hacker, and the whole thing was recognized as a security problem even by the people who wrote the paper. If I spotted this, a real hacker will do things I do not even think about, and discussing these problems by e-mail does not seem consistent with the open source idea. To me it increases the feeling that you are not sure this is a good idea.

Covid19Fighter commented 4 years ago

> Hi @Covid19Fighter - this repository is for a sample server implementation. If you have specific questions about BLE, we're likely not the right group of people to answer them.

Please, I would be glad to discuss BLE with the right group of people. Getting distance out of an RSSI measurement in a dynamic environment does not seem easy, or even possible, to me. This, again, is a well-known fact.

Covid19Fighter commented 4 years ago

@all: Please do not think I am some kind of conspiracy person, or even against a tracing app. I think using technology to trace COVID-19 is one of the most important tasks right now. But if you broadcast critical information as your communication model, against all data privacy concepts, and use BLE for distance measurement, against every study published on the subject, I am afraid we are losing precious time here, and I am a little bit worried about the resulting quality of your work. I am sure you have the best people, but with the wrong concepts even the best people will fail.

sethvargo commented 4 years ago

Hi @Covid19Fighter - I believe we've addressed all of your feedback from previous comments. I kindly ask that you please take some time to read our published information at https://www.google.com/covid19/exposurenotifications/. Many of the concerns you've raised are mitigated or are not present in the design of this system.

What steps would you like us to take to bring this issue to resolution?

Covid19Fighter commented 4 years ago

@sethvargo - Thank you very much for the answer, I appreciate your effort. I think I need to look at more specific documentation. It is unclear to me at which level you perform the key matching: whether it is at a level you can access only from the OS (and you secure the OS), or whether it is something you can actually do in the app. It is also unclear how the infected key data is transported and stored locally, and how the locally received keys are stored (does the app have access to them, or is it simply using some piece of logic buried deep in the OS?). Without this information the leak possibilities are too large and I cannot analyze it.

Again, I think broadcasting information that can be matched locally, and that by its nature allows this kind of attack, is not good practice. You can mitigate (and you are trying hard), but you cannot heal such a conceptual problem. And I think it is a conceptual problem, not a physical one: you try to match the data on the mobile device only to avoid sending the local data to a secure server as normal applications do, and the result is that you broadcast critical data to millions of devices. Maybe publishing more detailed info would help. I will also send you my e-mail if you want to send me some information not disclosed here. If I understand everything I will close the ticket. I think I am able to understand it if it is well explained; I am trying, and I should be able to, because even if I am no hacker I have a master's in telecommunications engineering.

I would also like to chat with your BLE people. Calculating exact meters from RSSI measurements across hundreds of different devices (and millions of combinations), hundreds of positions (ear, pocket, table, sofa, hand) and an infinite number of environments is not really something you can do with Bluetooth alone. There are other possibilities, but I am not sure whether you are using them.

pkleczko commented 4 years ago

@Covid19Fighter I think you don't understand how the process of whitelisting apps works. The fact that the source code of each application will be published as open source doesn't mean that anyone can modify this code and compile it as the same approved application. Every app is signed with a proper certificate. This certificate belongs to each Health Authority and SHOULD NOT BE PUBLISHED. So everybody can build the open-sourced code, but they need to sign it with their own certificate. That certificate needs to be added to the whitelist by Google and Apple, and the app needs to be reviewed by them. So it will not be as easy as you think.

Another thing is that during risk calculation the app owner has no access to the RPIs of matched contacts; the API returns only general information: how many matched 'devices' you have, what the calculated risk is for each of them (based on your risk calculation configuration as the app owner), how long the match lasted (and even for a longer 'meeting' the app owner can read at most a 30-minute match, in 5-minute intervals), etc.

Also, you as an app owner have NO ACCESS to the scanned RPIs, so you don't know that the app has detected a new RPI and cannot add 'GPS and timestamp' to such an event.
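To illustrate that aggregate-only result, here is a hypothetical Go struct mirroring the kind of summary an app receives after matching. The field names are illustrative, not the actual Android/iOS API surface, but the shape follows the description above: counts, risk scores, and capped durations, with no RPIs, TEKs, timestamps, or locations.

```go
// Illustrative shape of the aggregate exposure summary an app can read.
package main

import "fmt"

type ExposureSummarySketch struct {
	MatchedKeyCount       int   // how many diagnosis keys matched, not which ones
	DaysSinceLastExposure int   // coarse recency only
	MaximumRiskScore      int   // computed from the health authority's configuration
	AttenuationDurations  []int // minutes per attenuation bucket, 5-minute steps, capped at 30
}

func main() {
	s := ExposureSummarySketch{
		MatchedKeyCount:       2,
		DaysSinceLastExposure: 3,
		MaximumRiskScore:      5,
		AttenuationDurations:  []int{30, 10, 0},
	}
	fmt.Printf("%+v\n", s)
	// Note what is absent: no identifiers, timestamps, or locations of the
	// matched encounters are ever returned to the app.
}
```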

Covid19Fighter commented 4 years ago

@pkleczko - Thank you! This seems to make some sense from the security side, and it indeed mitigates the problem further. Sorry, but I did not find any place giving such an exact definition of the process. If the app has a certificate, the certificate is whitelisted, and you check that certificate at runtime every time, you may close some of the major vectors; you can try to simulate the whitelist authority, but I assume that again is certified, so it gets more complicated. If the app has no access to the RPIs and this is handled by the OS, you close some other possibilities. This is what I was asking.

You can still access BLE with the libraries from an additional app (or are they closing BLE for every other app?), scan for devices, and try to build a list; if they do not modify the BLE protocol (which does not seem possible), you should be able to read the RPIs without the API. If the OS also stores the broadcast RPIs of infected persons and handles them encrypted at all times (storage, memory and CPU), it gets more difficult. I mean, trying to secure this amount of data over this number of devices using different hardware and OSes is kind of impossible, but you are trying very hard.

I still do not understand why you are doing it this way. It is not the usual way to do this, and again it is not GDPR compliant. By the way, this means the list of infected RPIs will not be handled by the governments' open source apps but by a proprietary, closed-source Google component on the OS side. This is indeed more secure against attackers, but it also means everyone has to trust Google and Apple not to use this data (they will have access to it). So they will have access to medically relevant data without a controlling instance. I am not sure this is clear to the public.

mikehelmick commented 4 years ago

Hi folks - I'm going to close this issue.

I believe the questions that have been asked have been answered. As @sethvargo pointed out in a previous comment, none of us are accredited lawyers in the EU and we cannot comment on GDPR conformance.

As of right now, Google and Apple are NOT running this server. We are providing it to governments and public health authorities to run.

If you have a concrete reason that you believe this work does not conform to GDPR, please send an email to the address in the main README.md file and we will route it to the appropriate people.

/close

google-prow-robot commented 4 years ago

@mikehelmick: Closing this issue.

In response to [this](https://github.com/google/exposure-notifications-server/issues/367#issuecomment-632203700):

> Hi folks - I'm going to close this issue.
>
> I believe the questions that have been asked have been answered. As @sethvargo pointed out in a previous comment, none of us are accredited lawyers in the EU and we cannot comment on GDPR conformance.
>
> As of right now, Google and Apple are NOT running this server. We are providing it to governments and public health authorities to run.
>
> If you have a concrete reason that you believe this work does not conform to GDPR, please send an email to the address in the main README.md file and we will route it to the appropriate people.
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.