scrtlabs / SafeTrace

Privacy-preserving voluntary Covid-19 self-reporting platform. Share your location history and status, get alerts if you have been in high-risk areas, and identify high-risk regions.

Deanonymizable/extractable data #7

Open FishmanL opened 4 years ago

FishmanL commented 4 years ago

As currently defined, this platform allows extraction of the location of everyone else who has entered data, via search-decision reductions; it needs DP.

lacabra commented 4 years ago

Thanks @FishmanL for your comment. Can you spell out DP for clarity, please?

Would you also mind elaborating on the search-decision reduction argument, and suggesting how to mitigate this shortcoming?

Thank you 🙏

FishmanL commented 4 years ago

Sure: DP = differential privacy

Search-decision: first you execute a coordinated attack in which you drop new pins on a grid across the area you want to search, in order to locate every case; then you repeatedly submit your location at slightly different places to figure out/triangulate the precise location of every person with the virus (sketched below).
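To make the attack concrete, a minimal sketch in Rust; `has_match` is a hypothetical stand-in for whatever yes/no query endpoint the enclave exposes, and every name here is illustrative rather than anything in the actual codebase:

```rust
/// Hypothetical stand-in for a query against the SafeTrace enclave:
/// "does any positive-tested user match this location?"
fn has_match(lat: f64, lon: f64) -> bool {
    unimplemented!()
}

/// Scan a bounding box at resolution `step` degrees and collect every
/// grid point that reports a match; each hit seeds a finer local search.
fn grid_scan(lat0: f64, lat1: f64, lon0: f64, lon1: f64, step: f64) -> Vec<(f64, f64)> {
    let mut hits = Vec::new();
    let mut lat = lat0;
    while lat <= lat1 {
        let mut lon = lon0;
        while lon <= lon1 {
            if has_match(lat, lon) {
                hits.push((lat, lon));
            }
            lon += step;
        }
        lat += step;
    }
    hits
}
```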

You can mitigate this by adding some small amount of noise to each person's initial location, and increasing the amount of added noise over time.
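A minimal sketch of that noise step, assuming the `rand` crate (0.8 API); this is plain per-coordinate Laplace noise rather than full planar-Laplace geo-indistinguishability, and the scale parameter is an illustrative choice:

```rust
use rand::Rng; // assumes the `rand` crate, 0.8 API

/// Sample Laplace(0, b) as the difference of two Exp(1) draws.
/// `1.0 - gen::<f64>()` lies in (0, 1], so the logs are always finite.
fn laplace(b: f64, rng: &mut impl Rng) -> f64 {
    let e1 = -(1.0 - rng.gen::<f64>()).ln();
    let e2 = -(1.0 - rng.gen::<f64>()).ln();
    b * (e1 - e2)
}

/// Perturb a location once, at upload time. `scale` is in degrees;
/// growing it with the age of the record (as suggested above) makes
/// older points progressively fuzzier.
fn noise_location(lat: f64, lon: f64, scale: f64, rng: &mut impl Rng) -> (f64, f64) {
    (lat + laplace(scale, rng), lon + laplace(scale, rng))
}
```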

cankisagun commented 4 years ago

@FishmanL we are currently working on a limited MVP where we can build this. It would be great to add this to the development roadmap and implement it as we roll out.

Are there off-the-shelf DP libraries in Rust that you can point us to?

FishmanL commented 4 years ago

None that I know of.

lacabra commented 4 years ago

Thanks for your insights @FishmanL. I would like to challenge your assumptions here, because I question whether what you propose applies to the workflow we envision, which is as follows:

  1. Users who tested positive add timestamped locations to the dataset inside the enclave.
  2. An attacker who wants to de-anonymize data knows neither the number of users who have uploaded data nor the number of locations each user has entered. When she queries the enclave for a match at a given time and location, she gets back a timestamped location where at least one individual who tested positive has been, within a parametrizable radius r (which we can keep above a given threshold) and a time interval t (again no smaller than a threshold). She will not learn whether there was one individual or more, nor will she get any userid for that match (sketched below).
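For concreteness, a hypothetical sketch of the interface that workflow implies; none of these names or types are the actual SafeTrace API:

```rust
/// Hypothetical shape of the match query described above.
struct MatchQuery {
    lat: f64,
    lon: f64,
    timestamp: u64, // Unix seconds
}

/// The enclave answers only with whether at least one positive-tested
/// user was within radius `r` and interval `t`: no count, no userid.
enum MatchResponse {
    NoMatch,
    Match { lat: f64, lon: f64, timestamp: u64 },
}
```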

So my question is: how can she obtain any information about any user in the set, if those individuals take the precaution of not including their home address or other locations that can uniquely identify them by themselves? I understand how DP works, but I don't think it applies to the data flow we are envisioning.

Thoughts?

FishmanL commented 4 years ago

A constant circle is no better than a single point, since you can figure out the bounds of the circle with enough queries and just note the center. (In fact, no deterministic answering procedure solves this issue; you need randomness.) The same is true for time.
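A sketch of that boundary-finding step, reusing the hypothetical `has_match` oracle from the grid-scan sketch above; as long as the oracle answers deterministically, each edge of the circle can be pinned down to arbitrary precision:

```rust
/// Binary-search the longitude at which a deterministic oracle flips
/// from match to no-match, locating one edge of the reported circle.
/// Running it from the other side gives the opposite edge, and the
/// midpoint of the two edges is the hidden center.
fn find_edge(lat: f64, mut inside_lon: f64, mut outside_lon: f64, eps: f64) -> f64 {
    while (outside_lon - inside_lon).abs() > eps {
        let mid = (inside_lon + outside_lon) / 2.0;
        if has_match(lat, mid) {
            inside_lon = mid; // still inside the circle
        } else {
            outside_lon = mid; // past the edge
        }
    }
    (inside_lon + outside_lon) / 2.0
}
```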

This doesn't fully deanonymize users by itself; it just gets you exact times and locations (the number of individuals is also recoverable by repeated queries, which let you split overlapping circles into separate ones). How you get from there to actual users is, let's say, 'left to the reader' (in smaller towns it's trivial, in cities it's harder).

ainsleys commented 4 years ago

@FishmanL would rate-limiting queries and deleting trailing data (i.e., anything older than 14 days) reduce the risk here? As I understand your model, an attacker is essentially creating a series of fake data sets and modifying the time and location slightly every time to "scan" for matches. This could be addressed by, say, only allowing once-per-day-per-user updates, or possibly by trying to identify and limit this behavior in the client code.
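A minimal sketch of the once-per-day idea; `UserId` and the 24-hour window are illustrative choices, not anything in the current codebase:

```rust
use std::collections::HashMap;

type UserId = String; // illustrative; the real identifier may differ

/// Track the Unix timestamp of each user's last accepted update and
/// reject anything submitted less than 24 hours later.
struct UpdateLimiter {
    last_update: HashMap<UserId, u64>,
}

impl UpdateLimiter {
    fn allow(&mut self, user: &UserId, now: u64) -> bool {
        match self.last_update.get(user) {
            Some(&t) if now.saturating_sub(t) < 86_400 => false,
            _ => {
                self.last_update.insert(user.clone(), now);
                true
            }
        }
    }
}
```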

FishmanL commented 4 years ago

I mean, I don't see any way to handle scaling this to lots of users (which is the only way it's really useful) without risking 'an attacker makes lots of fake users that are near each other'.

ainsleys commented 4 years ago

Yeah, it's worth looking into the best options for making it difficult or expensive to create a ton of fake users without compromising privacy. We could require sign-on with some service that provides a layer of Sybil protection.

ainsleys commented 4 years ago

See #43 for @FishmanL's current work on this.