5. Differential privacy

This is optional, to be done if we have time. It is a great use case for the AWS ECR.

Imagine the data arrives over time in small batches, for example one patient at a time. We want to share this data but also prevent bad actors from reverse-engineering the values of X. Reverse-engineering a single record is much easier than reverse-engineering a large batch.

Approach: generate random fake records, and send a combination of the true record plus a few fake ones. This is harder to reverse-engineer, and even if someone succeeds, they cannot tell which record is the true one and which are fake.
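A minimal sketch of this decoy-mixing idea, assuming NumPy and assuming the fake records are drawn from an independent Gaussian per feature. The function name `mix_with_decoys`, its parameters, and the example feature values are hypothetical, chosen only for illustration. Note that this illustrates the mixing approach described above, not a formal differential privacy mechanism with a privacy budget.

```python
import numpy as np

def mix_with_decoys(true_record, n_fake, mean, std, rng):
    """Return a shuffled batch holding one true record and n_fake random decoys."""
    true_record = np.asarray(true_record, dtype=float)
    # Assumption: decoys come from an independent Gaussian per feature.
    fakes = rng.normal(loc=mean, scale=std, size=(n_fake, true_record.shape[0]))
    batch = np.vstack([true_record[None, :], fakes])
    # Shuffle rows so the position of the true record carries no information.
    perm = rng.permutation(batch.shape[0])
    return batch[perm]

rng = np.random.default_rng(0)
patient = np.array([63.0, 1.0, 140.0])  # made-up feature values
batch = mix_with_decoys(
    patient,
    n_fake=4,
    mean=[55.0, 0.5, 130.0],  # hypothetical per-feature statistics
    std=[10.0, 0.5, 15.0],
    rng=rng,
)
print(batch.shape)
```

How realistic the decoys look is a design choice to be tested: decoys that are far from the true data distribution are easy to filter out, while decoys that are too close may distort the model less but also hide less.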

We will test several strategies and compare them on the same plots of model performance vs. data size.

Steps:

[ ] (@akusok) Collect more ideas from streamed federated learning
[ ] (@akusok) Create a small test script in Python

tamiratGit / FedELM
