ohbm / osr2020

Website for the Open Science Room at the OHBM 2020 meeting
https://ohbm.github.io/osr2020

Open Data 2.0 (Lightning talk): Privacy-preserving tech: the tools for safe open data use #18

jsheunis opened 4 years ago

jsheunis commented 4 years ago

Privacy-preserving tech: the tools for safe open data use

By Emma Bluemke, University of Oxford

Abstract

In medical imaging, necessary privacy concerns keep us from fully realizing the benefits of AI in our research. Fortunately, because other industries face similar restrictions on private data, three cutting-edge techniques have been developed that hold huge potential for the future of machine learning in healthcare: federated learning, differential privacy, and encrypted computation. These modern privacy techniques would allow us to train our models on encrypted data from multiple institutions, hospitals, and clinics without sharing the patient data. Recently, these techniques have become much easier for researchers to implement, thanks to the efforts of scientists from Google, DeepMind, Apple, OpenAI, and many others.
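To give a flavour of one of these techniques, here is a minimal sketch of the differential-privacy idea: clip each record's contribution and add calibrated Laplace noise before releasing an aggregate statistic. It's a generic, library-free illustration; the function name, numbers, and parameters are made up for this example.

```python
import numpy as np

def dp_mean(values, lower, upper, epsilon):
    """Release a differentially private mean of `values`.

    Values are clipped to [lower, upper] so the sensitivity of the mean is
    bounded, then Laplace noise calibrated to that sensitivity and the
    privacy budget `epsilon` is added before release.
    """
    values = np.clip(np.asarray(values, dtype=float), lower, upper)
    sensitivity = (upper - lower) / len(values)   # max change from one record
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return values.mean() + noise

# Example: a private estimate of mean cortical thickness (hypothetical numbers).
measurements = [2.4, 2.7, 2.5, 2.6, 2.3, 2.8, 2.5, 2.6]
print(dp_mean(measurements, lower=1.0, upper=4.0, epsilon=1.0))
```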

It's also becoming increasingly important to maintain this data privacy: true anonymization of data is difficult to achieve because it's unclear what kind of information machine learning can extract from seemingly innocuous data. For example, it's possible to predict the age and sex of a patient from some medical images, and we've seen that, in some cases, multiple anonymized datasets can be combined to re-identify the individuals in them.

These tools will make it easy for us (imaging scientists) to securely train our models while preserving patient privacy, without having to be privacy experts ourselves.

It's important that our medical imaging community is aware of these new possibilities. These developments could inspire new collaborations between institutions, enable meta-analyses that were previously considered impossible, and allow us to make rapid improvements in our current AI models as we're able to train them on more data.

Not only will this give us more training data, it will give us more representative training data: if we can train on data from other institutions worldwide, we can properly diversify our datasets and ensure our research better serves the world's population. For example, current volunteer-based datasets often feature a disproportionate number of young university students, which results in training data that is not representative of our patient populations.

https://blog.openmined.org/federated-learning-differential-privacy-and-encrypted-computation-for-medical-imaging/
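To make the "encrypted computation" part of the abstract a bit more concrete, here is a toy sketch of additive secret sharing, the basic building block behind secure aggregation: each site splits its value into random shares, no single share reveals anything on its own, and only the combination of all shares reconstructs the total. Purely illustrative, and not the API of any particular library.

```python
import random

Q = 2**61 - 1  # large prime modulus; all arithmetic is done modulo Q

def share(secret, n_parties):
    """Split an integer secret into n additive shares modulo Q."""
    shares = [random.randrange(Q) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % Q)
    return shares

def reconstruct(shares):
    """Recombine shares; only the sum of *all* shares reveals the secret."""
    return sum(shares) % Q

# Three hospitals each secret-share a local patient count.
counts = [120, 75, 240]
all_shares = [share(c, n_parties=3) for c in counts]

# Each party sums the shares it received; combining the partial sums reveals
# the total without any party seeing another party's raw count.
partial_sums = [sum(col) % Q for col in zip(*all_shares)]
print(reconstruct(partial_sums))  # -> 435
```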

And I'd like to mention that free, open-source tools like PySyft and PyGrid are (or will soon be) available for this purpose:

https://blog.openmined.org/what-is-pygrid-demo/

PyGrid is a peer-to-peer platform for private data science and federated learning. With PyGrid, data owners can provide, monitor, and manage access to their own private data clusters. The data does not leave the data owner's server. Data scientists can then use PyGrid to perform private statistical analysis on the private dataset, or even perform federated learning across multiple institutions' datasets.
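As a rough illustration of the federated-learning workflow described above (not the PySyft/PyGrid API, just a self-contained sketch with made-up data and hyperparameters), the idea is that each institution trains locally and only model weights are exchanged and averaged:

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local training: a few gradient steps on its own data.
    The raw data never leaves the client; only the updated weights do."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # gradient of mean squared error
        w -= lr * grad
    return w

def federated_average(client_weights, client_sizes):
    """Server step: average client weights, weighted by local dataset size."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Three simulated "institutions", each with its own private data.
true_w = np.array([2.0, -1.0])
clients = []
for n in (50, 80, 30):
    X = rng.normal(size=(n, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    clients.append((X, y))

# Federated training rounds: only model weights travel between parties.
global_w = np.zeros(2)
for _ in range(20):
    updates = [local_update(global_w, X, y) for X, y in clients]
    global_w = federated_average(updates, [len(y) for _, y in clients])

print(global_w)  # approaches [2.0, -1.0] without pooling any raw data
```

In a real deployment, this weight exchange would be combined with the privacy techniques above (differential privacy, secure aggregation) so that even the model updates leak as little as possible.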

Just to be clear - this has nothing to do with blockchain.

Useful Links

https://www.openmined.org/

Tagging @em-blue

em-blue commented 4 years ago

Since OHBM is international, I'd like to also mention that PySyft tutorials have been translated into 15 languages so far: https://github.com/OpenMined/PySyft/tree/master/examples/tutorials/translations

em-blue commented 4 years ago

Looks good to me!