spe-uob / 2020-Healthcare-Data-Simulators

A configurable synthetic patient generator that delivers and centralizes information on a continuous, recurring basis via message broker technology and a healthcare integration engine
MIT License

Portfolio B - Personal Data, Privacy, Security and Ethics Management #147

Closed: vladbucur2000 closed this issue 3 years ago

vladbucur2000 commented 3 years ago

A description of your strategy to manage personal data that your application is collecting. If there is any personal data, you need to describe how you are collecting user consent and how you're managing personal data, such that GDPR compliance is feasible.

A description of how you are complying with the cookie law.

A description of your security strategy.

A statement that describes any ethics concerns that your application needs to take into account and how you are safeguarding your users.

ennaena commented 3 years ago

As this project was intended to be a simulation that will further test the other team’s work, all data produced by our product is synthetic, hence no real patient data has been used. No data from users is collected or used in this project, hence there are no security gaps that we had to bridge during the development of the project. Further, when the data is passing between the parts of the project, it’s done that more confidential parts are being sent confidentially. Lastly, in the bigger picture, where the simulated data is being sent over to the other team, the system with tokens is used to ensure that only authorised users are able to simulate data into the Data Lake. Separate approval was required for testing the software, hence we applied for it on 23 November 2020 at 11:28. It involved checking on the separate ethics (NHS REC review as it involves patients) and governance approvals (such as Health Research Authority HRA). There is a confirmation from our client’s side that the ethic stance of this project has been approved before the data this project has used has been tested with. The simulated data consists of information about a hypothetical patient with regional information, age, gender, and the hypothetical illness our imaginary patient would have. All simulated data is Anonymous and based on the publicly available statistics that are used by Synthea in order to simulate the data that users require.

georgeedward2000 commented 3 years ago

Review: I would replace the long sentences with shorter, more concise ones.

vladbucur2000 commented 3 years ago

I suggest changing the structure a little bit. The paragraphs can be arranged to look more organised.

"Further, when the data is passing between the parts of the project, it’s done that more confidential parts are being sent confidentially" What did you mean here?

"able to simulate data into the Data Lake." We are simulating the data and then sending it, so this sentence is very confusing.

"There is a confirmation from our client’s side that the ethic stance of this project has been approved before the data this project has used has been tested with." Where is the confirmation? Should we include one?

"All simulated data is Anonymous": the data is synthetic, not anonymous...

I also advise adding something about OpenPseudonymiser (@Wyktorrr, I think you can give the best advice on this matter).

Wyktorrr commented 3 years ago

Our project is not just a tool to test the other teams' work. We are providing them with valuable data in the right format, respecting the FHIR healthcare standard. Sensitive patient data (the NHS number) is masked using the open service OpenPseudonymiser in all resources produced by Synthea before they are sent. The use of the third-party tool Synthea should be highlighted, because we cannot use real data. It should also be emphasized that our application can easily be adapted to work in production with real data rather than synthetic data, because sensitive data is masked. Using Synthea with data customized for the UK population, so that GDPR regulations do not apply, should be the focus of this section. Furthermore, please write shorter sentences and be guided by the feedback we received on Portfolio A. Thank you, Victor!
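A minimal sketch of masking the NHS number in a FHIR Patient resource before it is sent. It assumes that a salted SHA-256 digest stands in for OpenPseudonymiser's output. The salt, the made-up sample patient and the `mask_nhs_number` helper are hypothetical. The identifier system URI `https://fhir.nhs.uk/Id/nhs-number` is the one commonly used in UK FHIR profiles, but it is an assumption here and is not taken from the project.

```python
import copy
import hashlib
import json

# Hypothetical salt: in practice a secret kept out of source control.
SALT = b"example-project-salt"

# Assumed identifier system for NHS numbers (common in UK FHIR profiles).
NHS_NUMBER_SYSTEM = "https://fhir.nhs.uk/Id/nhs-number"


def mask_nhs_number(patient: dict, salt: bytes = SALT) -> dict:
    """Return a copy of a FHIR Patient with NHS number identifiers replaced by salted SHA-256 digests."""
    masked = copy.deepcopy(patient)  # leave the original resource untouched
    for identifier in masked.get("identifier", []):
        if identifier.get("system") == NHS_NUMBER_SYSTEM:
            raw = identifier["value"].encode("utf-8")
            identifier["value"] = hashlib.sha256(salt + raw).hexdigest()
    return masked


# Made-up synthetic patient, shaped like a minimal FHIR Patient resource.
patient = {
    "resourceType": "Patient",
    "identifier": [{"system": NHS_NUMBER_SYSTEM, "value": "9000000009"}],
    "gender": "female",
    "birthDate": "1980-04-12",
}

masked = mask_nhs_number(patient)
print(json.dumps(masked, indent=2))
```

Only the identifier value changes. Demographics such as `gender` pass through unchanged, so the masked resource remains usable for testing.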

Wyktorrr commented 3 years ago

Synthea does not rely on public statistics regarding the population. It is just a synthetic patient generator. It generates data in several formats (CSV, FHIR) that follow medical standards, but it doesn't rely on any conducted statistics. Using tokens to obtain authorization to publish data to a server is not unique to our project. It could be mentioned, but I suggest you read the Architecture and Requirements sections.
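A minimal sketch of the token idea using only the Python standard library. The client attaches a token as an `Authorization: Bearer` header, and the server compares it in constant time. The endpoint URL, the token and both helper functions are hypothetical and are not taken from the project's architecture. A real FHIR server or Data Lake would issue and validate tokens through its own mechanism.

```python
import hmac
import secrets
import urllib.request
from typing import Optional

# Hypothetical endpoint and token; neither comes from the project.
ENDPOINT = "https://example.org/fhir/Patient"
SERVER_TOKEN = secrets.token_urlsafe(32)  # token the server issued to this client


def build_upload_request(body: bytes, token: str) -> urllib.request.Request:
    """Client side: attach the token as a Bearer credential (the request is built, not sent)."""
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/fhir+json",
        },
    )


def is_authorised(authorization_header: Optional[str], expected_token: str) -> bool:
    """Server side: accept the upload only if the Bearer token matches."""
    if not authorization_header or not authorization_header.startswith("Bearer "):
        return False
    presented = authorization_header[len("Bearer "):]
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(presented.encode(), expected_token.encode())


request = build_upload_request(b'{"resourceType": "Patient"}', SERVER_TOKEN)
print(is_authorised(request.get_header("Authorization"), SERVER_TOKEN))  # True
```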

Wyktorrr commented 3 years ago

OpenPseudonymiser masks data using a salt. Some documentation on what salts are and how they are used in cryptography would be nice.
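A short sketch of why a salt defeats precomputed tables. It assumes SHA-256 as the hash; OpenPseudonymiser's exact scheme is not confirmed here. The attacker's table and the 10-digit numbers are made up for illustration.

```python
import hashlib
import secrets


def digest(value: str, salt: bytes = b"") -> str:
    """SHA-256 hex digest of salt + value; an empty salt gives a plain, unsalted hash."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


# An attacker precomputes digests for candidate identifiers in advance.
# The candidate space is kept tiny (made-up 10-digit numbers) so the demo runs fast.
candidates = [f"900000{n:04d}" for n in range(10_000)]
precomputed = {digest(c): c for c in candidates}

secret_value = "9000000009"

# Unsalted: one table lookup reverses the digest.
unsalted = digest(secret_value)
print(precomputed.get(unsalted))  # 9000000009

# Salted: the table was built without the salt, so the lookup finds nothing.
salt = secrets.token_bytes(16)
salted = digest(secret_value, salt)
print(precomputed.get(salted))  # None
```

A salt only helps while it stays secret. The space of 10-digit identifiers is small enough to brute-force once the salt is known, so the salt must be protected.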

ennaena commented 3 years ago

I think I have now covered everything you all mentioned.

ennaena commented 3 years ago

All data is produced by the Synthea Patient Generator, a synthetic patient population simulator. It feeds the project with synthetic yet realistic patient data related to health records (such as conditions) in different formats. The data is modified to realistically represent the patient demographics of the UK. As this project produces and uses only synthetic data, no information about real users was gathered, used or stored. As a result, the Data Protection Act 2018 did not apply to this project. Further, the project uses OpenPseudonymiser to hide confidential data (for example, the NHS number used as a patient identifier) by pseudonymising datasets: selected columns of a CSV file are replaced with digests. OpenPseudonymiser uses a salt, which protects against attacks based on precomputed tables. The salt is an extra input added to the data before it is passed through a one-way cryptographic hash function. Using different salts also makes the digests of identical datasets differ, which makes the pseudonyms harder to reverse or link. Separate approval was required for testing the software, so we applied for it on 23 November 2020 at 11:28. This involved checking the separate ethics approval (NHS REC review, as the project involves patients) and governance approvals (such as from the Health Research Authority, HRA).
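A minimal sketch of "digesting" selected columns of a CSV file with Python's `csv` and `hashlib` modules. The column names, the salt and the `pseudonymise_csv` function are hypothetical. OpenPseudonymiser itself is a separate tool, and its input format and hashing scheme may differ.

```python
import csv
import hashlib
import io
from typing import Iterable, TextIO

# Hypothetical salt; OpenPseudonymiser's real configuration may differ.
SALT = b"example-project-salt"


def pseudonymise_csv(src: TextIO, dst: TextIO, columns: Iterable[str], salt: bytes) -> None:
    """Copy a CSV, replacing each non-empty value in `columns` with a salted SHA-256 digest."""
    to_digest = set(columns)
    reader = csv.DictReader(src)
    if reader.fieldnames is None:  # empty input: nothing to copy
        return
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        for col in to_digest:
            if row.get(col):
                row[col] = hashlib.sha256(salt + row[col].encode("utf-8")).hexdigest()
        writer.writerow(row)


# Made-up synthetic records; only the identifier column is digested.
raw = io.StringIO(
    "nhs_number,gender,birth_year,condition\n"
    "9000000009,F,1980,Asthma\n"
    "9000000017,M,1975,Hypertension\n"
)
out = io.StringIO()
pseudonymise_csv(raw, out, columns=["nhs_number"], salt=SALT)
print(out.getvalue())
```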

ennaena commented 3 years ago

"OpenPseudonymiser masks data using a salt. Some documentation on what salts are and how they are used in cryptography would be nice."

Okay, so I found some that I can put in. Also, what referencing style do I need to use?

ennaena commented 3 years ago

All data is produced by the Synthea Patient Generator, a synthetic patient population simulator. It feeds the project with synthetic yet realistic patient data related to health records (such as conditions) in different formats. The data is modified to realistically represent the patient demographics of the UK. As this project produces and uses only synthetic data, no information about real users was gathered, used or stored. As a result, the Data Protection Act 2018 did not apply to this project. If this project were later adapted to work with data from real-life users, the UK's implementation of the General Data Protection Regulation (UK GDPR) would have to be followed. The project uses OpenPseudonymiser to hide confidential data (for example, the NHS number used as a patient identifier) by pseudonymising datasets: selected columns of a CSV file are replaced with digests. OpenPseudonymiser uses a salt, which protects against attacks based on precomputed tables. The salt is an extra input added to the data before it is passed through a one-way cryptographic hash function. Using different salts also makes the digests of identical datasets differ, which makes the pseudonyms harder to reverse or link. Separate approval was required for testing the software, so we applied for it on 23 November 2020 at 11:28. This involved checking the separate ethics approval (NHS REC review, as the project involves patients) and governance approvals (such as from the Health Research Authority, HRA).
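A sketch of the point about salts making identical datasets differ. It again assumes a salted SHA-256 digest rather than OpenPseudonymiser's confirmed scheme. Reusing one salt within a dataset keeps pseudonyms consistent, so records for the same patient can still be linked. A different salt gives the same patient an unrelated pseudonym. Unlike encryption, there is no key that turns the digest back into the NHS number. The one-salt-per-dataset policy is a hypothetical choice for illustration.

```python
import hashlib
import secrets


def pseudonym(value: str, salt: bytes) -> str:
    """Salted SHA-256 digest used as a pseudonym (a sketch, not OpenPseudonymiser's exact scheme)."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


nhs_number = "9000000009"  # made-up identifier

# Hypothetical policy: one salt per dataset, so records within a dataset stay linkable.
salt_a = secrets.token_bytes(16)
print(pseudonym(nhs_number, salt_a) == pseudonym(nhs_number, salt_a))  # True

# The same patient gets an unrelated pseudonym in a dataset with a different salt.
salt_b = secrets.token_bytes(16)
print(pseudonym(nhs_number, salt_a) != pseudonym(nhs_number, salt_b))  # True
```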

Wyktorrr commented 3 years ago

Good job!