trendscenter / coinstac

Collaborative Informatics and Neuroimaging Suite Toolkit for Anonymous Computation
MIT License

Perform a data privacy and protection audit of our system #1699

Open praeducer opened 1 year ago

praeducer commented 1 year ago

We should make sure we at least meet basic security requirements and comply with the data protection and privacy laws of the countries we operate in.

Data Privacy

FAIR Guiding Principles

In 2016, the ‘[FAIR Guiding Principles for scientific data management and stewardship](http://www.nature.com/articles/sdata201618)’ were published in Scientific Data. The authors intended to provide guidelines to improve the Findability, Accessibility, Interoperability, and Reuse of digital assets. The principles emphasise machine-actionability (i.e., the capacity of computational systems to find, access, interoperate, and reuse data with no or minimal human intervention), because the growing volume, complexity, and creation speed of data mean that humans increasingly rely on computational support to deal with it.
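To make "machine-actionability" concrete, here is a minimal Python sketch of a machine-readable metadata record for a dataset, with one field per FAIR facet and a check that software (rather than a person) can run to find gaps. The `DatasetMetadata` class, its field names, and the placeholder identifier are hypothetical illustrations, not part of COINSTAC or of the FAIR principles themselves; BIDS appears only as an example of a community data format.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    """Hypothetical machine-readable metadata record for one dataset."""
    identifier: str        # Findable: globally unique, persistent identifier
    title: str
    access_protocol: str   # Accessible: standard retrieval protocol
    data_format: str       # Interoperable: community data format or vocabulary
    license: str           # Reusable: explicit usage license
    keywords: list[str] = field(default_factory=list)


REQUIRED_FAIR_FIELDS = ("identifier", "access_protocol", "data_format", "license")


def missing_fair_fields(record: DatasetMetadata) -> list[str]:
    """Return the FAIR-relevant fields that are empty, so software can flag them."""
    return [name for name in REQUIRED_FAIR_FIELDS if not getattr(record, name).strip()]


record = DatasetMetadata(
    identifier="example-id-0001",   # placeholder, not a real identifier
    title="Example structural MRI dataset",
    access_protocol="https",
    data_format="BIDS",
    license="",                     # a missing license blocks reuse
)
print(missing_fair_fields(record))  # ['license']
```

A real record would normally follow an established metadata schema rather than ad hoc field names; the point here is only that the check needs no human in the loop.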

Learn more at: https://www.go-fair.org/fair-principles/

Data Security

HITRUST Cybersecurity Framework

Developed in collaboration with data protection professionals, the HITRUST CSF rationalizes relevant regulations and standards into a single overarching security and privacy framework. Because the HITRUST CSF is both risk- and compliance-based, organizations of varying risk profiles can customize the security and privacy control baselines through various factors, including organization type, size, systems, and compliance requirements.

HITRUST understands data protection compliance and the challenges of assembling and maintaining many varied compliance programs, which is why its integrated approach is designed to keep those components aligned, maintained, and comprehensive in support of an organization’s information security management program. As a result, the HITRUST CSF has become a widely adopted security and privacy framework across industries globally.

Learn more at: https://hitrustalliance.net/product-tool/hitrust-csf/

NIST Cybersecurity Framework

The National Institute of Standards and Technology (NIST) Cybersecurity Framework (CSF) is an adaptable set of fundamental guidelines designed to mitigate organizational risks and strengthen overall organizational security. 

Recognizing that the national and economic security of the United States depends on the reliable functioning of critical infrastructure, the President issued Executive Order (EO) 13636, Improving Critical Infrastructure Cybersecurity, in February 2013. The Order directed NIST to work with stakeholders to develop a voluntary framework, based on existing standards, guidelines, and practices, for reducing cyber risks to critical infrastructure. The Cybersecurity Enhancement Act of 2014 reinforced NIST’s role under EO 13636.

Created through collaboration between industry and government, the voluntary Framework consists of standards, guidelines, and practices to promote the protection of critical infrastructure. The prioritized, flexible, repeatable, and cost-effective approach of the Framework helps owners and operators of critical infrastructure to manage cybersecurity-related risk.

Learn more at: https://www.nist.gov/cyberframework

Data Protection Laws

This section covers privacy and security laws and regulations applicable to the processing of personal data, including any applicable US federal or state law or regulation relevant to this software.

California Consumer Privacy Act

The California Consumer Privacy Act of 2018 (CCPA) gives consumers more control over the personal information that businesses collect about them and the CCPA regulations provide guidance on how to implement the law. 

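This landmark law secures new privacy rights for California consumers, including:

  1. The right to know about the personal information a business collects about them and how it is used and shared.
  2. The right to delete personal information collected from them (with some exceptions).
  3. The right to opt out of the sale of their personal information.
  4. The right to non-discrimination for exercising their CCPA rights.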

Businesses are required to give consumers certain notices explaining their privacy practices. The CCPA applies to many businesses, including data brokers.

Learn more at: https://www.oag.ca.gov/privacy/ccpa

General Data Protection Regulation (GDPR)

The General Data Protection Regulation (GDPR), formally Regulation (EU) 2016/679, is an EU regulation on data protection and privacy in the European Union (EU) and the European Economic Area (EEA). The GDPR is an important component of EU privacy law and of human rights law, in particular Article 8(1) of the Charter of Fundamental Rights of the European Union. It also addresses the transfer of personal data outside the EU and EEA. Its primary aim is to enhance individuals' control and rights over their personal data and to simplify the regulatory environment for international business.

The GDPR has eleven chapters, covering general provisions, principles, rights of the data subject, duties of data controllers and processors, transfers of personal data to third countries, supervisory authorities, cooperation among member states, remedies, liability and penalties for breach of rights, and miscellaneous final provisions.

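The regulation applies if the data controller (the organisation that determines the purposes and means of processing personal data) or processor (an organisation, such as a cloud service provider, that processes personal data on behalf of a controller) is established in the EU. It also applies to controllers and processors outside the EU that process the personal data of people in the EU in order to offer them goods or services or to monitor their behaviour.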

Learn more at: https://gdpr-info.eu/

Health Insurance Portability and Accountability Act (HIPAA)

The Health Insurance Portability and Accountability Act of 1996 (HIPAA) is a federal law that required the creation of national standards to protect sensitive patient health information from being disclosed without the patient’s consent or knowledge. The US Department of Health and Human Services (HHS) issued the HIPAA Privacy Rule to implement the requirements of HIPAA. The HIPAA Security Rule protects a subset of information covered by the Privacy Rule.

Learn more at: https://www.cdc.gov/phlp/publications/topic/hipaa.html

Swiss Federal Data Protection Act

This Act aims to protect the privacy and the fundamental rights of persons when their data is processed.

Principles:

  1. Personal data may only be processed lawfully.
  2. Its processing must be carried out in good faith and must be proportionate.
  3. Personal data may only be processed for the purpose indicated at the time of collection, that is evident from the circumstances, or that is provided for by law.
  4. The collection of personal data and in particular the purpose of its processing must be evident to the data subject.
  5. If the consent of the data subject is required for the processing of personal data, such consent is valid only if given voluntarily on the provision of adequate information. Additionally, consent must be given expressly in the case of processing of sensitive personal data or personality profiles.
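As one way to turn principles 3 and 5 into something an audit can test, the sketch below encodes them as a pre-processing check in Python. It assumes consent is the legal basis for processing and models only the purposes stated at collection time (not purposes evident from the circumstances or provided for by law); `ConsentRecord`, `may_process`, and the example purpose are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsentRecord:
    """Hypothetical record of what a data subject agreed to at collection time."""
    purposes: frozenset[str]  # purposes indicated when the data was collected
    voluntary: bool           # consent was given freely
    informed: bool            # subject received adequate information beforehand
    express: bool             # consent was given expressly (e.g. an explicit opt-in)


def may_process(consent: ConsentRecord, purpose: str, sensitive: bool) -> bool:
    """Check one proposed processing step against principles 3 and 5."""
    if purpose not in consent.purposes:                 # principle 3: purpose limitation
        return False
    if not (consent.voluntary and consent.informed):    # principle 5: valid consent
        return False
    if sensitive and not consent.express:               # principle 5: express consent for sensitive data
        return False
    return True


consent = ConsentRecord(
    purposes=frozenset({"brain-age regression"}),
    voluntary=True,
    informed=True,
    express=False,
)
print(may_process(consent, "brain-age regression", sensitive=False))  # True
print(may_process(consent, "brain-age regression", sensitive=True))   # False
print(may_process(consent, "marketing", sensitive=False))             # False
```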

Learn more at: https://www.fedlex.admin.ch/eli/cc/1993/1945_1945_1945/en

praeducer commented 1 year ago

We may want to create a Data Use Agreement: https://github.com/trendscenter/coinstac/issues/1700

praeducer commented 1 year ago

Feature requests based on feedback from @Anand Sarwate, @Dylan Martin, Vault paper reviewers, and Gunther's group:

  1. Auditing capability for when Vaults are accessed, what computations are run on what Vaults, and how Vault data and computations change over time.
  2. More transparency into how Vault data is protected and de-identified.
  3. Improved protection on what computations can be run on devices.
  4. Privacy controls (like visibility) over pipelines.
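For item 1, one possible shape for an audit trail is an append-only, hash-chained log: each entry records who touched which Vault and which computation ran, plus the hash of the previous entry. This is a minimal Python sketch under that assumption; `AuditLog` and its fields are hypothetical and do not reflect how COINSTAC currently logs events.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field
from typing import Optional

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of an entry (without its own hash)."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class AuditLog:
    """Append-only log in which each entry stores the hash of the entry before it."""
    entries: list[dict] = field(default_factory=list)

    def append(self, actor: str, action: str, vault: str,
               computation: Optional[str] = None) -> dict:
        body = {
            "ts": time.time(),
            "actor": actor,              # who accessed the Vault
            "action": action,            # e.g. "access", "run", "update"
            "vault": vault,
            "computation": computation,  # which computation ran, if any
            "prev_hash": self.entries[-1]["hash"] if self.entries else GENESIS_HASH,
        }
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """False if any entry was edited, or any entry other than the last was removed."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


log = AuditLog()
log.append("alice", "access", "vault-1")
log.append("alice", "run", "vault-1", computation="regression-v1")
print(log.verify())                  # True
log.entries[0]["actor"] = "mallory"  # tamper with history
print(log.verify())                  # False
```

Hash chaining only makes changes to earlier entries detectable; catching a silently dropped newest entry would additionally require storing the latest hash somewhere the Vault host cannot rewrite.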

praeducer commented 1 year ago

Vault paper feedback: "It sounds as though the data host is also responsible for doing any computations, and it is not clear from the paper what control the host has over the applications a remote user might choose to run. There are obvious security and HIPAA issues if a collaborator is malicious or, more likely, simply careless. A highly compute-intensive pipeline being run by a remote collaborator could also adversely affect the host's access to compute bandwidth. Please explain how control over the computations that can be done with the hosted data is established. One control point is the requirement for membership in a consortium, but control might not be in the hands of an individual host but instead in those of a consortium leader, whose interests may not be aligned with the data host's. This could be addressed either in the system description or in the Limitations section, 4.1"

In relation to the comment on DUAs on line 211, have there been instances where an institution has approved Vault access to data that has not been deidentified? Besides the use cases, can the authors provide some usage statistics for COINSTAC and for the Vaults?

praeducer commented 1 year ago

Enhancements can be made to give vault owners greater control over resource consumption. Resource-intensive analyses may cause slowdowns or crashes when concurrent computations drive compute usage too high. Vault owners' control is currently limited to selecting from a predefined list of approved computations. Potential solutions involve giving vault owners finer control over their resources by allowing them to:

  1. Set limits on CPU usage
  2. Restrict the number of simultaneous computation runs
  3. Approve individual computation runs
  4. Limit consortia access to vaults
  5. Control which users can add vaults to consortia
  6. Cap the overall allowable number of computation runs
  7. Set expiration dates for approval permissions

In addition to more granular control, compute limitations can be addressed by expanding compute capacity, for example by deploying multiple instances behind a load balancer or dynamically scaling compute resources.
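Several of the vault-owner controls proposed here could be expressed as a per-vault policy that is checked before each computation run is admitted. The sketch below is a hypothetical admission check in Python; `VaultPolicy`, `VaultState`, `admit_run`, `finish_run`, and all field names are illustrative assumptions. The CPU limit is only recorded, since actually enforcing it would fall to whatever runtime executes the computation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class VaultPolicy:
    """Hypothetical owner-defined limits for one vault."""
    approved_computations: set[str]
    allowed_consortia: set[str]
    max_concurrent_runs: int = 1
    max_total_runs: Optional[int] = None      # None: no overall cap
    approval_expires: Optional[date] = None   # None: approvals never expire
    require_per_run_approval: bool = False
    cpu_limit: float = 1.0                    # recorded only; enforcement is up to the runtime


@dataclass
class VaultState:
    running: int = 0      # runs currently executing on the vault
    total_runs: int = 0   # runs admitted so far


def admit_run(policy: VaultPolicy, state: VaultState, computation: str,
              consortium: str, today: date,
              owner_approved: bool = False) -> tuple[bool, str]:
    """Return (admitted, reason); on admission, update the vault's run counters."""
    if consortium not in policy.allowed_consortia:
        return False, "consortium not allowed"
    if computation not in policy.approved_computations:
        return False, "computation not approved"
    if policy.approval_expires is not None and today > policy.approval_expires:
        return False, "approval expired"
    if policy.require_per_run_approval and not owner_approved:
        return False, "awaiting owner approval"
    if state.running >= policy.max_concurrent_runs:
        return False, "too many concurrent runs"
    if policy.max_total_runs is not None and state.total_runs >= policy.max_total_runs:
        return False, "run cap reached"
    state.running += 1
    state.total_runs += 1
    return True, "ok"


def finish_run(state: VaultState) -> None:
    """Release a concurrency slot when a run completes."""
    state.running -= 1


policy = VaultPolicy({"regression-v1"}, {"consortium-a"}, max_concurrent_runs=1,
                     max_total_runs=2, approval_expires=date(2030, 1, 1))
state = VaultState()
today = date(2025, 1, 1)
print(admit_run(policy, state, "regression-v1", "consortium-a", today))  # (True, 'ok')
print(admit_run(policy, state, "regression-v1", "consortium-a", today))  # (False, 'too many concurrent runs')
finish_run(state)
print(admit_run(policy, state, "regression-v1", "consortium-a", today))  # (True, 'ok')
finish_run(state)
print(admit_run(policy, state, "regression-v1", "consortium-a", today))  # (False, 'run cap reached')
```

Because the policy is attached to the vault, this design would keep the admission decision with the vault owner rather than with a consortium leader, which is the concern raised in the reviewer feedback.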

praeducer commented 1 year ago

We need a way to verify that users have a valid Data Use Agreement (DUA). For example, if someone already has access to the same data stored elsewhere, such as Oasis, that access could transfer to the Vault.
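A minimal Python sketch of such a check, assuming DUAs are stored as records keyed by user and dataset: a DUA granted elsewhere (for example through Oasis) for the same dataset is accepted for the Vault as long as it has not expired. `DataUseAgreement`, `has_valid_dua`, and `example-dataset` are hypothetical names, not an existing COINSTAC schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataUseAgreement:
    """Hypothetical DUA record."""
    user: str
    dataset: str   # dataset the agreement covers
    source: str    # where the agreement was granted, e.g. "Oasis" or the Vault itself
    expires: date


def has_valid_dua(user: str, dataset: str,
                  agreements: list[DataUseAgreement], today: date) -> bool:
    """True if the user holds an unexpired DUA for this dataset from any source."""
    return any(
        dua.user == user and dua.dataset == dataset and today <= dua.expires
        for dua in agreements
    )


agreements = [DataUseAgreement("alice", "example-dataset", "Oasis", date(2026, 6, 30))]
print(has_valid_dua("alice", "example-dataset", agreements, date(2025, 1, 1)))  # True
print(has_valid_dua("alice", "example-dataset", agreements, date(2027, 1, 1)))  # False
print(has_valid_dua("bob", "example-dataset", agreements, date(2025, 1, 1)))    # False
```

Accepting a DUA from any source implements the "transfer" case; a stricter variant could accept only agreements whose `source` is on a list of providers the Vault owner trusts.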