Open craddm opened 1 year ago
The specification does not currently address code/software ingress directly, unless I've missed it. We may therefore need to add a statement that ties any mechanism for code/software ingress to information governance policy, comparable to the current statements for data.
If we're going to add a statement on software ingress, it shouldn't go in 3.1. Data lifecycle management. We might want to be careful how we word this so as to be inclusive of different TRE types. In DSH, I think we have a software ingress process which is the same as for data, whereas in OpenSAFELY code ingress happens every time an analysis is run.
I suggest:
@sa-tre/spec-maintainers thoughts on this? ^
I think it'd be worth stating what we see as the difference between data, software and code before creating any statements, since it's not always clear cut. Are there some general principles we can come up with for all ingress/egress? For example:
Another thought: streaming/automated "egress". For example, it may be judged acceptable for a restricted, limited dashboard to show a live output automatically. In https://github.com/sa-tre/satre-specification/pull/132 we discussed live reporting of data use, but this is also a form of egress.
Summary
A summary of the free text (non-categorical responses) to the above question
Source
No response
Detail
Respondents agree that network connectivity should be tightly controlled to avoid failures of information governance, such as leakage of sensitive data from a TRE or even between workspaces within a TRE. But configurability is key. The needs of different projects vary widely, and while some may operate well without requiring network connectivity either outside or within the TRE, others may not. And as noted by one respondent, the relationship between network connectivity and contractual relationships between the data owners and those who require access to the data can be complex. Thus, the precise configuration of network connectivity may need to be decided on a case-by-case basis, which implies that a TRE should be configurable to suit the circumstances.
Network connectivity to resources outside the TRE should be restricted by default. However, there are many cases in which access to external resources may be useful or even essential. Access to external software repositories such as CRAN/PyPI is typically perceived as desirable, if not essential. Connectivity to allow import of project-specific code or data into a workspace may also be desirable. However, these needs may be met by an ingress procedure that does not require direct connectivity to the external resource from within the TRE (e.g. an airlock procedure mediated by TRE administrators). In general, it is desirable that a mechanism exists by which access to external resources can be provided on a project-by-project basis, subject to information governance policy.
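As an illustration only, the "restricted by default, opened per project" approach described above might be sketched as follows. The `EgressPolicy` class, the project names, and the hostnames are hypothetical; none of them come from the specification or from any particular TRE implementation.

```python
from dataclasses import dataclass, field


@dataclass
class EgressPolicy:
    """Hypothetical default-deny outbound policy with per-project exceptions."""

    # Maps a project name to the external hosts approved for that project
    # under information governance policy. Anything not listed is denied.
    approved_hosts: dict[str, set[str]] = field(default_factory=dict)

    def approve(self, project: str, host: str) -> None:
        """Record that a host has been approved for a single project."""
        self.approved_hosts.setdefault(project, set()).add(host.lower())

    def is_allowed(self, project: str, host: str) -> bool:
        """Return True only if the host was explicitly approved for the project."""
        return host.lower() in self.approved_hosts.get(project, set())


# Assumed example: one project is approved to reach package repositories.
policy = EgressPolicy()
policy.approve("project-a", "cran.r-project.org")
policy.approve("project-a", "pypi.org")
```

In this sketch a project with no approvals has no outbound access at all, which corresponds to a TRE that relies solely on an administrator-mediated airlock for ingress.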
Some respondents express the position that external resources should also be TRE-like, or at least known and trusted. Again, this may vary on a case-by-case basis. Others note that external connectivity would be required for access to federated analytics services, which in principle should protect data privacy while allowing access to advanced computational resources.
Network connectivity within a TRE should be considered to enable collaboration, which several respondents considered to be essential. Again, this should be tightly controlled so as not to allow data leakage between workspaces. Thus, for example, users within a specific workspace should be able to collaborate and share code or data between themselves, but should not be able to link that data to datasets from outside that workspace. Projects should typically be isolated from one another.
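A minimal sketch of this isolation rule, assuming each user belongs to exactly one workspace (real TREs may allow more complex memberships). The `WorkspaceUser` type and `may_connect` function are hypothetical names used purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkspaceUser:
    """Hypothetical user record: a name plus the single workspace they belong to."""

    name: str
    workspace: str


def may_connect(source: WorkspaceUser, target: WorkspaceUser) -> bool:
    """Allow connections only between users of the same workspace."""
    return source.workspace == target.workspace


# Assumed example users: two collaborators and one user from another project.
alice = WorkspaceUser("alice", "workspace-1")
bob = WorkspaceUser("bob", "workspace-1")
carol = WorkspaceUser("carol", "workspace-2")
```

Here `alice` and `bob` could share code or data with each other, while any connection involving `carol` would be refused because she belongs to a different workspace.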
It appears that the existing specification already covers most concerns of the respondents. The provisions of section 2.3 Network Management address isolation of workspaces through limitation of outbound connections and disallowing connectivity between users on different projects or with access to different datasets.
Section 2.1.2 Software tools covers the ability to access external resources such as CRAN/PyPI, and enabling collaboration within workspaces through shared tools such as databases and web apps accessible only within a given workspace. Notably, the section does not specify that such shared tools are required to be directly within the TRE, but only that they are shared only with users of a specific project. This does open the possibility of external resources being shared privately between project users.
Section 3.1. Data lifecycle management covers concerns about ingress and egress of data, in that it requires a TRE to have a process of ingress/egress that ensures all information governance policies are adhered to.
Some possible points of discussion:
Intended Output
No response
Who can help
@sa-tre/spec-maintainers