sa-tre / satre-specification

Standard Architecture for Trusted Research Environments specification
https://satre-specification.readthedocs.io
Creative Commons Attribution 4.0 International

[Discussion]: Questionnaire summary: Network connectivity #52

Open edwardchalstrey1 opened 1 year ago

edwardchalstrey1 commented 1 year ago

Summary

The percentage split of responses to the SATRE survey questions on what connectivity settings workspaces within TREs should have.

Source

SATRE specification survey responses

Detail

Survey results:

5.1.a. Configurable connectivity between workspaces

| No opinion | Not important | Nice to have | Important | Essential |
|---|---|---|---|---|
| 20.00% | 4.76% | 20.00% | 36.19% | 19.05% |

5.2.a. Configurable connectivity between a project and resources outside the TRE

| No opinion | Not important | Nice to have | Important | Essential |
|---|---|---|---|---|
| 12.38% | 4.76% | 27.62% | 29.52% | 25.71% |
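For anyone who wants to compare the two questions quickly, here is a minimal sketch that sums chosen answer categories. The percentages are copied from the results above; the names `LABELS`, `RESPONSES` and `share` are made up for illustration and are not part of any SATRE tooling.

```python
# Percentages copied from the survey results above.
# LABELS, RESPONSES and share() are illustrative names only.
LABELS = ["No opinion", "Not important", "Nice to have", "Important", "Essential"]
RESPONSES = {
    "5.1.a": [20.00, 4.76, 20.00, 36.19, 19.05],
    "5.2.a": [12.38, 4.76, 27.62, 29.52, 25.71],
}


def share(question, wanted):
    """Return the summed percentage of the wanted answer labels for a question."""
    row = dict(zip(LABELS, RESPONSES[question]))
    return round(sum(row[label] for label in wanted), 2)


if __name__ == "__main__":
    for question in RESPONSES:
        # Share of respondents who rated the item "Important" or "Essential".
        print(question, share(question, ["Important", "Essential"]))
```

On these figures, a little over half of respondents rated each item as "Important" or "Essential" (roughly 55% in both cases).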

Intended Output

Who can help

Anyone

Specification section

https://satre-specification.readthedocs.io/en/latest/pillars/computing_technology.html#network-management (see #110)

crickpetebarnsley commented 1 year ago

The specification of the capabilities of a workspace is not clear, so I may have misunderstood the intention here. I seem to remember it being defined as a computing place, and I have taken it to be a VM infrastructure based on other questions in the survey.

A trusted environment for completing research that allows machines to be connected at a network level seems totally wrong.
It would bring in all sorts of risks and security situations that would need resolution in configuration, faulting and billing. As a result delivery and maintaining the levels of "trust" would need added resources for mitigations, controls and reporting. One research project TRE should not have the ability to connect to any other. The only connection should be to the required datasets needed to complete the processing work. Therefore 5.1 should not be supported.

Therefore it is essential to have connectivity to outside resources 5.2 (as long as they are data source resources - such as staging areas, containerised scheduled ingest facilities bridging to external datasets, etc).

What was the model of facilities that drove this question in the survey?

manics commented 1 year ago

Different TREs define workspaces in different ways. E.g. in AWS each researcher gets their own workspace/VM for a single project, and within that project researchers may be allowed to connect to each other's workspaces/VMs. Connectivity between workspaces/VMs in different projects is of course blocked.
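As a rough illustration of the rule described here (not any particular TRE's implementation), the check could be sketched as below. `Workspace`, `connection_allowed` and the `allow_within_project` switch are hypothetical names, and a real deployment would enforce this at the network layer rather than in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workspace:
    """Hypothetical model: one workspace/VM belonging to exactly one project."""
    name: str
    project: str


def connection_allowed(src, dst, allow_within_project=True):
    """Decide whether src may connect to dst.

    Connections across projects are always blocked. Connections within a
    project are allowed only if the (configurable) switch is on.
    """
    if src.project != dst.project:
        return False
    return allow_within_project


if __name__ == "__main__":
    alice = Workspace("alice-vm", "project-a")
    bob = Workspace("bob-vm", "project-a")
    carol = Workspace("carol-vm", "project-b")
    print(connection_allowed(alice, bob))    # same project
    print(connection_allowed(alice, carol))  # different projects
```

The `allow_within_project` flag is there to reflect that within-project connectivity "may be allowed" rather than being guaranteed.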

edwardchalstrey1 commented 1 year ago

I think the definition question around "workspace" is not too much of a concern, see here we say "project":

Connectivity between users in the same project may be allowed, for example to support shared network services within the project.

We also say:

Limited outbound connectivity may be allowed for some services.

Which sort of vaguely addresses 5.2.a. Configurable connectivity between a project and resources outside the TRE

@crickpetebarnsley suggestion above describes some example outbound services/resources:

Therefore it is essential to have connectivity to outside resources 5.2 (as long as they are data source resources - such as staging areas, containerised scheduled ingest facilities bridging to external datasets, etc).

I'm not sure it's essential, but whether something is "outside" or "inside" the TRE may be a matter of definition. For example, in the Turing Data Safe Haven we use containers in Azure Storage Accounts for data storage and access within the TRE - but these can also be accessed "outside" the TRE by admins, or by researchers provided with a secure time-limited SAS URL.

edwardchalstrey1 commented 1 year ago

@sa-tre/spec-maintainers Does anyone object to closing this issue? I think for the reasons in my comment above it's not clear what is meant by either connectivity between workspaces or connectivity outside the TRE - I'm not convinced there's something clear we need to include in the spec here

craddm commented 1 year ago

To bring in some of the relevant free-text summaries (#130), while some respondents expressed similar views to @crickpetebarnsley above, many were not quite so strict. A number of respondents expressed that collaborative tools are useful and that allowing connectivity within a TRE - and to be clear, only within the bounds of a given project - to such resources would be nice to have. I would interpret @crickpetebarnsley's comments above as disallowing use of web apps like CodiMD or GitLab, as implemented in the DSH, since those are not directly on the VMs that users use and are not necessary to access the data. Yet those tools are configured such that they are only accessible within a given environment. People could not use them to share data or code in other places, such as other SREs deployed for other projects. That implementing such tools would require additional resources, consideration for security issues etc is, I feel, a reason to include them as possibilities in the spec rather than exclude them, so that due consideration is given to such matters should developers want to implement such capabilities.

I'd say that as in the summary above, the spec as is does a decent job of covering this issue.

edwardchalstrey1 commented 1 year ago

That implementing such tools would require additional resources, consideration for security issues etc is, I feel, a reason to include them as possibilities in the spec rather than exclude them, so that due consideration is given to such matters should developers want to implement such capabilities.

I think this is a good point, however I don't know if we're getting too DSH-specific here? You're talking about connectivity between VMs that form part of a cloud-based TRE I guess e.g. a separate webapp or database VM. That's a bit different to the 2 items here which are "connectivity between workspaces" and "connectivity to resources outside the TRE" - unless maybe you think that webapp VMs count as being "outside"?

craddm commented 1 year ago

I think this is a good point, however I don't know if we're getting too DSH-specific here? You're talking about connectivity between VMs that form part of a cloud-based TRE I guess e.g. a separate webapp or database VM. That's a bit different to the 2 items here which are "connectivity between workspaces" and "connectivity to resources outside the TRE" - unless maybe you think that webapp VMs count as being "outside"?

Yes, I think I'm reacting a bit to the above comment, which seems to me to say that any connectivity between machines at all (except for connecting to data sources) should be disallowed. While I gave DSH examples, similarly in AzTRE you can have GitLab servers etc that sit within a given workspace for use by a specific project only.

But you're right that this is getting a bit away from what exactly the questions were.

No objection to closing, mainly want to link in the free-text analysis.