lsc-sde / iac-flux-jupyter

Flux configuration for JupyterHub
MIT License

Apply restrictive network policies to single user notebooks #9

Open vvcb opened 7 months ago

vvcb commented 7 months ago

We must have egress and ingress policies in place for all pods deployed for researchers. There are several SATRE specifications that refer to this.

Some of this will be enforced by Azure Firewall and Azure Policies around private endpoints.

However, we will also need pod-level policies that control intra-cluster communication as well as egress traffic out of the cluster. This issue will track that discussion.

There is detailed documentation of JupyterHub's default network policies here -> https://z2jh.jupyter.org/en/latest/administrator/security.html#kubernetes-network-policies
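To make the sort of additional restriction I have in mind concrete, here is a rough sketch of a deny-by-default egress policy for the single-user pods that only permits DNS and traffic back to the hub API. It assumes the labels and hub port the z2jh chart normally uses (`component: singleuser-server`, `component: hub`, port 8081), so the selectors and port should be checked against the deployed chart version before anything like this is applied.

```yaml
# Sketch only - a deny-by-default egress policy for single-user pods that
# permits DNS and traffic back to the hub API, and nothing else.
# Assumes the usual z2jh labels (component: singleuser-server / hub) and
# hub port 8081; verify against the deployed chart version.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: singleuser-restricted-egress
  namespace: jupyterhub   # hypothetical namespace name
spec:
  podSelector:
    matchLabels:
      component: singleuser-server
  policyTypes:
    - Egress
  egress:
    # DNS lookups (any destination, port 53 only)
    - ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # Access back to the hub API for spawner/activity callbacks
    - to:
        - podSelector:
            matchLabels:
              component: hub
      ports:
        - protocol: TCP
          port: 8081
```

Because NetworkPolicies are additive allows, anything else a notebook genuinely needs (an approved mirror, an internal data service) would be opened up with further explicit rules rather than by loosening this one.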

qcaas-nhs-sjt commented 6 months ago

@vvcb please could you confirm what external and internal addresses (if any) the jupyter nodes need to be able to see?

qcaas-nhs-sjt commented 4 months ago

Linking to lsc-sde/lsc-sde#38

vvcb commented 4 months ago

@qcaas-nhs-sjt, it is difficult to provide a comprehensive list of allowed addresses, but it will be good to start somewhere.

One way to approach this would be to look at levels of trust. Users will fall into two broad categories:

  1. Internal users: data scientists, BI analysts, academics who either hold a substantive or honorary contract with one of the LSC NHS organisations
  2. External users: Researchers with limited 'letters of access' or project-specific, time-limited 'research-passports' and commercial entities undertaking data analysis for operational intelligence.

For 1, I can't see a good reason to restrict access beyond what the default NHS network allows/disallows - but happy to discuss. For 2, it will be good to start with the most restrictive option and open up access as required. For instance, we may allow access to PyPI and CRAN, although most secure environments provide access to a curated package mirror (e.g. https://help.sonatype.com/en/pypi-repositories.html) rather than the public ones.
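If we do go down the curated-mirror route, one way it could surface in this repo's Helm values is simply to point pip at the internal index instead of public PyPI. The snippet below is only a sketch: the Nexus URL is made up, and CRAN would need an equivalent repos setting in Rprofile.site.

```yaml
# Sketch only - point single-user pods at a curated internal mirror rather
# than public PyPI. The Nexus URL is a made-up placeholder; CRAN would need
# an equivalent repos option configured in Rprofile.site.
singleuser:
  extraEnv:
    PIP_INDEX_URL: "https://nexus.internal.example/repository/pypi-proxy/simple"
```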

So, the first step is to be able to assign labels to pods that will place them into one of several 'security/trust' classes to which we can then apply these policies. These labels could be assigned as part of the workspace management CRD for JupyterHub.
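As a very rough illustration of the labelling idea (the label key here is made up, and in practice the workspace CRD or spawner would set it per workspace/user rather than the chart applying it globally), something like this in the Helm values would put every user pod into a trust class:

```yaml
# Sketch only - attach a 'trust class' label to single-user pods so that
# network policies can select on it. The label key is hypothetical; in
# practice the workspace CRD / spawner would set this per workspace or user.
singleuser:
  extraLabels:
    workspaces.lsc-sde.example/trust-class: external
```

A NetworkPolicy's `podSelector.matchLabels` could then target that label to apply a class-wide set of ingress/egress rules.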

I am meeting the ELabs lead this Thursday and will ask how they solve this. But we are not building a full-blown TRE - NWSDE will provide that through AzureTRE.

A more important feature may be Gitea. We need to have a separate discussion regarding this - the pros and cons of Gitea vs GitHub Enterprise (which we already have).

qcaas-nhs-sjt commented 4 months ago

> @qcaas-nhs-sjt, it is difficult to provide a comprehensive list of allowed addresses, but it will be good to start somewhere.

I appreciate this will definitely be a moving list, but each address should be reviewed and considered for how it might be used to transfer data out of the system, and where necessary alternative routes should be considered.

> One way to approach this would be to look at levels of trust. Users will fall into two broad categories:
>
> 1. **Internal users:** data scientists, BI analysts, academics who either hold a substantive or honorary contract with one of the LSC NHS organisations
> 2. **External users:** Researchers with limited 'letters of access' or project-specific, time-limited 'research-passports' and commercial entities undertaking data analysis for operational intelligence.
>
> For 1, I can't see a good reason to restrict access beyond what the default NHS network allows/disallows - but happy to discuss. For 2, it will be good to start with the most restrictive option and open up access as required.

I would argue that if the emphasis is on building a secure environment then we need to start with the principle of least privilege for all users, regardless of whether they are internal or external, i.e. give them the minimum that they need to get the job done.

My experience of NHS networks is that they are often designed in such a way that once you're on the network there is a level of trust which then allows you to gain access to other resources. This environment is connected to that network, so it could potentially be used to access other NHS systems. I therefore think we need to be careful about what any user has access to on the LTH network and even on the wider LSC Azure network. Again, the solution to this would be to lock down everything and open up access as required.
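As a concrete starting point, the baseline I have in mind looks something like the policy below: a namespace-wide default deny, with explicit allow policies (DNS, the hub, an approved mirror and so on) layered on top. Treat it as a sketch; the namespace name is a placeholder.

```yaml
# Sketch only - namespace-wide default deny for both ingress and egress.
# Anything a workload genuinely needs then has to be opened up by an
# explicit allow policy on top of this.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: jupyterhub   # hypothetical namespace name
spec:
  podSelector: {}          # selects every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
```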

> For instance, we may allow access to PyPI and CRAN, although most secure environments provide access to a curated package mirror (e.g. https://help.sonatype.com/en/pypi-repositories.html) rather than the public ones.

I agree that this is definitely something we will need. I would also suggest that we might want an internal git server in place as well: needed repositories are forked automatically onto that server, so any commits are made against the internal server, which keeps them inside the environment. If we then want to push a change back to the originating repository, we would need to raise a request and have an approvals process for that to happen. This way GitHub cannot be used as a way to export data from the environment.

> So, the first step is to be able to assign labels to pods that will place them into one of several 'security/trust' classes to which we can then apply these policies. These labels could be assigned as part of the workspace management CRD for JupyterHub.

My view was that we would have another operator, which would watch the workspace definition and manage the network policies. These would provide access based on the already existing workspace label. This means that we can open up access to a specific workspace while denying access to another.

By keeping these separate we don't need to give JupyterHub itself too many permissions, which could increase our vulnerability on that product.
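To illustrate what that operator might generate, the sketch below shows the sort of per-workspace policy it could stamp out from the workspace label. The label key, workspace name and namespace are placeholders, and this would sit alongside the default hub/proxy access policies rather than replace them.

```yaml
# Sketch only - the kind of policy the operator could create per workspace,
# keyed on the existing workspace label. Label key, workspace name and
# namespace are hypothetical placeholders.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: workspace-analytics-project-a
  namespace: jupyterhub
spec:
  podSelector:
    matchLabels:
      workspace: analytics-project-a
  policyTypes:
    - Ingress
  ingress:
    # Only pods carrying the same workspace label may talk to these pods,
    # so one workspace cannot reach into another.
    - from:
        - podSelector:
            matchLabels:
              workspace: analytics-project-a
```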