qcaas-nhs-sjt closed this issue 3 months ago.
@vvcb, further to our discussion today, I've put together a quick overview of the options here.
@qcaas-nhs-sjt, thank you very much for putting this together. I sense that the CRD solution would be the simplest, most easily explained and most manageable of the three options (the third being Keycloak), with easy extensibility built in.
From a maintenance POV, it certainly makes it easier for me as it keeps this within k8s.
From an audit trail POV, the CRDs can be version controlled and will fit in with the existing Flux-based workflow. And I love the idea of using the CRDs to do any number of additional operations on workspaces as you describe - all under version control.
This also solves the expiring-token issue we have seen with federated authentication between Keycloak and Entra.
Looks like we have a clear winner.
@vvcb great, I'll set about creating the tickets to get this work done. Can I assume that this will take priority over the work on the OHDSI applications?
@vvcb, per our conversation this morning, this is the current priority.
This has been agreed, and an initial version has been deployed.
Further to our conversation today, it appears that Keycloak may not be the right solution for our needs and may in fact be overcomplicating what we need to accomplish. This ticket outlines the architecture for a new workspace management solution that would ultimately remove the need for Keycloak, so that we can go straight to using Entra ID (formerly Azure Active Directory), or indeed any SSO solution, for managing user identity.
Option 1: CRD Model
The suggestion is that we build two CRDs (Kubernetes CustomResourceDefinitions), the first representing a workspace:
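A minimal sketch of what the Workspace CRD could look like, written as a Python dict so the new module can generate or check it. The API group `example.org`, the version `v1alpha1` and the spec fields `displayName`, `description` and `image` are placeholders, not the agreed design. `kubectl apply -f` accepts the JSON output as well as YAML.

```python
import json

# Placeholder API group and version (hypothetical, not the agreed values).
GROUP = "example.org"
VERSION = "v1alpha1"


def workspace_crd() -> dict:
    """Build a CustomResourceDefinition manifest for a hypothetical Workspace kind."""
    spec_schema = {
        "type": "object",
        "required": ["displayName", "image"],
        "properties": {
            # Illustrative fields only (assumptions).
            "displayName": {"type": "string"},
            "description": {"type": "string"},
            "image": {"type": "string"},  # container image for the user's server (assumption)
        },
    }
    return {
        "apiVersion": "apiextensions.k8s.io/v1",
        "kind": "CustomResourceDefinition",
        # A CRD's metadata.name must be <plural>.<group>
        "metadata": {"name": f"workspaces.{GROUP}"},
        "spec": {
            "group": GROUP,
            "scope": "Namespaced",
            "names": {
                "kind": "Workspace",
                "singular": "workspace",
                "plural": "workspaces",
            },
            "versions": [
                {
                    "name": VERSION,
                    "served": True,
                    "storage": True,
                    "schema": {
                        "openAPIV3Schema": {
                            "type": "object",
                            "properties": {"spec": spec_schema},
                        }
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(workspace_crd(), indent=2))
```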
The second CRD would be a workspace-to-user binding, which records which workspaces a user has access to:
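A WorkspaceBinding CRD might only need two spec fields: the user's identity as known to the SSO provider, and the name of the workspace. The group, version and field names here are hypothetical. The `binding` function shows what an individual binding resource would look like; objects like this are what would sit in the Flux-managed repository.

```python
GROUP = "example.org"  # hypothetical API group
VERSION = "v1alpha1"   # hypothetical version


def workspace_binding_crd() -> dict:
    """CRD manifest for a hypothetical WorkspaceBinding (user -> workspace)."""
    spec_schema = {
        "type": "object",
        "required": ["username", "workspace"],
        "properties": {
            # User identity as reported by the SSO provider (assumption).
            "username": {"type": "string"},
            # metadata.name of a Workspace in the same namespace.
            "workspace": {"type": "string"},
        },
    }
    version = {
        "name": VERSION,
        "served": True,
        "storage": True,
        "schema": {"openAPIV3Schema": {"type": "object", "properties": {"spec": spec_schema}}},
    }
    return {
        "apiVersion": "apiextensions.k8s.io/v1",
        "kind": "CustomResourceDefinition",
        "metadata": {"name": f"workspacebindings.{GROUP}"},
        "spec": {
            "group": GROUP,
            "scope": "Namespaced",
            "names": {
                "kind": "WorkspaceBinding",
                "singular": "workspacebinding",
                "plural": "workspacebindings",
            },
            "versions": [version],
        },
    }


def binding(name: str, username: str, workspace: str, namespace: str) -> dict:
    """An individual WorkspaceBinding resource granting `username` access to `workspace`."""
    return {
        "apiVersion": f"{GROUP}/{VERSION}",
        "kind": "WorkspaceBinding",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"username": username, "workspace": workspace},
    }
```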
In Kubernetes we'd need to:
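Whatever the final list of steps, the hub will most likely need RBAC permission to read the new resources. A minimal sketch of a read-only Role and RoleBinding, again as Python dicts; the API group is a placeholder, the role name is illustrative, and the service account name `hub` is an assumption (it should match whatever account the hub pod actually runs as).

```python
GROUP = "example.org"  # hypothetical; must match the CRDs' group
ROLE_NAME = "hub-workspace-reader"  # illustrative name


def hub_read_role(namespace: str) -> dict:
    """Role allowing the hub to read (but not modify) workspaces and bindings."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": ROLE_NAME, "namespace": namespace},
        "rules": [
            {
                "apiGroups": [GROUP],
                "resources": ["workspaces", "workspacebindings"],
                "verbs": ["get", "list", "watch"],
            }
        ],
    }


def hub_read_role_binding(namespace: str, service_account: str = "hub") -> dict:
    """Bind the read-only Role to the hub's service account (name assumed)."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": ROLE_NAME, "namespace": namespace},
        "subjects": [
            {"kind": "ServiceAccount", "name": service_account, "namespace": namespace}
        ],
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": ROLE_NAME,
        },
    }
```

Keeping the hub's permissions read-only means that workspaces and bindings can only change through the version-controlled manifests.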
We'd create a new Python module called kubespawner_workspace_mgmt, which would provide methods for accessing these resources from Kubernetes. It will contain objects whose structure mirrors the CRDs above.
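The objects in the module could be plain dataclasses with a constructor that parses the dict returned by the Kubernetes API. A sketch, assuming the hypothetical spec fields `displayName`, `description`, `image`, `username` and `workspace`:

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Workspace:
    name: str
    display_name: str
    image: str
    description: str = ""

    @classmethod
    def from_resource(cls, obj: Dict[str, Any]) -> "Workspace":
        """Build from a custom object dict as returned by the Kubernetes API."""
        spec = obj.get("spec", {})
        return cls(
            name=obj["metadata"]["name"],
            display_name=spec["displayName"],
            image=spec["image"],
            description=spec.get("description", ""),
        )


@dataclass
class WorkspaceBinding:
    name: str
    username: str
    workspace: str  # metadata.name of the bound Workspace

    @classmethod
    def from_resource(cls, obj: Dict[str, Any]) -> "WorkspaceBinding":
        spec = obj["spec"]
        return cls(
            name=obj["metadata"]["name"],
            username=spec["username"],
            workspace=spec["workspace"],
        )
```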
There will then be classes for reading these from Kubernetes:
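Reading the resources can go through the official Kubernetes Python client's `CustomObjectsApi.list_namespaced_custom_object(group, version, namespace, plural)`, which returns a dict with an `items` list. In the sketch below the API object is passed in rather than created, so the class can be tested with a fake. The group, version and plural names are placeholders, `WorkspaceReader` is a hypothetical name, and bindings are filtered client-side for simplicity.

```python
from typing import Any, Dict, List

GROUP = "example.org"  # hypothetical
VERSION = "v1alpha1"   # hypothetical


class WorkspaceReader:
    """Reads Workspace and WorkspaceBinding custom objects from one namespace.

    `api` is expected to be a kubernetes.client.CustomObjectsApi, or anything
    exposing the same list_namespaced_custom_object method (e.g. a test fake).
    """

    def __init__(self, api: Any, namespace: str):
        self.api = api
        self.namespace = namespace

    def _list(self, plural: str) -> List[Dict[str, Any]]:
        result = self.api.list_namespaced_custom_object(GROUP, VERSION, self.namespace, plural)
        return result.get("items", [])

    def list_workspaces(self) -> List[Dict[str, Any]]:
        return self._list("workspaces")

    def list_bindings(self) -> List[Dict[str, Any]]:
        return self._list("workspacebindings")

    def workspaces_for_user(self, username: str) -> List[Dict[str, Any]]:
        """Return the Workspace objects that `username` is bound to."""
        bound = {
            b["spec"]["workspace"]
            for b in self.list_bindings()
            if b["spec"]["username"] == username
        }
        return [w for w in self.list_workspaces() if w["metadata"]["name"] in bound]
```

Inside the hub pod this might be constructed with `from kubernetes import client, config`, then `config.load_incluster_config()` and `WorkspaceReader(client.CustomObjectsApi(), namespace)`.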
There would also be a helper class to make it easier for KubeSpawner to interact with the client.
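One plausible job for the helper is translating a user's workspaces into KubeSpawner `profile_list` entries, each of which has a `display_name`, `slug`, `description` and a `kubespawner_override` dict. The sketch assumes a hypothetical `reader` object with a `workspaces_for_user(username)` method returning Workspace custom objects as dicts, and uses the (also hypothetical) `spec.image` field as the override.

```python
from typing import Any, Dict, List


class SpawnerWorkspaceHelper:
    """Turns a user's workspaces into KubeSpawner profile_list entries."""

    def __init__(self, reader: Any):
        # Anything with workspaces_for_user(username) -> list of custom-object dicts.
        self.reader = reader

    def profiles_for(self, username: str) -> List[Dict[str, Any]]:
        profiles = []
        for ws in self.reader.workspaces_for_user(username):
            name = ws["metadata"]["name"]
            spec = ws.get("spec", {})
            profiles.append({
                "display_name": spec.get("displayName", name),
                "slug": name,
                "description": spec.get("description", ""),
                # Trait overrides KubeSpawner applies when this profile is chosen.
                "kubespawner_override": {"image": spec["image"]},
            })
        if profiles:
            profiles[0]["default"] = True  # preselect the first workspace
        return profiles
```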
The new module would then be referenced by jupyterhub_custom_config.py:
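KubeSpawner's `profile_list` can be set to a callable that receives the spawner instance, which is what allows the offered workspaces to vary per user. A minimal sketch of the wiring, assuming a helper object with a hypothetical `profiles_for(username)` method; `make_profile_list` and `configure` are illustrative names, and `configure` would be called from `jupyterhub_custom_config.py` with the `c` object JupyterHub provides.

```python
from typing import Any, Callable, Dict, List


def make_profile_list(helper: Any) -> Callable[[Any], List[Dict[str, Any]]]:
    """Return a callable suitable for c.KubeSpawner.profile_list."""

    def profile_list(spawner: Any) -> List[Dict[str, Any]]:
        # KubeSpawner passes the spawner; its .user.name identifies who is starting a server.
        return helper.profiles_for(spawner.user.name)

    return profile_list


def configure(c: Any, helper: Any) -> None:
    """Attach the per-user profile list to the JupyterHub config object `c`."""
    c.KubeSpawner.profile_list = make_profile_list(helper)
```

If a user has no bindings, this returns an empty list. It is worth checking how the deployed KubeSpawner version behaves in that case, since the intent is presumably to deny access rather than fall back to a default image.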
The advantages of using CRDs are:
The disadvantages are:
Option 2: Database
Another option is to manage this in a database instead, presumably on a PostgreSQL server:
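A possible schema mirroring the two CRDs: one table of workspaces and one of user bindings. Table and column names are assumptions. The sketch uses Python's built-in `sqlite3` purely so it runs without a server; the DDL itself is also valid PostgreSQL. One difference: SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` is set on the connection, whereas PostgreSQL always enforces them.

```python
import sqlite3

# Illustrative schema (hypothetical names); also valid PostgreSQL DDL.
SCHEMA = """
CREATE TABLE IF NOT EXISTS workspace (
    name          TEXT PRIMARY KEY,
    display_name  TEXT NOT NULL,
    description   TEXT NOT NULL DEFAULT '',
    image         TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS workspace_binding (
    username   TEXT NOT NULL,
    workspace  TEXT NOT NULL REFERENCES workspace(name) ON DELETE CASCADE,
    PRIMARY KEY (username, workspace)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    # SQLite-specific: foreign keys are off by default on each connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
```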
As in option 1, we'd create a new Python module called kubespawner_workspace_mgmt, which would provide methods for accessing these records from the database. It will contain objects whose structure mirrors the CRDs described in option 1.
There will then be classes for reading these from the database:
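The database reader could return objects in the same shape as Kubernetes custom objects (`metadata.name` plus a `spec`), so the KubeSpawner helper need not know which backend is in use. A sketch against hypothetical `workspace` and `workspace_binding` tables, written for `sqlite3`; `DatabaseWorkspaceReader` is an illustrative name, and with a PostgreSQL driver such as psycopg the `?` placeholder would become `%s`.

```python
import sqlite3
from typing import Any, Dict, List


class DatabaseWorkspaceReader:
    """Reads a user's workspaces from the (hypothetical) workspace tables."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def workspaces_for_user(self, username: str) -> List[Dict[str, Any]]:
        rows = self.conn.execute(
            """
            SELECT w.name, w.display_name, w.description, w.image
            FROM workspace AS w
            JOIN workspace_binding AS b ON b.workspace = w.name
            WHERE b.username = ?
            ORDER BY w.name
            """,
            (username,),
        ).fetchall()
        # Shape each row like a Workspace custom object so callers are backend-agnostic.
        return [
            {
                "metadata": {"name": name},
                "spec": {"displayName": display, "description": desc, "image": image},
            }
            for name, display, desc, image in rows
        ]
```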
There would also be a helper class to make it easier for KubeSpawner to interact with the client.
The new module would then be referenced by jupyterhub_custom_config.py:
We would also need to develop a backend REST API to serve this data to a management portal.
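The backend API could start as a thin read-only layer over a reader object. A minimal sketch as a plain WSGI application (standard library only) serving a hypothetical `GET /users/<username>/workspaces` route. The route, the `reader` interface (`workspaces_for_user(username)`) and the JSON shape are assumptions. The write endpoints the portal would need, and authentication against Entra ID, are left out.

```python
import json
from typing import Any, Callable, Iterable, List, Tuple


def make_api(reader: Any) -> Callable:
    """Return a minimal read-only WSGI app over `reader` (hypothetical interface)."""

    def respond(start_response, status: str, payload: Any) -> List[bytes]:
        body = json.dumps(payload).encode("utf-8")
        headers: List[Tuple[str, str]] = [
            ("Content-Type", "application/json"),
            ("Content-Length", str(len(body))),
        ]
        start_response(status, headers)
        return [body]

    def app(environ, start_response) -> Iterable[bytes]:
        method = environ.get("REQUEST_METHOD", "GET")
        parts = [p for p in environ.get("PATH_INFO", "").split("/") if p]
        # GET /users/<username>/workspaces
        if method == "GET" and len(parts) == 3 and parts[0] == "users" and parts[2] == "workspaces":
            return respond(start_response, "200 OK", reader.workspaces_for_user(parts[1]))
        return respond(start_response, "404 Not Found", {"error": "not found"})

    return app
```

For local experiments it can be served with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`. Once write operations and authentication are added, a web framework would probably be a more comfortable fit than raw WSGI.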
We would then build a management portal for managing workspaces and their user bindings.
The advantages of using a managed database are:
The disadvantages are: