Booking system upgrades for v1

practable / relay

Secure websocket relay server and clients for sharing video, data, and ssh across firewall boundaries

GNU Affero General Public License v3.0

Issues:

A number of upgrades to the booking system are desirable to improve user experience and share admin duties:

a/ advance booking
b/ cancel booking
c/ allow customisable UI sets depending on pool/group (extension: per user/service account)
d/ allow flexible/customisable booking periods with per-pool/group (extension: per user/service account) minimum and maximum limits
e/ allow an experiment to be booked from multiple different pools/groups
f/ develop tools to allow updating the booking system while experiments are still booked out (current practice is an all-in-one upload of a new manifest), and/or separate the administration of an experiment's existence from its pool/group assignments if that is required to simplify this
g/ provide separate admin roles for experiment existence configuration, experiment pool/group assignment, and pool/group properties, so that tasks such as taking specific experiments out of public view for development can be done by developers without the risk of them being able to delete e.g. all other experiments. In short, per-experiment administration control.

Impact:

(a),(b) require moving from JWT to sessions, and from in-memory, single-node booking records to persistent storage (with high availability).

(c)-(f) suggest separate micro-services for different aspects of the booking process, e.g. one service to manage whether the kit is booked out or not, and another service to manage how a certain set of users go about booking the kit out and obtaining a UI (this again might split into different services to ease configuration, although then discovery/aggregation of booking policies/services is required).

(g) relates to (c)-(f) but adds an additional layer of granularity: for this resource you can do [X]. Currently,

Plans

We can look at elements of (c)-(f) for the current revision of the system (v1 release project), and possibly even (a), subject to developing persistent identity management that does not rely on the client-side cache, which can be refreshed at any time (so bookings would be lost). Options include sending emails with short confirmation codes (e.g. access the booking with last name and 6-char code, like a flight booking). However, cancelling already-started bookings, i.e. moving to sessions, is a bigger task.
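As a rough illustration of the "last name plus 6-char code" idea, here is a minimal sketch. The alphabet, the function names and the in-memory dict are all assumptions made for the example; a real system would use persistent storage and would also need rate limiting on lookups.

```python
import secrets

# Hypothetical alphabet: upper-case letters and digits without the
# look-alikes 0/O and 1/I, giving 32 symbols.
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"


def new_confirmation_code(length: int = 6) -> str:
    """Return a random code drawn from ALPHABET using a CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def booking_key(last_name: str, code: str) -> tuple[str, str]:
    """Normalise what the user types so lookups ignore case and stray spaces."""
    return (last_name.strip().casefold(), code.strip().upper())


# A dict stands in for persistent storage in this sketch.
bookings: dict[tuple[str, str], str] = {}

code = new_confirmation_code()
bookings[booking_key("Smith", code)] = "booking-123"  # hypothetical booking id
```

With 32 symbols and 6 characters there are 32^6 (about a billion) possible codes, so the code on its own is guessable at scale; pairing it with the last name and limiting attempts is what would make it workable.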

Thoughts on managing static assets in k8s

git-sync looks promising for hosting static assets.

Development needs

Individual developers will want to get their assets uploaded on an ad-hoc basis, which means they'd need the ability to commit directly to a repo that is serving what they need. Sharing a single repo between all developers would be risky, e.g. with many/inexperienced developers, or once enough time has gone by that you forget what your previous asset usages are.

Namespacing

So a possible solution would be to namespace them into different top-level directories in /usr/share/nginx/html, with a separate git-sync sidecar for each repo. We don't want release stage names in the repos either, because otherwise our development code is not the same as our production code (unless we separate out the links to images as config for deploy, which we may need to do). Promoting a user interface from dev->staging would involve updating the related static asset repos as well. These could simply be the same repo, with the git-sync sidecar in the static asset container in the production system pointing at the main branch, and the git-sync sidecar in the static asset container in the dev system pointing at the develop branch (and similarly if we need additional release stages in the future).
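The layout described above can be sketched as a small planning function: one entry per namespace, with the branch chosen by release stage. The namespace names and repo URLs are hypothetical, and the sketch deliberately stops short of emitting actual git-sync configuration.

```python
from dataclasses import dataclass

WEB_ROOT = "/usr/share/nginx/html"  # top-level directory named in the text

# Release stage -> branch, as described above; further stages would go here.
STAGE_BRANCH = {"dev": "develop", "production": "main"}


@dataclass(frozen=True)
class SidecarPlan:
    repo: str       # repository the sidecar would sync
    branch: str     # branch selected by the release stage
    directory: str  # namespace directory the repo is served from


def plan_sidecars(repos: dict[str, str], stage: str) -> list[SidecarPlan]:
    """Return one plan per namespace; repos maps namespace -> repo URL."""
    branch = STAGE_BRANCH[stage]
    return [
        SidecarPlan(repo=url, branch=branch, directory=f"{WEB_ROOT}/{namespace}")
        for namespace, url in sorted(repos.items())
    ]
```

Because the stage only selects the branch, the repo contents carry no stage names, which is the property the paragraph above is after.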

Deleting assets someone else needs

It's still not a perfect solution to use small namespace repos, because the overhead in setting them up probably means one repo will cover multiple apps created by a particular developer or organisational unit. Since in git you can delete things that you've forgotten you still need, and immutable storage with hashed suffixes doesn't let us update images without updating code, we'd want to establish a procedure that deletes are not allowed, and that you should duplicate assets that you want to re-use rather than rely on another developer keeping the version you like (they may have different needs).
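A "no deletes" procedure could be backed by an automated check along these lines. This is only a sketch: it compares two file listings, and how those listings are obtained (for example from the repo before and after a proposed change) is left out. The function names are made up for the example.

```python
def deleted_paths(before: set[str], after: set[str]) -> list[str]:
    """Return paths present in the old listing but missing from the new one."""
    return sorted(before - after)


def check_no_deletes(before: set[str], after: set[str]) -> None:
    """Raise ValueError if the change would remove any existing asset."""
    missing = deleted_paths(before, after)
    if missing:
        raise ValueError("assets would be deleted: " + ", ".join(missing))
```

Note that a rename shows up as a delete plus an add, so it would be rejected too, which matches the intent that existing asset URLs keep working.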

Tokens should not be guessable

At the moment, we keep our assets in a private repo and you cannot list the files (nginx is set to prevent this), hence you cannot inspect the available tokens. So we can rely on the entropy in their names to make it a reasonable bet that they are only available to the people we gave them to (and anyone they shared them with). That's ok for now.
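For reference, an unguessable name of the kind relied on here can be produced with Python's standard `secrets` module. The 32-byte default is an assumption for the example, not a value taken from the current system.

```python
import secrets


def new_token_name(nbytes: int = 32) -> str:
    """Return a URL-safe random name built from nbytes of CSPRNG output."""
    return secrets.token_urlsafe(nbytes)


def entropy_bits(nbytes: int) -> int:
    """Entropy of a name generated from nbytes random bytes."""
    return nbytes * 8
```

This only helps while names cannot be listed or inspected, which is exactly the condition discussed above.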

Private repo or public?

But we don't want a private repo for these tokens, because PAT is not yet supported by git-sync, and I want to avoid the risk of having an ssh key to the entire practable public repo sitting in a k8s container (minimise the blast radius of a potential security issue). So we should just use public repos for assets, and serve secure tokens separately.

How to serve tokens?

For tokens, if we put them in a public repo, then they no longer have high-entropy unguessable names, because they can be inspected. We'll possibly want to keep using tokens even during the early transition into persistent identities, because getting class role information from the IdP may not be an option in all cases, requires extra coding, and doesn't cover cases where we want to grant access that is additional to any roles held in the organisation supplying the IdP to us (e.g. a new adopter wants to try a new experiment). So secure tokens matter.

It would be easier to configure tokens via a separate service, rather than try to make the static asset repos private. We can use a config-map to supply them (so that tokens cannot be inspected in the public git repo). It is probably not necessary to go as far as treating them as secrets, because they will be freely given to any user who can guess their name (or is given it, which is the intended mechanism), so a config-map is no less secure than putting them into the container via a secret, which would be more faff. Also, we can just restart the tokens container when we need to update the tokens, and save triggering a download of all the static assets again.
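A separate token service could be very small. The sketch below assumes the config-map is mounted as a directory with one file per token name, and it only covers loading and lookup; the HTTP layer is omitted and the function names are hypothetical.

```python
from pathlib import Path
from typing import Optional


def load_tokens(directory: str) -> dict[str, str]:
    """Read every regular file in directory as name -> token text."""
    tokens: dict[str, str] = {}
    for path in Path(directory).iterdir():
        # Mounted config-map volumes contain bookkeeping entries whose
        # names start with "..", so skip those and anything not a file.
        if path.name.startswith("..") or not path.is_file():
            continue
        tokens[path.name] = path.read_text().strip()
    return tokens


def lookup(tokens: dict[str, str], name: str) -> Optional[str]:
    """Return the token for an exact name match, or None (never a listing)."""
    return tokens.get(name)
```

Since the tokens are loaded once at startup in this sketch, restarting the container is what picks up a changed config-map, in line with the restart approach described above.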

premature optimisation of caching ...

We could also consider some sort of CDN approach here, but for our volume of usage, downloading our entire asset base from git on each container restart is probably an acceptable tradeoff. So consider a more efficient caching or CDN approach to be premature optimisation for now.

Cancellation of JWT tokens - some thoughts

Being able to cancel JWT tokens might offer an intermediate step that avoids having to re-factor the entire ecosystem to work off of sessions. And we might want to keep our internal sessions separate from our login sessions anyway, to separate access to experiments from authorisation to interact with bookings etc.

We can keep a deny list of used tokens, which we purge when each one expires. [Use caching to avoid slow performance](https://piotrgankiewicz.com/2017/12/07/jwt-refresh-tokens-and-net-core/)
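A deny list that purges itself as tokens expire might look like the following. It is an in-memory sketch only; the class and method names are assumptions, and a multi-node deployment would need a shared store or cache instead of a local dict.

```python
import time
from typing import Optional


class DenyList:
    """Denied token ids mapped to the time their token expires."""

    def __init__(self) -> None:
        self._expiry: dict[str, float] = {}  # token id -> expiry (Unix seconds)

    def deny(self, token_id: str, expires_at: float) -> None:
        """Record a token as denied until its own expiry time."""
        self._expiry[token_id] = expires_at

    def purge(self, now: Optional[float] = None) -> int:
        """Drop entries whose token has expired; return how many were dropped."""
        now = time.time() if now is None else now
        expired = [tid for tid, exp in self._expiry.items() if exp <= now]
        for tid in expired:
            del self._expiry[tid]
        return len(expired)

    def is_denied(self, token_id: str, now: Optional[float] = None) -> bool:
        """True if the token is on the list and has not yet expired."""
        now = time.time() if now is None else now
        exp = self._expiry.get(token_id)
        return exp is not None and exp > now
```

An expired token is rejected by normal validation anyway, so entries only need to live as long as the token itself, which keeps the list small.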

Anyone with a JWT token has access to a resource, so if we have an endpoint where any authenticated user can DELETE a JWT token, then that session is cancelled. This could potentially be linked to from user interfaces directly, although it more tightly couples the UI to the system and might be painful if we go through a period of rapid system development. Equally, it would be great to be able to extend a session as well, e.g. add 5/10/15min up to some policy-defined limit.

Perhaps these features would be relatively simple calls to endpoints like <base>/cancel and <base>/extend that just dump in the token as bearer token ... although ... extend would require a new token so that you could reconnect if you were dropped (could just be returned as a payload that is chucked in the store? but what if you then reload the page after the original expiration but before the extended expiration? You'd have to intercept the reload, and change the url or provide some other way to access that extended booking). So some of that suggests keeping these features in the booking app, except how do you update the token for a UI you are already running in a separate window to the booking system? At this point ... a session becomes pretty handy.
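The "extend up to some policy-defined limit" rule can be written down independently of whether tokens or sessions carry it. A minimal sketch, with hypothetical parameter names and times in Unix seconds:

```python
def extended_expiry(
    start: float, current_expiry: float, extra_minutes: int, max_minutes: int
) -> float:
    """Return the new expiry, capped so the booking never exceeds max_minutes."""
    if extra_minutes <= 0:
        raise ValueError("extra_minutes must be positive")
    limit = start + max_minutes * 60
    return min(current_expiry + extra_minutes * 60, limit)
```

For example, a booking that started at t=0 and currently ends at 600 s, extended by 15 min under a 20 min policy limit, would end at 1200 s rather than 1500 s. Whatever issues the new token or updates the session would use this value as the new expiration.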

Do we want to consider a live websocket connection to a management system, so that we can receive cancellation/extension requests, send updated tokens, and communicate messages to specific users, or all users?

Ideally perhaps a session system, with a separate message system. Then there is no need to update the opaque session code in the app - it is handled according to the latest status of that session code when the app is refreshed.

The message system could stay open for longer than the experiment, e.g. to allow survey submission or activity submission (e.g. some sort of marking or attendance system).

With cancellation, we would need to alert the relay to drop any existing connections using that token, and not allow any token that is now denied. We'd have to make sure we did not inadvertently cancel the long-lasting experimental connections. They have different tokens though, with a much longer expiration. Is that different enough, or do we need to add a field to the data tokens for cases where the characteristics are otherwise the same? Or can we just deny list specific tokens, i.e. with an exact expiration date (e.g. we could use a hash of the token, or of the base64 string representing the main body of the token, because we would have already validated any tokens we wanted to use in this way, so we can drop the header and signature)?
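The "hash of the main body" idea could be sketched as below. It assumes the token has already been validated, as the paragraph above requires, and it treats the middle dot-separated segment as the body; the function name is hypothetical.

```python
import hashlib


def deny_key(token: str) -> str:
    """Return a SHA-256 hex digest of the token's base64 payload segment.

    The token must already have been validated elsewhere; the header and
    signature segments are ignored here.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    return hashlib.sha256(parts[1].encode("ascii")).hexdigest()
```

Because the expiration is part of the payload, a long-lived experiment token and a short-lived user token yield different keys even if their other claims match. Two tokens with byte-identical payloads would share a key, so whether that is acceptable depends on whether an extra distinguishing field is added.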
