mxsasha closed this 8 months ago.
Did some digging into the management command: it creates a BatchUser object with some user data. This object is linked to BatchRequest objects in the DB, and is used to limit access and to balance load across users (rather than across tasks). A decorator looks up the user from the request metadata and passes it to the views. Other user data, like organisation, is not used.
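The issue doesn't show how the scheduler actually uses BatchUser, so here is a hypothetical sketch of what "balance over users rather than tasks" means. It takes (username, request_id) pairs, which is an assumption about the data shape. Users take turns, so a user with many queued requests cannot starve a user with one. The function name `round_robin_by_user` is illustrative only.

```python
from __future__ import annotations

from collections import deque
from typing import Iterable, Iterator


def round_robin_by_user(
    pending: Iterable[tuple[str, str]],
) -> Iterator[tuple[str, str]]:
    """Yield (username, request_id) pairs, taking turns between users.

    Hypothetical sketch: each user's requests keep their original order,
    but users are served in rotation instead of in global task order.
    """
    queues: dict[str, deque[str]] = {}
    for username, request_id in pending:
        queues.setdefault(username, deque()).append(request_id)
    while queues:
        # Iterate over a snapshot of the keys so emptied queues can be removed.
        for username in list(queues):
            queue = queues[username]
            yield username, queue.popleft()
            if not queue:
                del queues[username]
```

With three queued requests from alice and one from bob, bob's request is served second instead of last.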
Plan: remove all user metadata columns except username, as that data is not needed there and is already not authoritative. Skip the user creation process, and when a request arrives from a username that does not match an existing BatchUser, create one in the database. That way, the htpasswd file becomes the only real source of user data, and the database follows it automatically.
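A minimal sketch of the "create on first request" plan, written without Django so it runs standalone. `BatchUserStore` is a hypothetical in-memory stand-in for the BatchUser table. `with_batch_user` is a hypothetical version of the decorator mentioned above. Reading the username from `request["META"]["REMOTE_USER"]` is an assumption about how nginx hands the authenticated user to the app. In Django, the store call would correspond to `BatchUser.objects.get_or_create(username=...)`, which also returns an `(object, created)` pair.

```python
from __future__ import annotations

import functools
from dataclasses import dataclass
from typing import Callable


@dataclass
class BatchUser:
    # After the proposed change, username is the only stored column.
    username: str


class BatchUserStore:
    """Hypothetical in-memory stand-in for the BatchUser table."""

    def __init__(self) -> None:
        self._users: dict[str, BatchUser] = {}

    def get_or_create(self, username: str) -> tuple[BatchUser, bool]:
        """Return (user, created), creating the row on first sight."""
        user = self._users.get(username)
        if user is not None:
            return user, False
        user = BatchUser(username=username)
        self._users[username] = user
        return user, True


def with_batch_user(store: BatchUserStore) -> Callable:
    """Decorator: resolve the authenticated username to a BatchUser.

    Assumes nginx already checked the htpasswd file and passed the
    username on as request["META"]["REMOTE_USER"] (hypothetical layout).
    """
    def decorator(view: Callable) -> Callable:
        @functools.wraps(view)
        def wrapper(request: dict, *args, **kwargs):
            username = request["META"].get("REMOTE_USER")
            if not username:
                raise PermissionError("no authenticated user on request")
            user, _created = store.get_or_create(username)
            return view(request, user, *args, **kwargs)
        return wrapper
    return decorator
```

Because the row is created lazily, adding a user only touches the htpasswd file; the DB entry appears on that user's first request.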
Our current user management, for batch, monitoring and any access (as used in dev), is based on a set of environment variables. Changing these requires a restart of the webserver, and there is some risk of accidentally changing another setting or user. This is a blocker for batch production, where users change more frequently.
Currently, the webserver Docker image uses this script to generate an htpasswd file, and then this one to add the appropriate nginx settings. Both are copied into the image and run as part of the nginx image entrypoint (i.e., on container start).
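The linked script isn't reproduced here, so this is a hedged sketch of what "generate an htpasswd file from environment variables" involves. The `user1:pass1,user2:pass2` env-var format is a hypothetical assumption, and so are the helper names. The hash uses the RFC 2307 `{SSHA}` scheme (salted SHA-1), which nginx's auth_basic module accepts. It is chosen only because Python's standard library can produce it; the existing script may well use `htpasswd`/apr1 instead.

```python
from __future__ import annotations

import base64
import hashlib
import os


def ssha_hash(password: str, salt: bytes | None = None) -> str:
    """Return an {SSHA} hash: base64(sha1(password + salt) + salt)."""
    if salt is None:
        salt = os.urandom(8)
    digest = hashlib.sha1(password.encode("utf-8") + salt).digest()
    return "{SSHA}" + base64.b64encode(digest + salt).decode("ascii")


def parse_users(spec: str) -> dict[str, str]:
    """Parse a hypothetical 'user1:pass1,user2:pass2' env-var value."""
    users: dict[str, str] = {}
    entries = [part.strip() for part in spec.split(",") if part.strip()]
    for index, entry in enumerate(entries):
        username, sep, password = entry.partition(":")
        if not sep or not username:
            # Report the position only, so a password never ends up in logs.
            raise ValueError(f"malformed user entry at position {index}")
        users[username] = password
    return users


def render_htpasswd(users: dict[str, str]) -> str:
    """One 'username:hash' line per user, sorted by username."""
    return "".join(f"{name}:{ssha_hash(pw)}\n" for name, pw in sorted(users.items()))
```

This also shows the weakness described above: every user change means editing an env var and regenerating the file at container start.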
First idea: move the three htpasswd files to a Docker volume mounted in the webserver container. This allows persistence and management with normal tools. The nginx entrypoint could base its config on whether each file exists at startup, or on an env variable; the latter would let it error loudly if an expected passwd file is not present. This will require an nginx reload.
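A sketch of the env-variable variant of the entrypoint check, in Python for illustration (the real entrypoint is likely a shell script). The file names, the area names and the `AUTH_<AREA>_ENABLED` variables are all hypothetical, since the issue doesn't name them. `auth_basic` and `auth_basic_user_file` are the standard nginx directives for basic auth. The point is the failure mode: an area marked as expected but missing its file raises instead of silently coming up without auth.

```python
from __future__ import annotations

from pathlib import Path
from typing import Mapping

# Hypothetical names: the issue mentions three htpasswd files but not their names.
HTPASSWD_FILES = {
    "batch": "batch.htpasswd",
    "monitoring": "monitoring.htpasswd",
    "dev": "dev.htpasswd",
}


def auth_snippet(area: str, htpasswd_path: Path) -> str:
    """nginx directives enabling basic auth against one htpasswd file."""
    return (
        f'auth_basic "{area}";\n'
        f"auth_basic_user_file {htpasswd_path};\n"
    )


def build_auth_config(volume: Path, env: Mapping[str, str]) -> dict[str, str]:
    """Return an nginx auth snippet for each area the env marks as expected.

    An area is expected when AUTH_<AREA>_ENABLED == "1" (hypothetical name).
    An expected file missing from the volume raises instead of being skipped.
    How non-expected areas are handled is left to the caller.
    """
    snippets: dict[str, str] = {}
    for area, filename in HTPASSWD_FILES.items():
        if env.get(f"AUTH_{area.upper()}_ENABLED") != "1":
            continue
        path = volume / filename
        if not path.is_file():
            raise FileNotFoundError(f"{area}: expected htpasswd file {path} is missing")
        snippets[area] = auth_snippet(area, path)
    return snippets
```

Keying on an explicit env variable rather than on file existence is what makes the loud failure possible: existence alone cannot distinguish "auth not wanted here" from "file went missing".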
Note: for batch, adding a user also involves calling a management command to add an entry to the SQL DB. We could do this through docker exec. We should also check whether this table is used at all.
We've discussed a bigger overhaul of user management, as users, files and hashes are currently spread over too many places, but we have no solution in mind yet. However, this issue is a blocker for the batch Docker deploy, which saves us from maintaining two entirely separate infrastructures, so we should not hold it up for long while looking for a perfect solution.
We should also update our internal docs.
1274 is not in this scope but closely related.