opensafely-core / opencodelists

OpenCodelists is an open platform for creating and sharing codelists of clinical terms and drugs.
https://www.opencodelists.org
Other
31 stars 11 forks source link

Operational error: database is locked #1839

Open iaindillingham opened 7 months ago

iaindillingham commented 7 months ago

We see operational errors regularly (56 errors and 19 users in the last 30 days). Mostly, they relate to the codelist builder (#1811) but also to the user_create_codelist view. Whilst one user routinely reports these errors to tech support (thanks, @emprestige!), clearly several users are affected.

It's very hard to recreate these errors locally, and it may be the case that no one view is responsible. When discussing #1811, it was mentioned that someone (possibly @rebkwok?) has thought about what replacing SQLite with PostgreSQL would entail.

rebkwok commented 7 months ago

I have not thought in much detail about what replacing SQLite with Postgres would look like. Theoretically, replacing the main db should be pretty straightforward, and job-server can be used as a template, infrastructure-wise. This would likely solve the database-locking issues (at least I'd hope it would), because my assumption is that those are all related to the main db.

The issue is what to do about the coding system databases. SQLite is an easy option with the current setup, because it's trivial to create a new sqlite db for each coding system import. There may be a better solution to maintaining the historical releases. Django can certainly have multiple databases of different types, so switching to Postgres doesn't mean we can't also use sqlite dbs. If we were to keep the current solution, it may be possible to have the default db be postgres, and the coding system dbs continue to be sqlite dbs as they currently are. However, I've never done that before in a django project, so I'm not sure what pitfalls there might be when the main (postgres) db needs to talk to a coding system db (although, this shouldn't happen often - the bnfdmd mapping may be the only place still left - have a look at the database router in opencodelists/db_utils.py).

inglesp commented 7 months ago

Seagull warning... rather than changing the database, another approach would be to reduce the write load by not updating the database on every interaction with the builder. This would mean that users would have to explicitly save their changes.

iaindillingham commented 7 months ago

Thanks for your comments, @rebkwok and @inglesp. We discussed this issue during this morning's team-rex meeting and agreed that the first task would be to scope out a couple of candidate solutions.