jhu-bids / TermHub

Web app and CLI tools for working with biomedical terminologies.
https://github.com/orgs/jhu-bids/projects/9/views/7
https://bit.ly/termhub
GNU General Public License v3.0

DB: Connection performance (pooling, async) #479

Open · joeflack4 opened 1 year ago

joeflack4 commented 1 year ago

Overview

We want to increase the stability and performance of DB connections.

Async

Siggie 2023/12/14:

> ChatGPT recommends using an asynchronous connection, which we are not doing now. That would be a bit of work, but probably not terrible.
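For reference, a minimal sketch of what async connections could look like, assuming we stay on SQLAlchemy (1.4+ ships `create_async_engine`) and add the `asyncpg` driver. The URL and query are placeholders, not our real config.

```python
from sqlalchemy import text
from sqlalchemy.ext.asyncio import create_async_engine

# "+asyncpg" selects the async driver; a sync URL would use "+psycopg2".
engine = create_async_engine("postgresql+asyncpg://user:pass@host/dbname")

async def fetch_one() -> int:
    # Queries are awaited, so the event loop can serve other requests
    # while Postgres works.
    async with engine.connect() as conn:
        result = await conn.execute(text("SELECT 1"))
        return result.scalar_one()

# Usage: `await fetch_one()` inside an async FastAPI route, or
# asyncio.run(fetch_one()) from a script.
```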

Connection pooling


Pre 2023/12/14: Siggie knows the details of how to do this. 2023/12/14: We've just learned that SQLAlchemy does this automatically? (See the sketch below.)

~Alternative: open a new connection every time a route is called~
- ~Replace the global CON object at the top of `backend/routes/db.py`~
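If SQLAlchemy really does pool by default (it uses `QueuePool` for engines made with `create_engine()` unless configured otherwise), the remaining work may just be tuning. A sketch of the relevant knobs; the URL is a placeholder and the values are illustrative, not recommendations:

```python
from sqlalchemy import create_engine

engine = create_engine(
    "postgresql+psycopg2://user:pass@host/dbname",  # placeholder URL
    pool_size=5,         # connections kept open in the pool
    max_overflow=10,     # extra connections allowed under load spikes
    pool_recycle=1800,   # retire connections older than 30 minutes
    pool_pre_ping=True,  # test each connection at checkout; dead ones are replaced
)
```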

Additional info


### Original "Overview" of issue The most pressing issue that we noticed where this is needed is that when the DB runs out of memory, it kills current connections, we think. At the very least, the servers can't connect properly to the DB. In order to fix that. ### An error that causes connection to be lost (solved I think) Shows the error message that happened; "need to rollback transaction": ![MicrosoftTeams-image](https://github.com/jhu-bids/TermHub/assets/13045020/04738e5d-d90d-4e82-a4ce-c7da82d23a8d) _Update 2023/12/14_: If I remember correctly, this was caused by disk space maxing out, and we have a solution / standard procedure in `developer.md`.

joeflack4 commented 1 year ago

@Sigfried Is this now a duplicate of #218?

joeflack4 commented 1 year ago

Whatever solution we implement should be able to recover from these kinds of issues (which, for now, were fixed by #541).

Do we need to do something like a try/except? Or is any sophisticated pooling just going to discard a bad connection thread on its own?

@Sigfried What is the primary use case of persisting a connection (w/ pooling)? It's just for performance reasons, right?

Sigfried commented 1 year ago

> @Sigfried Is this now a duplicate of #218?

Yeah. I closed that one and here's its contents:

> When the Postgres connection is lost, the backend doesn't reconnect. This kind of app should be using a connection pool.
>
> How to test: run a query on the backend, capture the `CON` variable for our global connection, close it, and run a new query; it should fail. Then set up connection pooling so that it initializes a new connection on request.
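A sketch of that test, assuming a SQLAlchemy 2.x engine with `pool_pre_ping=True`; the URL is a placeholder and this isn't wired to the actual `CON` global in `backend/routes/db.py`:

```python
from sqlalchemy import create_engine, text

engine = create_engine(
    "postgresql+psycopg2://user:pass@host/dbname",  # placeholder URL
    pool_pre_ping=True,
)

def run_query() -> int:
    with engine.connect() as conn:
        return conn.execute(text("SELECT 1")).scalar_one()

assert run_query() == 1  # baseline: the pool lazily opens a connection

# Simulate a lost connection: check one out of the pool, close the
# underlying DBAPI socket behind the pool's back, and return it.
raw = engine.raw_connection()
raw.dbapi_connection.close()
raw.close()

# A single persistent CON would fail here. The pool instead notices the
# dead connection (at return, or at checkout via pre-ping) and opens a
# fresh one, so the query still succeeds.
assert run_query() == 1
```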

Sigfried commented 1 year ago

> Do we need to do something like a try/except? Or is any sophisticated pooling just going to discard a bad connection thread on its own?

Not sure I understand.

> @Sigfried What is the primary use case of persisting a connection (w/ pooling)? It's just for performance reasons, right?

Sort of. I suspect creating a new connection for every function call could be costly beyond the time it takes to establish the connection. If too many are opened at the same time or some connections remain open for any reason, that could eventually crash the server.
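To illustrate that trade-off, a hedged sketch contrasting the per-call anti-pattern with a bounded pool (names, URL, and credentials are placeholders):

```python
import psycopg2
from sqlalchemy import create_engine, text

def per_call_query() -> int:
    # Anti-pattern: a brand-new connection per call pays a TCP + auth
    # handshake every time, and any path that skips close() leaks a
    # connection until Postgres hits max_connections.
    conn = psycopg2.connect(host="host", dbname="db", user="user", password="pass")
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT 1")
            return cur.fetchone()[0]
    finally:
        conn.close()

# A shared engine keeps at most pool_size + max_overflow connections open,
# so load spikes queue for a connection instead of exhausting the server.
engine = create_engine(
    "postgresql+psycopg2://user:pass@host/dbname",
    pool_size=5,
    max_overflow=10,
)

def pooled_query() -> int:
    # Checkout is cheap, and the context manager returns the connection
    # to the pool even on error, so nothing leaks.
    with engine.connect() as conn:
        return conn.execute(text("SELECT 1")).scalar_one()
```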