tursodatabase / libsql

libSQL is a fork of SQLite that is both Open Source, and Open Contributions.
https://turso.tech/libsql
MIT License
9.56k stars 253 forks source link

Proposal: On interactive connection, timeouts, and connection conccurency #925

Open MarinPostma opened 8 months ago

MarinPostma commented 8 months ago

I propose changing how the hrana protocol handles interactive connections.

How it currently works

Hrana-over-HTTP has become the de facto protocol to interact with sqld. When a request is received by hrana, a Hrana Stream is created. This Stream owns an SQLite connection to execute queries sent to that stream. A Stream is identified with a baton. When a stream is opened, the baton for that stream is returned to the client. The client can pass the baton to subsequent requests so that they are executed on the stream identified by that baton. When clients want to dispose of that stream, they emit a Close command on that stream, which is closed, freeing the SQLite connection in the process. To avoid keeping streams open forever, a watchdog is associated with each stream if a client fails to complete it. If the watchdog fails to be kicked (by receiving a request on that stream), then, after a defined period (10s currently), the stream is expired, and the SQLite connection is released.

Transactions also have a timeout mechanism that is separate from the stream. Transactions are given a (configurable, default to 5s) timeslot to complete. Failing to do so yields a timeout error. The lock-stealer ensures that the transaction lock is delivered to another waiting connection, if any.

Motivation

This change is motivated by multiple factors:

Solution

1) Default to non-interactive transaction: On the hrana side, only return keep a stream open if the connection is not in an auto-commit state (i.e., the connection is currently in a transaction state). In other words, if the connection is not in a transaction state after executing a query, the default behavior is to close it. In that case, no baton is returned, and the resources are released. On subsequent queries, a new connection will be opened. This solves the following issues:

It may be desired cases. I'll use interactive connections outside of a transaction, but it would be an opt-in rather than an opt-out as it is today.

2) Hard and soft transaction limits The current transaction timeout is a hard timeout. It is enforced in two ways:

We should increase the timeout to let transactions run longer if there is no contention. We will still need to limit transactions so as not to prevent checkpoints, but we can also let them run for as long as a stream is open if the connection is uncontended.

3) **Kick the watchdog*** In the CLI scenario, keeping a stream open for an extended period may be desirable. We can allow that by having the client periodically kick the stream watchdog. This may require exposing a Noop operation in the hrana protocol, whose sole purpose is to keep the stream open.

psarna commented 8 months ago

Ah, I was sure we already implement a soft limit for transactions, to let them last longer if there's no contention. So that's something we should definitely add.

And as for checkpoints, we can make them play the same stealing game - transactions only have a soft timeout, but once a checkpoint is about to be triggered, it tries to steal the lock, and potentially roll back a transaction that already passed the soft timeout.

haaawk commented 8 months ago

I think this is a good proposal and definitely a step into the right direction.

One caveat is that closing the sqlite connection after every request will increase the latency of the next request on the same connection. This can be solved though by not releasing the connection eagerly and only evict it from connections cache when we're short on resources.

MarinPostma commented 8 months ago

I think the cost of re-opening a connection is completely absorbed by network latencies, but we'd need to measure that. There are other levers to improve connection creation latency on the long run as well

MarinPostma commented 8 months ago

Issue with lazily closing is that if we return a baton, then it is assumed that it can be used withing the limits. Long lasting connections should be the exception, to reduces contention on the concurrency semaphore

billythedummy commented 5 months ago

Transactions are given a (configurable, default to 5s) timeslot to complete

Apologies for the sidetrack but how can users configure this timeout duration on turso or in libsql? We currently have a use-case where we're trying to write large amounts of data in a single transaction and it's constantly timing out.

MarinPostma commented 5 months ago

hey @billythedummy, currently this option is not exposed in turso. But there is a workaround. Say you have a main table CREATE TABLE emails (addr) to which you want to bulk import transactionally. What you can do is create a CREATE TABLE emails_tmp (addr), to which you import in batch. Once all the data is loaded in the temp table, you can do INSERT INTO emails SELECT * FROM emails_tmp. That will copy all the data over to the main table, in a single transaction.