tursodatabase / libsql

libSQL is a fork of SQLite that is both Open Source, and Open Contributions.
https://turso.tech/libsql
MIT License
9.54k stars 252 forks source link

replace parking_lot with tokio::sync::Mutex in some places #1668

Closed sivukhin closed 1 month ago

sivukhin commented 1 month ago

Context

Current sqld code abuses tokio runtime which can lead to complete deadlock of the server.

One issue which got triggered pretty frequently relying on the following facts about sqld internals:

  1. Many async request handlers in sqld checks that metadata has namespace in it with sync lock metadata.inner.configs (see MetaStore::exists method)
  2. Metastore remove operation take metadata.inner.configs lock first and then try to take inner.conn lock
  3. Background checkpoint operation which run on blocking thread take inner.conn lock for the time of the checkpoint process

So, if (2) took metadata.inner.configs lock while (3) is in process, then all request tasks which hits (1) will be blocked on the parking_lot mutex until (3) released the lock and (2) finished. This can easily block all tokio worker threads and lead to complete server deadlock.

This PR mitigate this specific scenario by introducing 2 fixes:

  1. MetaStoreInner.configs and MetaStoreInner.conn now uses tokio::sync::Mutex instead of parking_lot::Mutex. This will prevent scenario from above as async tasks now will be blocked on async lock and runtime will be able to switch them from worker threads and put some other workload on them
  2. Metastore remove operation now take inner.conn lock first and then take inner.configs lock. This will prevent the cases where inner.configs lock is taken for too long by remove operation while awaiting next inner.conn lock. As we are using configs lock in both async & sync context and also this lock required for quick checks in metastore - it's better to not hold it for too long.
haaawk commented 1 month ago

Why is this a draft @sivukhin ?