libsql / sqld

LibSQL with extended capabilities like HTTP protocol, replication, and more.
https://libsql.org

xCheckpoint: move blocking ops to use bottomless context #702

Closed Horusiath closed 1 year ago

Horusiath commented 1 year ago

Some context: we started observing panics on checkpoint calls, with the following trace:

2023-09-26T07:41:16.425109Z TRACE sqld: database checkpoint
2023-09-26T07:41:16.425266Z TRACE xCheckpoint{emode=3 busy_handler=Some(0x103e7e8d0) busy_arg=0x600003728000 sync_flags=10 n_buf=4096 z_buf=0x7ff12d04ac00 frames_in_wal=0x70000f7ee304 backfilled_frames=0x70000f7ee308}: sqld::replication::primary::logger: bottomless checkpoint
2023-09-26T07:41:16.425462Z ERROR xCheckpoint{emode=3 busy_handler=Some(0x103e7e8d0) busy_arg=0x600003728000 sync_flags=10 n_buf=4096 z_buf=0x7ff12d04ac00 frames_in_wal=0x70000f7ee304 backfilled_frames=0x70000f7ee308}: tracing_panic: A panic occurred panic.payload="there is no reactor running, must be called from the context of a Tokio 1.x runtime" panic.location="sqld/src/replication/primary/logger.rs:229:27"
2023-09-26T07:41:16.425636Z ERROR xCheckpoint{emode=3 busy_handler=Some(0x103e7e8d0) busy_arg=0x600003728000 sync_flags=10 n_buf=4096 z_buf=0x7ff12d04ac00 frames_in_wal=0x70000f7ee304 backfilled_frames=0x70000f7ee308}: sqld_libsql_bindings::wal_hook: panic in call to xframe: there is no reactor running, must be called from the context of a Tokio 1.x runtime:
   0: std::backtrace::Backtrace::create
   1: std::backtrace::Backtrace::force_capture
   2: sqld_libsql_bindings::wal_hook::xCheckpoint
   3: _sqlite3BtreeCheckpoint
   4: _sqlite3VdbeExec
   5: _sqlite3_step
   6: rusqlite::row::Rows::get_expected_row
   7: core::ops::function::FnOnce::call_once{{vtable.shim}}
   8: std::sys_common::backtrace::__rust_begin_short_backtrace
   9: core::ops::function::FnOnce::call_once{{vtable.shim}}
  10: std::sys::unix::thread::Thread::new::thread_start
  11: __pthread_start

2023-09-26T07:41:16.425968Z  WARN sqld: failed to execute checkpoint: Internal Error: `Failed to receive response via oneshot channel: channel closed

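It turns out that no current Tokio runtime context exists when xCheckpoint is called: SQLite invokes the hook on a plain OS thread (see the `thread_start` frames in the backtrace), so any call that expects an ambient runtime panics. To fix this, we pull the runtime context from the bottomless hook.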

The second fix concerns LibsqlConnection, which was spawning an OS thread instead of using a Tokio spawn.