spacejam / sled

the champagne of beta embedded databases
Apache License 2.0
8.2k stars 384 forks source link

MVCC and transactional design shootout :gun: :bomb: :explosionbomb: :extrabigexplosion: #232

Closed spacejam closed 6 years ago

spacejam commented 6 years ago

Time for a literature shootout to determine the initial architecture for sled 0.16's transactions and snapshots! If anyone has particular insights or opinions on these, please jump in here!

There are a number of separate concerns when MVCC is mentioned, that we should be careful not to conflate (taken from the paper "An Empirical Evaluation of In-Memory Multi-Version Concurrency Control" below):

The ideas in silo and tictoc particularly appeal to me due to the focus on reducing scalability barriers at high core counts, where we'll be headed in the coming years. A concern that will have a major impact on the implementation is how snapshots are handled. Sled is optimized for point reads and short-medium scans. It is acceptable for long scans to pay a higher GC penalty, especially if it simplifies implementation complexity. We value reliable worst case latency far more than crazy p0th-p50.

High-Performance Concurrency Control Mechanisms for Main-Memory Databases (2012)

Speedy Transactions in Multicore In-Memory Databases (2013) (silo)

Fast Serializable Multi-Version Concurrency Control for Main-Memory Database Systems (2015)

High performance transactions via early write visibility (2017)

Efficiently making (almost) any concurrency control mechanism serializable (2017)

An Empirical Evaluation of In-Memory Multi-Version Concurrency Control (2017)

TicToc: Time Traveling Optimistic Concurrency Control (2016)

Previous research has shown that timestamp management is the key
scalability bottleneck in concurrency control algorithms. This pre-
vents the system from scaling to large numbers of cores.
In this paper we present TicToc, a new optimistic concurrency
control algorithm that avoids the scalability and concurrency bot-
tlenecks of prior T/O schemes. TicToc relies on a novel and prov-
ably correct data-driven timestamp management protocol. Instead
of assigning timestamps to transactions, this protocol assigns read
and write timestamps to data items and uses them to lazily com-
pute a valid commit timestamp for each transaction. TicToc re-
moves the need for centralized timestamp allocation, and commits
transactions that would be aborted by conventional T/O schemes.
spacejam commented 6 years ago

Contention-Aware Lock Scheduling for Transactional Databases

spacejam commented 6 years ago

This nice slidedeck from Andy Pavlo covers the high-level implementations of hekaton, hyper, and cicada.

spacejam commented 6 years ago

Staring into the Abyss: An Evaluation of Concurrency Control with One Thousand Cores (2014)

o better understand just how unprepared current DBMSs are for
future CPU architectures, we performed an evaluation of concur-
rency control for on-line transaction processing (OLTP) workloads
on many-core chips.  We implemented seven concurrency control
algorithms on a main-memory DBMS and using computer simula-
tions scaled our system to 1024 cores.  Our analysis shows that all
algorithms fail to scale to this magnitude but for different reasons.
In each case, we identify fundamental bottlenecks that are indepen-
dent of the particular database implementation and argue that even
state-of-the-art DBMSs suffer from these limitations. We conclude
that  rather  than  pursuing  incremental  solutions,  many-core  chips
may  require  a  completely  redesigned  DBMS  architecture  that  is
built from ground up and is tightly coupled with the hardware.

Another cool Pavlo deck about the work from "staring into the abyss"

spacejam commented 6 years ago

Cicada: Dependably Fast Multi-Core In-Memory Transactions

Cicada is a single-node multi-core in-memory transactional data-
base with serializability. To provide high performance under diverse
workloads, Cicada reduces overhead and contention at several levels
of the system by leveraging optimistic and multi-version concur-
rency control schemes and multiple loosely synchronized clocks
while mitigating their drawbacks. On the TPC-C and YCSB bench-
marks, Cicada outperforms Silo, TicToc, FOEDUS, MOCC, two-
phase locking, Hekaton, and ERMIA in most scenarios, achieving
up to 3X higher throughput than the next fastest design.

Seems like the current contender.

spacejam commented 6 years ago

I'm aiming toward a cicada-like architecture, but with simplified timestamp generation initially just using an atomic fetch_add. We can also support causal consistency with zero centralized timestamp contention by just using a higher timestamp than what we find in our initial reads and tracking a thread-local max timestamp encountered, plus a per-thread identifier in the low bits for guaranteeing timestamp uniqueness. We have a LOT of tuning to go before this becomes a bottleneck.