Contention-Aware Lock Scheduling for Transactional Databases
We present the concept of contention-aware scheduling, show the hardness of the problem, and propose novel lock scheduling algorithms (LDSF and bLDSF), which guarantee a constant factor approximation of the best scheduling. We conduct extensive experiments using a popular database on both TPC-C and a microbenchmark. Compared to FIFO (the default scheduler in most database systems), our bLDSF algorithm yields up to 300x speedup in overall transaction latency. On the other hand, our LDSF algorithm, which is simpler and achieves comparable performance to bLDSF, has already been adopted by the open-source community and chosen as the default scheduling strategy in MySQL 8.0.3+.
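The abstract doesn't spell out the policy. LDSF stands for Largest-Dependency-Set-First: when a lock is released, it goes to the waiter that the most other transactions are (directly or transitively) blocked behind, not simply to whoever arrived first. Below is a simplified Python sketch of that idea for exclusive locks only. It ignores shared locks and the batching that bLDSF adds, and the names (`dependency_set`, `pick_fifo`, `pick_ldsf`, the `blocked_by` map) are hypothetical, not the paper's or MySQL's code.

```python
from collections import deque

def dependency_set(txn, blocked_by):
    """All transactions waiting on `txn`, directly or transitively.

    `blocked_by` is a hypothetical reverse waits-for graph: it maps a
    transaction id to the set of transaction ids blocked on it.
    Assumes deadlocks have already been resolved (graph is acyclic).
    """
    seen = set()
    queue = deque(blocked_by.get(txn, ()))
    while queue:
        t = queue.popleft()
        if t not in seen:
            seen.add(t)
            queue.extend(blocked_by.get(t, ()))
    return seen

def pick_fifo(waiters):
    # FIFO: grant the released lock to the oldest waiter.
    return waiters[0]

def pick_ldsf(waiters, blocked_by):
    # LDSF-style choice: grant the lock to the waiter whose progress
    # unblocks the most transactions. max() returns the first maximal
    # element, so ties fall back to arrival (FIFO) order.
    return max(waiters, key=lambda t: len(dependency_set(t, blocked_by)))

# t2 has t4 and t5 waiting on it, and t6 waits on t4.
waiters = ["t1", "t2", "t3"]  # queued on the same lock, in arrival order
blocked_by = {"t2": {"t4", "t5"}, "t4": {"t6"}}
print(pick_fifo(waiters))              # t1
print(pick_ldsf(waiters, blocked_by))  # t2
```

FIFO hands the lock to t1, which nobody is waiting on; the LDSF-style choice hands it to t2, whose completion lets three other transactions make progress.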
This nice slide deck from Andy Pavlo covers the high-level implementations of Hekaton, HyPer, and Cicada.
Staring into the Abyss: An Evaluation of Concurrency Control with One Thousand Cores (2014)
To better understand just how unprepared current DBMSs are for future CPU architectures, we performed an evaluation of concurrency control for on-line transaction processing (OLTP) workloads on many-core chips. We implemented seven concurrency control algorithms on a main-memory DBMS and using computer simulations scaled our system to 1024 cores. Our analysis shows that all algorithms fail to scale to this magnitude but for different reasons. In each case, we identify fundamental bottlenecks that are independent of the particular database implementation and argue that even state-of-the-art DBMSs suffer from these limitations. We conclude that rather than pursuing incremental solutions, many-core chips may require a completely redesigned DBMS architecture that is built from ground up and is tightly coupled with the hardware.
Another cool Pavlo deck about the work from "Staring into the Abyss".
Cicada: Dependably Fast Multi-Core In-Memory Transactions
Cicada is a single-node multi-core in-memory transactional database with serializability. To provide high performance under diverse workloads, Cicada reduces overhead and contention at several levels of the system by leveraging optimistic and multi-version concurrency control schemes and multiple loosely synchronized clocks while mitigating their drawbacks. On the TPC-C and YCSB benchmarks, Cicada outperforms Silo, TicToc, FOEDUS, MOCC, two-phase locking, Hekaton, and ERMIA in most scenarios, achieving up to 3X higher throughput than the next fastest design.
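Cicada's multi-version design rests on the basic MVCC visibility rule: a reader with snapshot timestamp `ts` sees the newest version whose write timestamp is at or below `ts`. The Python sketch below shows only that rule. It assumes committed versions arrive in timestamp order, and it leaves out Cicada's actual machinery (optimistic validation, loosely synchronized clocks, and so on). `Version` and `Record` are hypothetical names for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Version:
    wts: int      # commit timestamp of the transaction that wrote it
    value: object

@dataclass
class Record:
    # Newest-first chain, so a reader can stop at the first visible version.
    versions: list = field(default_factory=list)

    def install(self, wts, value):
        # Simplification: committed writes arrive in timestamp order.
        if self.versions and wts <= self.versions[0].wts:
            raise ValueError("out-of-order write timestamp")
        self.versions.insert(0, Version(wts, value))

    def read(self, snapshot_ts):
        # Visibility rule: newest version written at or before the snapshot.
        for v in self.versions:
            if v.wts <= snapshot_ts:
                return v.value
        return None  # the record did not exist at this snapshot

r = Record()
r.install(10, "a")
r.install(20, "b")
print(r.read(15), r.read(25), r.read(5))  # a b None
```

Keeping the chain newest-first means recent snapshots (the common case for point reads) stop after one or two versions, while older snapshots walk further down the chain.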
Cicada seems like the current contender.
I'm aiming toward a Cicada-like architecture, but with simplified timestamp generation, initially just using an atomic fetch_add. We can also support causal consistency with zero centralized timestamp contention: pick a timestamp higher than any we find in our initial reads, track a thread-local maximum timestamp encountered, and put a per-thread identifier in the low bits to guarantee timestamp uniqueness. We have a LOT of tuning to go before this becomes a bottleneck.
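A minimal Python sketch of both schemes follows. It assumes the low 8 bits of a timestamp hold the thread id. `TID_BITS`, `GlobalClock` and `LocalClock` are hypothetical names, and since Python has no atomic integers, a lock stands in for fetch_add. The local clock is Lamport-clock style: every timestamp a thread issues is larger than anything it has observed, and the thread id in the low bits keeps timestamps from different threads distinct.

```python
import threading

TID_BITS = 8  # hypothetical: low 8 bits hold the thread id (up to 256 threads)

class GlobalClock:
    """Centralized counter; the lock stands in for an atomic fetch_add,
    and every transaction in the system contends on it."""
    def __init__(self):
        self._next = 0
        self._lock = threading.Lock()

    def fetch_add(self, n=1):
        with self._lock:
            prev = self._next
            self._next += n
            return prev

class LocalClock:
    """Per-thread clock: issuing a timestamp touches no shared state."""
    def __init__(self, thread_id):
        if not 0 <= thread_id < (1 << TID_BITS):
            raise ValueError("thread id does not fit in TID_BITS")
        self.thread_id = thread_id
        self.max_seen = 0  # thread-local max timestamp encountered

    def observe(self, ts):
        # Call for every version timestamp seen during the transaction's reads.
        self.max_seen = max(self.max_seen, ts)

    def next_timestamp(self):
        # Move the high bits past everything observed, then put the thread id
        # in the low bits so no two threads can ever produce the same value.
        high = (self.max_seen >> TID_BITS) + 1
        ts = (high << TID_BITS) | self.thread_id
        self.max_seen = ts
        return ts

g = GlobalClock()
print(g.fetch_add(), g.fetch_add())  # 0 1

a, b = LocalClock(1), LocalClock(2)
t1 = a.next_timestamp()   # 257
b.observe(t1)             # b read something a wrote
t2 = b.next_timestamp()   # 514: ordered after t1
print(t1, t2)
```

The trade-off: the global counter yields a dense total order but every commit touches one shared location, while the local clock only guarantees ordering after whatever a thread actually observed, which is what causal consistency needs.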
Time for a literature shootout to determine the initial architecture for sled 0.16's transactions and snapshots! If anyone has particular insights or opinions on these, please jump in here!
When MVCC comes up, there are several separate concerns that we should be careful not to conflate (taken from the paper "An Empirical Evaluation of In-Memory Multi-Version Concurrency Control" below): the concurrency control protocol, version storage, garbage collection, and index management.
The ideas in Silo and TicToc particularly appeal to me because they focus on removing scalability barriers at high core counts, which is where we'll be headed in the coming years. A concern that will have a major impact on the implementation is how snapshots are handled. Sled is optimized for point reads and short-to-medium scans. It is acceptable for long scans to pay a higher GC penalty, especially if that reduces implementation complexity. We value reliable worst-case latency far more than spectacular p0-p50 latency.
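One way to see why long scans pay the GC penalty: under MVCC, a version can only be reclaimed once no active snapshot can still read it, so the oldest running snapshot bounds what GC may free. A minimal Python sketch, assuming a hypothetical newest-first chain of (write timestamp, value) pairs and a single "oldest active snapshot" watermark; `gc_versions` is an illustrative helper, not sled code.

```python
def gc_versions(versions, oldest_active_snapshot):
    """Prune a newest-first version chain of (wts, value) pairs.

    Keeps every version newer than the oldest active snapshot, plus the
    newest version at or below it, since that snapshot may still read it.
    Anything older is unreachable by any live reader.
    """
    kept = []
    for wts, value in versions:
        kept.append((wts, value))
        if wts <= oldest_active_snapshot:
            break  # what the oldest reader sees; everything after is garbage
    return kept

chain = [(40, "d"), (30, "c"), (20, "b"), (10, "a")]
print(len(gc_versions(chain, 35)))  # 2: only short-lived snapshots are active
print(len(gc_versions(chain, 12)))  # 4: a long scan pins the whole chain
```

With only short snapshots alive, most of the chain can be freed; a single long scan holding an old snapshot forces every version since its start to stay resident, which is the cost we're willing to put on long scans.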
High-Performance Concurrency Control Mechanisms for Main-Memory Databases (2012)
Speedy Transactions in Multicore In-Memory Databases (2013) (Silo)
Fast Serializable Multi-Version Concurrency Control for Main-Memory Database Systems (2015)
High performance transactions via early write visibility (2017)
Delayed write visibility stems from the fact that database systems can arbitrarily abort transactions at any point during their execution. Accordingly, we make the case for database systems which only abort transactions under a restricted set of conditions, thereby enabling a new recoverability mechanism, early write visibility, which safely makes transactions' writes visible prior to the end of their execution. We design a new serializable concurrency control protocol, piece-wise visibility (PWV), with the explicit goal of enabling early write visibility. We evaluate PWV against state-of-the-art serializable protocols and a highly optimized implementation of read committed, and find that PWV can outperform serializable protocols by an order of magnitude and read committed by 3X on high contention workloads.
Efficiently making (almost) any concurrency control mechanism serializable (2017)
An Empirical Evaluation of In-Memory Multi-Version Concurrency Control (2017)
Transaction Healing: Scaling Optimistic Concurrency Control on Multicores (2016)
TicToc: Time Traveling Optimistic Concurrency Control (2016)