yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com

[DocDB] Consider using lower-io priority for background operations #12284

Open amitanandaiyer opened 2 years ago

amitanandaiyer commented 2 years ago

Jira Link: DB-567

Description

Flushes, compactions, RemoteBootstrap, and similar background operations may write large amounts of data, throttling the IO available to online operations.

We could lower the IO priority of the threads doing this work to alleviate the problem.

The ioprio_set system call (SYS_ioprio_set) lets a thread change its IO priority.
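
As a rough illustration of what such a call could look like, here is a minimal sketch. It is not the existing RocksDB code: the names MakeIoprioValue and SetCurrentThreadIoPriority are hypothetical, and the constants are copied from the kernel's uapi ioprio definitions because glibc provides no wrapper for this syscall.

```cpp
#include <cerrno>
#include <cstring>
#include <string>

#if defined(__linux__)
#include <sys/syscall.h>
#include <unistd.h>
#endif

// Values mirror the kernel's ioprio definitions; redefined here because
// glibc ships no wrapper for ioprio_set.
constexpr int kIoprioClassShift = 13;
constexpr int kIoprioWhoProcess = 1;  // IOPRIO_WHO_PROCESS

enum IoprioClass {
  kIoprioClassNone = 0,
  kIoprioClassRealtime = 1,
  kIoprioClassBestEffort = 2,
  kIoprioClassIdle = 3,
};

// Packs a scheduling class and a level (0 = highest, 7 = lowest) into the
// single integer that ioprio_set expects.
constexpr int MakeIoprioValue(IoprioClass io_class, int level) {
  return (static_cast<int>(io_class) << kIoprioClassShift) | level;
}

// Hypothetical helper: sets the IO priority of the calling thread.
// Returns true on success; on failure, *error (if non-null) gets a message.
bool SetCurrentThreadIoPriority(IoprioClass io_class, int level,
                                std::string* error) {
#if defined(__linux__) && defined(SYS_ioprio_set)
  // With IOPRIO_WHO_PROCESS, an id of 0 means "the calling thread".
  const int value = MakeIoprioValue(io_class, level);
  if (syscall(SYS_ioprio_set, kIoprioWhoProcess, 0, value) == -1) {
    if (error != nullptr) *error = std::strerror(errno);
    return false;
  }
  return true;
#else
  if (error != nullptr) *error = "ioprio_set is only available on Linux";
  return false;
#endif
}
```

A background thread could call, for example, SetCurrentThreadIoPriority(kIoprioClassBestEffort, 7, &err) once at startup. Whether the call has any effect depends on the IO scheduler in use.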

I see it used in a couple of places in the RocksDB code. Perhaps it could be used in other places too?

00:53 ~/code/yugabyte-db [test_again] $ git grep LowerIOPriority src/yb/
src/yb/rocksdb/util/env_posix.cc:    thread_pools_[pool].LowerIOPriority();
src/yb/rocksdb/util/thread_posix.cc:void ThreadPool::LowerIOPriority() {
src/yb/rocksdb/util/thread_posix.h:  void LowerIOPriority();
00:53 ~/code/yugabyte-db [test_again] $ git grep LowerThreadPoolIOPriority src/yb/
src/yb/rocksdb/env.h:  virtual void LowerThreadPoolIOPriority(Priority pool = LOW) {}
src/yb/rocksdb/env.h:  void LowerThreadPoolIOPriority(Priority pool = LOW) override {
src/yb/rocksdb/env.h:    target_->LowerThreadPoolIOPriority(pool);
src/yb/rocksdb/tools/db_bench_tool.cc:      FLAGS_env->LowerThreadPoolIOPriority(Env::LOW);
src/yb/rocksdb/tools/db_bench_tool.cc:      FLAGS_env->LowerThreadPoolIOPriority(Env::HIGH);
src/yb/rocksdb/util/env_posix.cc:  void LowerThreadPoolIOPriority(Priority pool = LOW) override {

fritshoogland-yugabyte commented 2 years ago

IO prioritisation using SYS_ioprio_set is a Linux-specific optimization.

I first researched which IO schedulers IO priorities apply to; the Linux kernel documentation talks quite strictly about the CFQ (Completely Fair Queuing) IO scheduler. In the Linux 4.18 tag (Alma 8), the following IO schedulers exist: BFQ, Kyber, mq-deadline, and none. The BFQ and mq-deadline sources mention IO priorities; Kyber and none do not. mq-deadline is the default scheduler on my Alma 8 build VM; on CentOS 7, which we use for our cloud builds, the default scheduler is none.

This means that, because the IO scheduler for our current (CentOS 7) builds is none, setting priorities will not change anything on CentOS 7 unless a scheduler that responds to priorities is configured.
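
To check which scheduler a device is actually using before experimenting, the sysfs file /sys/block/<device>/queue/scheduler can be read; it lists the available schedulers with the active one in square brackets. The sketch below is only an illustration: the function names are hypothetical, and the list of schedulers treated as priority-aware simply restates the findings above rather than anything verified against a specific kernel.

```cpp
#include <fstream>
#include <string>

// Extracts the active scheduler from a sysfs line such as
// "mq-deadline kyber [bfq] none". If no bracketed entry is present, the line
// is returned unchanged (assumption: the file then holds a single name).
std::string ParseActiveScheduler(const std::string& line) {
  const std::string::size_type open = line.find('[');
  if (open == std::string::npos) return line;
  const std::string::size_type close = line.find(']', open + 1);
  if (close == std::string::npos) return line;
  return line.substr(open + 1, close - open - 1);
}

// Reads the active scheduler for a block device name such as "sda".
// Returns an empty string if the sysfs file cannot be read.
std::string ReadActiveScheduler(const std::string& device) {
  std::ifstream in("/sys/block/" + device + "/queue/scheduler");
  std::string line;
  if (!std::getline(in, line)) return "";
  return ParseActiveScheduler(line);
}

// Assumption taken from the comment above: CFQ, BFQ and mq-deadline reference
// IO priorities, while Kyber and none do not.
bool SchedulerMayHonorIoPriority(const std::string& scheduler) {
  return scheduler == "cfq" || scheduler == "bfq" ||
         scheduler == "mq-deadline";
}
```

If SchedulerMayHonorIoPriority(ReadActiveScheduler("sda")) returns false, lowering thread IO priorities on that device is unlikely to change anything.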

BFQ (like CFQ) seems to be a scheduler that tries to make interactive IO perform better by balancing its priority against batch-like IO. The documentation explicitly mentions that writes tend to be batched (ours are too, through actions like fsync, but also through the expiration of dirtied buffers holding batches of writes). To me this looks like balancing or de-prioritizing batched IO, which would interfere more with writes and thus with batch-like processing.

The deadline scheduler is focused on read requests, per the kernel documentation and source code, although I have seen many reports that deadline gives servers better IO rates than CFQ.

Before trying to optimise priorities, it is probably better to first see whether changing the scheduler makes any difference, and then to experiment with IO priorities to see whether IO prioritization brings anything workable. At the end of the day a submitted IO must be served, so I cannot tell whether this will make a change substantial enough to be worth the implementation effort.