yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com

[DocDB] Frequent Paused Compactions and High CPU Usage on one node; Observed many SST Files Not Compacted #23757

Open · shamanthchandra-yb opened this issue 1 month ago

shamanthchandra-yb commented 1 month ago

Jira Link: DB-12659

Description

A potential bug has been observed in a YugabyteDB cluster: SST files are not being compacted as expected, and node n2 is experiencing frequent paused compactions along with high CPU usage. Please see the Slack thread linked in the Jira description.

Setup Details:

Configuration:

Observations:

  1. Node n2 (which was also the master leader) showed high CPU usage, reaching up to 97%, with compaction threads piling up. Disk utilisation was around 50% on each disk (confirmed via iostat), while the other nodes stayed around 75% CPU usage.
  2. We suspect a bug in the priority thread pool. We should determine why tasks are marked as non-active and then never revisited.
  3. Need to find the RCA for why there are so many paused compactions (a quick spot-check for SST file counts is sketched after this list). Latest update from Slack, by @arybochkin:
We restarted n2 to get a fresh set of logs. It helped with the number of SST files: all old files were deleted by ‘universal deletion compaction’, which again confirms the files had been held by some paused compactions. That led to a huge reduction in SST files, and write rejections stopped as expected. I’m monitoring the situation to get more pointers from the fresh logs; it seems the situation is going to repeat, as I already see a couple dozen paused compactions.
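For anyone reproducing this, one quick way to watch per-tablet SST file counts is the tserver's Prometheus endpoint. This is a sketch: it assumes the default tserver metrics port 9000 and the `rocksdb_current_version_num_sst_files` metric name as I recall it; verify both against your build before relying on it.

```sh
# Hypothetical spot-check; metric name and port may differ across versions.
curl -s http://<tserver-host>:9000/prometheus-metrics | \
  grep rocksdb_current_version_num_sst_files
```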


Issue Type

kind/bug


rthallamko3 commented 3 weeks ago

Per @ttyusupov: "Compaction task priorities change frequently because they are based on the number of SST files in the current RocksDB state, and that causes compactions to be paused while control is transferred to higher-priority tasks again and again. This amplified the drift out of a stable state, because the node made slow progress on almost all 150-250 background compactions, switching between them instead of completing them one by one."
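To illustrate the intent of the fix (a hypothetical sketch, not the actual YugabyteDB implementation): if the raw priority tracks the SST file count directly, every flushed or deleted file can reshuffle priorities and preempt a running compaction. Quantizing the priority so it only changes when the file count crosses a step boundary above a start bound keeps priorities stable under small fluctuations. The function and parameter names below are illustrative only; the gflag values are from the comment that follows.

```cpp
// Hypothetical sketch (not the actual YugabyteDB code) of a step-quantized
// compaction priority. With start_bound = 20 and step_size = 10, a tablet
// going from 37 to 38 SST files keeps priority 2, so the priority thread
// pool has no reason to pause one compaction and resume another.
int QuantizedCompactionPriority(int num_sst_files,
                                int start_bound,  // e.g. compaction_priority_start_bound=20
                                int step_size) {  // e.g. compaction_priority_step_size=10
  if (num_sst_files <= start_bound) {
    return 0;  // Below the bound, all compactions share the base priority.
  }
  // Each full step above the bound raises the priority by one level.
  return (num_sst_files - start_bound) / step_size + 1;
}
```

The effect is that a ±1 change in file count only matters near a step boundary, instead of reshuffling task order on every flush or file deletion.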

The following tserver gflag changes helped, as they avoid frequent priority changes to compaction tasks and therefore the repeated pausing/resuming of compactions:

```
compaction_priority_step_size=10
compaction_priority_start_bound=20
```
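For future readers, a minimal sketch of applying these on a manually managed tserver (assumes a direct yb-tserver invocation with placeholder paths and addresses; yugabyted and platform-managed clusters set gflags through their own tooling):

```sh
# Hypothetical invocation; adjust paths and addresses for your deployment.
./bin/yb-tserver \
  --fs_data_dirs=/mnt/d0 \
  --tserver_master_addrs=<master-addresses> \
  --compaction_priority_step_size=10 \
  --compaction_priority_start_bound=20
```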