How to resolve stuck Cassandra compactions #10

Often the first sign that this is happening is a CPU check alert.

What To Do

SSH into the cass node in question and double-check its load average, such as

$ uptime
19:21:38 up 64 days, 16:33,  1 user,  load average: 18.78, 18.23, 17.40

To assess whether it is likely stuck performing compactions, use nodetool's compactionstats, such as

$ /opt/cassandra/bin/nodetool -h localhost -p 9080 compactionstats
pending tasks: 53
          compaction type        keyspace   column family bytes compacted     bytes total  progress
               Compaction             ABC unified_account            2889      7765832586     0.00%

A count of pending tasks greater than zero indicates that more compaction tasks are queued behind the current one. Looking at the current one, note the bytes compacted and the total progress. Also note the column family, since you'll need to grep for it later. Unfortunately, large column families, such as unified_account, are slow to progress through compaction, so you'll need to re-run compactionstats about 30 minutes later to judge whether any progress has been made. (George has seen it take 1.5 hours.)
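Rather than re-running the command by hand, here is a minimal sketch that samples the stats on that interval (the /tmp log path is just an example; adjust to taste):

while true; do
  date
  /opt/cassandra/bin/nodetool -h localhost -p 9080 compactionstats
  sleep 1800
done | tee -a /tmp/compactionstats.log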

At the same time, you can use this Grafana chart to observe the historical trend of pending compaction tasks. A chart that looks like this definitely indicates stuck compaction. Notice how cass2.lon3 dominates the chart on the left portion. The sudden drop near 1/16 15:00 is when the solution in this KB was applied.

[Grafana chart: pending compaction tasks by node; cass2.lon3 dominates the left portion, dropping sharply near 1/16 15:00]

When convinced that compaction is stuck, you now need to determine which specific files are contributing to the problem, move them away, and repair.

To determine the files likely to be contributing to the problem, grep the cass service logs for the column family that shows the stuck compaction tasks:

egrep "Compacting|Compacted" /var/log/cass/current | \
  grep unified_account | \
  less

Locate the most recent entry (press G in less to jump to the end) that looks like this and isn't followed by a "Compacted to" log message:

2017-01-15_06:34:24.30781 INFO - Compacting [SSTableReader(path='/var/lib/cassandra/data/ABC/unified_account-hd-55695-Data.db'), SSTableReader(path='/var/lib/cassandra/data/ABC/unified_account-hd-54489-Data.db'), SSTableReader(path='/var/lib/cassandra/data/ABC/unified_account-hd-54804-Data.db'), SSTableReader(path='/var/lib/cassandra/data/ABC/unified_account-hd-55819-Data.db'), SSTableReader(path='/var/lib/cassandra/data/ABC/unified_account-hd-54801-Data.db')]

NOTE: sometimes you'll also need to search the logs prior to current, using the glob /var/log/cass*.
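For example, the same search widened to that glob (the -r flag is an assumption, in case the glob expands to directories):

egrep -r "Compacting|Compacted" /var/log/cass* | \
  grep unified_account | \
  less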

The *.db files referenced by the path attribute are the suspects, so make a note of that specific list.
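If the list is long, here is a hedged one-liner to pull the paths out of the most recent Compacting entry (this assumes the log format shown above and that the last entry is the stuck one; verify against the log itself):

grep Compacting /var/log/cass/current | grep unified_account | tail -1 | \
  grep -o "/var/lib/cassandra/data/[^']*\.db"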

Use the procedure in #59 to cleanly shut down the Cassandra service.

/opt/cassandra/bin/nodetool -h localhost -p 9080 disablegossip \
&& /opt/cassandra/bin/nodetool -h localhost -p 9080 disablethrift \
&& /opt/cassandra/bin/nodetool -h localhost -p 9080 drain
sudo sv stop /etc/sv/cass
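Before moving any files, confirm the service is actually down (sv status is the standard runit check; you should see the service reported as down):

sudo sv status /etc/sv/cass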

Now create a backup directory where the suspicious files will be placed. A good directory name to use is /var/lib/cassandra/BACKUP in order to avoid polluting the .../data directory. With that in place, move the files noted above to the backup directory:

sudo mkdir -p /var/lib/cassandra/BACKUP
sudo mv \
  /var/lib/cassandra/data/ABC/unified_account-hd-{55695,54489,54804,55819,54801}* \
  /var/lib/cassandra/BACKUP

Now start Cassandra and watch the logs to confirm it settles into place:

sudo sv start /etc/sv/cass
tail -f /var/log/cass/current

You should also run nodetool ring on another cass node to confirm the current node is observed with a Status of Up.
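For example, from another node (the same host/port flags as above are assumed):

/opt/cassandra/bin/nodetool -h localhost -p 9080 ring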

When the node is in a good steady state, initiate a repair operation of the specific keyspace (usually ABC) and the column family observed in the operations above, such as unified_account. Use of tmux is highly recommended, since the repair is a very long-running operation.

tmux
/opt/cassandra/bin/nodetool -h localhost -p 9080 repair ABC unified_account

In another window/pane you can watch for signs of the repair's progression by tailing /var/log/cass/current.
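For example (the grep filter is an assumption about the repair log wording; adjust as needed):

tail -f /var/log/cass/current | grep -i repair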

After the repair command completes, use the techniques above to monitor the overall compaction progress. You may find the node remains busy with pending compaction tasks for up to a day, since a large backlog of files has likely built up. Even though the chart may still look "bad", you should see progress in the compactionstats output.
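A quick way to track just the backlog number from the compactionstats output shown earlier:

/opt/cassandra/bin/nodetool -h localhost -p 9080 compactionstats | grep "pending tasks"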