
influxdb suddenly stops compacting a shard #20627

Open · phill84 opened 3 years ago

phill84 commented 3 years ago

Steps to reproduce:

  1. restore a backup of the problematic shard

Expected behavior: This database is sharded by day and has about 200GB of data per day. There aren't more points/tags/measurements in this problematic shard than in the others, so I would expect it to be around 200GB as well.

Actual behavior: This shard stays at 580GB and doesn't shrink, even after being cold for days. Usually we have a bit more than 100 tsm files in a shard, but for this day there are more than 12k tsm files, most of which are at compaction level 1. The biggest tsm file group has stayed at level 2 (82 tsm files, 174GB in total) for days.
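
For reference, a rough way to get that per-level count from a shard directory. This is a sketch that assumes the usual tsm1 naming scheme, `<generation>-<sequence>.tsm`, where the sequence number reflects the compaction level (level 1 files come from cache snapshots, higher levels from compactions):

```go
// count_tsm_levels.go — count .tsm files per compaction level in a shard
// directory, assuming <generation>-<sequence>.tsm naming where the
// sequence number reflects the compaction level.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

func main() {
	shardDir := os.Args[1] // e.g. /data/influx/data/<db>/<rp>/679 (path is an example)
	entries, err := os.ReadDir(shardDir)
	if err != nil {
		panic(err)
	}
	counts := map[int]int{}
	for _, e := range entries {
		name := e.Name()
		if filepath.Ext(name) != ".tsm" {
			continue
		}
		// "000000123-000000002.tsm" -> sequence/level 2
		parts := strings.SplitN(strings.TrimSuffix(name, ".tsm"), "-", 2)
		if len(parts) != 2 {
			continue
		}
		level, err := strconv.Atoi(parts[1])
		if err != nil {
			continue
		}
		counts[level]++
	}
	for level, n := range counts {
		fmt.Printf("level %d: %d files\n", level, n)
	}
}
```

Run against the restored shard directory, this should show the ~12k level-1 files directly.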

Environment info:

Config:

[data]
  dir = "/data/influx/data"
  wal-dir = "/data/influx/wal"
  index-version = "tsi1"
  query-log-enabled = true
  # max size of a shard's in-memory cache before writes are rejected
  cache-max-memory-size = "5g"
  # at most 5 compactions may run at once, shared across all shards
  max-concurrent-compactions = 5
  # rate limit on compaction disk writes, with bursts up to 400m
  compact-throughput = "200m"
  compact-throughput-burst = "400m"

This problematic shard is from January 19th. The amount of data grew at a steady rate until 17:30, after which diskBytes skyrocketed. As I mentioned above, this database is sharded by day. Here is a chart of diskBytes for the Jan 18th shard (id 676) and the Jan 19th shard (id 679):

[chart: diskBytes for shards 676 and 679]
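
A chart like this can be pulled from the `_internal` monitoring database. A minimal sketch in Go, assuming a default 1.x setup where per-shard stats land in the `"monitor"."shard"` measurement with a `diskBytes` field and an `id` tag (the address, measurement, and field names are assumptions based on default monitoring; they may differ by version):

```go
// disk_bytes.go — sketch: pull per-shard diskBytes from _internal,
// assuming default 1.x monitoring is enabled.
package main

import (
	"fmt"

	client "github.com/influxdata/influxdb1-client/v2"
)

func main() {
	c, err := client.NewHTTPClient(client.HTTPConfig{
		Addr: "http://localhost:8086", // assumption: default address
	})
	if err != nil {
		panic(err)
	}
	defer c.Close()

	// 10-minute resolution over the last day for shards 676 and 679.
	q := client.NewQuery(
		`SELECT max("diskBytes") FROM "monitor"."shard"
		 WHERE ("id" = '676' OR "id" = '679') AND time > now() - 1d
		 GROUP BY time(10m), "id"`,
		"_internal", "s")
	resp, err := c.Query(q)
	if err != nil {
		panic(err)
	}
	if resp.Error() != nil {
		panic(resp.Error())
	}
	for _, r := range resp.Results {
		for _, s := range r.Series {
			fmt.Printf("shard %s: %d points\n", s.Tags["id"], len(s.Values))
		}
	}
}
```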

After 17:30, no (new) Level 1 to Full compactions were happening on the Jan 19th shard:

[chart: compactions]

In the days after, I did see in the influxdb logs that it tried to compact this shard, but the biggest group never got past level 2. The whole influxdb server became really slow due to queries on this shard, so I have dropped it on the production database. Here are the contents of a restored backup of this day: https://gist.github.com/phill84/6b795531b91625fdeacf4be880833eb7

phill84 commented 3 years ago

So it looks like compaction "stopped" because I set max-concurrent-compactions = 5, and at that time all 5 slots were taken by some old shards, possibly because of backloading of data from previous weeks. However, it still seems strange to me that after restoring this problematic shard to a spare server, it still doesn't get compacted.
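
As a simplified model of the starvation (not InfluxDB's actual code): the compaction slots behave like a fixed-size semaphore shared across all shards, so if long-running compactions on old shards hold every slot, a non-blocking attempt to take a slot for the hot shard keeps failing, and its level-1 files pile up:

```go
// limiter_model.go — simplified model of a fixed-size compaction limiter
// shared across all shards; illustrates how max-concurrent-compactions = 5
// can starve one shard. Not InfluxDB's actual implementation.
package main

import "fmt"

// fixed is a counting semaphore with a hard capacity.
type fixed chan struct{}

func newFixed(limit int) fixed { return make(fixed, limit) }

// tryTake grabs a slot without blocking; returns false if none are free.
func (f fixed) tryTake() bool {
	select {
	case f <- struct{}{}:
		return true
	default:
		return false
	}
}

// release frees a previously taken slot.
func (f fixed) release() { <-f }

func main() {
	slots := newFixed(5) // max-concurrent-compactions = 5

	// Backloaded old shards take every slot and hold them.
	for i := 0; i < 5; i++ {
		slots.tryTake()
	}

	// The hot Jan 19th shard never gets a slot, so its planned
	// compactions are skipped each cycle and level-1 files accumulate.
	if !slots.tryTake() {
		fmt.Println("shard 679: no compaction slot free, skipping")
	}

	// Once an old shard finishes, a slot frees up again.
	slots.release()
	if slots.tryTake() {
		fmt.Println("shard 679: got a slot, compaction can proceed")
	}
}
```

That would explain the production server, but not why the shard stays uncompacted after being restored to an otherwise idle spare server.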