pingcap / tiflash

The analytical engine for TiDB and TiDB Cloud. Try free: https://tidbcloud.com/free-trial
https://docs.pingcap.com/tidb/stable/tiflash-overview
Apache License 2.0

AlmostFull is not accurate #6415

Open CalvinNeo opened 1 year ago

CalvinNeo commented 1 year ago

Bug Report


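In init_storage_stats_task, we should use TiFlash's config. Currently the task only calls fs2::statvfs on the proxy's store_path, so the disk stats come solely from the filesystem that holds that path: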

...
        self.background_worker
            .spawn_interval_task(DEFAULT_STORAGE_STATS_INTERVAL, move || {
                let disk_stats = match fs2::statvfs(&store_path) {
                    Err(e) => {
                        error!(
                            "get disk stat for kv store failed";
                            "kv path" => store_path.to_str(),
                            "err" => ?e
                        );
                        return;
                    }
                    Ok(stats) => stats,
                };
...


CalvinNeo commented 1 year ago

IMO, this check is meant to protect the local disk rather than the whole storage. Since TiFlash can spread its data across multiple disks, a low-space-ratio of 0.8 is too strict, so we could simply raise low-space-ratio to solve this.

JaySon-Huang commented 1 year ago

After some testing: low-space-ratio only affects region scheduling on PD (e.g. add-peer).

The AlmostFull threshold, however, is DISK_RESERVED_SPACE, calculated as min(raftstore.capacity, disk.total_space) * 0.05. If the disk's available space is less than DISK_RESERVED_SPACE, the proxy stops writing blocks into the TiFlash storage layer.
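The threshold arithmetic above can be sketched as follows. This is a minimal illustration, not the proxy's actual code: the function names are hypothetical, and representing an unset raftstore.capacity as `None` (so that only the disk's total space is used) is an assumption of this sketch.

```rust
/// Hypothetical helper: DISK_RESERVED_SPACE = min(raftstore.capacity, disk.total_space) * 0.05.
/// Assumption: `None` stands for an unset raftstore.capacity, in which case
/// the disk's total space is used on its own.
fn disk_reserved_space(raftstore_capacity: Option<u64>, disk_total_space: u64) -> u64 {
    let effective = match raftstore_capacity {
        Some(cap) => cap.min(disk_total_space),
        None => disk_total_space,
    };
    // 5% of the effective capacity; integer division rounds down.
    effective / 20
}

/// Hypothetical helper: true when available space is below the reserved space,
/// i.e. the state in which the proxy stops writing blocks into TiFlash storage.
fn is_almost_full(raftstore_capacity: Option<u64>, disk_total_space: u64, disk_available: u64) -> bool {
    disk_available < disk_reserved_space(raftstore_capacity, disk_total_space)
}

fn main() {
    const GIB: u64 = 1 << 30;
    // Proxy store path on a 100 GiB disk with raftstore.capacity unset:
    // reserved space is 5 GiB, so 4 GiB free already counts as almost full.
    println!("{}", is_almost_full(None, 100 * GIB, 4 * GIB)); // true
    // raftstore.capacity = 2000 GiB on a 4000 GiB disk: reserved is 100 GiB.
    println!("{}", disk_reserved_space(Some(2000 * GIB), 4000 * GIB) / GIB); // 100
}
```

Because only the disk behind store_path enters the formula, a small local disk can trip AlmostFull even when TiFlash's other storage paths have plenty of room.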

JaySon-Huang commented 1 year ago

In TiFlash, capacity is usually set through the storage.main.capacity list, without setting tiflash-proxy's raftstore.capacity. It would be better to override tiflash-proxy's raftstore.capacity with the first element of storage.latest.capacity (the path where the proxy stores its data, whose capacity is inferred from storage.main.capacity).
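A rough sketch of that override, assuming simplified config structs. `TiFlashStorageConfig`, `ProxyRaftstoreConfig` and `override_raftstore_capacity` are hypothetical names, not the real TiFlash or tiflash-proxy types. How storage.latest.capacity is inferred from storage.main.capacity is not modeled. Two further choices are assumptions of this sketch only: an explicitly set raftstore.capacity is kept, and a 0 entry is treated as "not configured".

```rust
/// Hypothetical, simplified view of TiFlash's storage config.
struct TiFlashStorageConfig {
    /// storage.latest.capacity in bytes; the first entry is assumed to belong
    /// to the path where the proxy stores its data.
    latest_capacity: Vec<u64>,
}

/// Hypothetical, simplified view of tiflash-proxy's raftstore config.
struct ProxyRaftstoreConfig {
    /// raftstore.capacity in bytes; `None` means it was not set.
    capacity: Option<u64>,
}

/// Fill raftstore.capacity from the first storage.latest.capacity entry.
fn override_raftstore_capacity(tiflash: &TiFlashStorageConfig, proxy: &mut ProxyRaftstoreConfig) {
    // Design choice (assumption): keep an explicitly configured value.
    if proxy.capacity.is_some() {
        return;
    }
    if let Some(&first) = tiflash.latest_capacity.first() {
        // Assumption: 0 means "no limit configured", so it is not copied over.
        if first > 0 {
            proxy.capacity = Some(first);
        }
    }
}

fn main() {
    let tiflash = TiFlashStorageConfig { latest_capacity: vec![200 << 30, 800 << 30] };
    let mut proxy = ProxyRaftstoreConfig { capacity: None };
    override_raftstore_capacity(&tiflash, &mut proxy);
    println!("{:?}", proxy.capacity.map(|c| c >> 30)); // Some(200)
}
```

With raftstore.capacity filled this way, min(raftstore.capacity, disk.total_space) reflects the capacity TiFlash actually assigns to the proxy's path rather than the whole disk.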

CalvinNeo commented 1 year ago

There are 3 ways to do this:

  1. Disable this service
  2. Use an observer to retrieve the disk capacity from TiFlash
  3. Use a command-line argument
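Option 2 could look roughly like this: the stats task asks an observer registered by TiFlash for the capacity, and falls back to the local filesystem stats (the statvfs result in the excerpt above) when no answer is available. The trait, the `DiskStats` type and `resolve_disk_stats` are hypothetical names for illustration. The local stats are passed in as a closure so that the sketch stays self-contained and does not depend on fs2.

```rust
/// Capacity figures in bytes (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
struct DiskStats {
    capacity: u64,
    available: u64,
}

/// Hypothetical observer hook: TiFlash reports the capacity of the storage
/// it manages, so the proxy no longer relies only on statvfs(store_path).
trait DiskCapacityObserver {
    fn disk_stats(&self) -> Option<DiskStats>;
}

/// Prefer the observer's answer; otherwise fall back to the local stats.
fn resolve_disk_stats<F>(observer: Option<&dyn DiskCapacityObserver>, local: F) -> DiskStats
where
    F: FnOnce() -> DiskStats,
{
    observer.and_then(|o| o.disk_stats()).unwrap_or_else(local)
}

/// Stand-in observer that returns fixed numbers.
struct FixedObserver(DiskStats);

impl DiskCapacityObserver for FixedObserver {
    fn disk_stats(&self) -> Option<DiskStats> {
        Some(self.0)
    }
}

fn main() {
    let local = DiskStats { capacity: 100, available: 3 };
    let from_tiflash = FixedObserver(DiskStats { capacity: 1000, available: 400 });

    let with_observer = resolve_disk_stats(Some(&from_tiflash as &dyn DiskCapacityObserver), || local);
    let without_observer = resolve_disk_stats(None, || local);
    println!("{:?} {:?}", with_observer, without_observer);
}
```

Keeping the statvfs path as a fallback means the proxy behaves as before whenever no observer is registered.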