hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
https://www.consul.io

Consul Data Directory spikes from ~70% to ~100% disk usage #12770

Open vickysy84 opened 2 years ago

vickysy84 commented 2 years ago

Overview of the Issue

We use Consul as the backend storage for a Vault HA setup. The data directory suddenly spiked from ~70% to 100% disk usage on 2 of the 3 Consul nodes, and it stays full even after we clean up space and restart the Consul services.

Reproduction Steps

Steps to reproduce this issue:

  1. Create a Vault and Consul cluster with 2 Vault servers and 3 Consul servers.
  2. The cluster contains ~200 KV pairs.

Operating system and Environment details

RHEL 7, 5 GB of space for the data-dir

Amier3 commented 2 years ago

Hey @vickysy84

Sorry to hear you're having this issue; I have a couple of questions that may help us figure this out. If we dig into this and it happens to be a Vault issue, I'll go ahead and transfer the issue so you don't have to make a new one.

So I have a couple of questions:

I think that'd be a good start to understanding what's going on, but I'd also recommend checking out our guides on Inspecting data in Consul storage and Performance Tuning Vault (which contains sections on Linux-specific steps and steps for the Consul backend).
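
For example (a quick sketch, assuming a reachable local agent and, if ACLs are enabled, a token with snapshot permissions), the snapshot subcommands give a fast read on how large the data set actually is:

# Take a point-in-time snapshot of Consul's state...
consul snapshot save backup.snap

# ...and print its metadata (ID, size, raft index/term, etc.).
consul snapshot inspect backup.snap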

maxb commented 2 years ago

Hi @vickysy84 ,

It would be helpful if you could show the contents of your Consul data directory on a problem node, with file names and sizes - e.g. something like this:

maxb@q:~$ tree -h /var/lib/consul/data/
/var/lib/consul/data/
├── [  48]  acl-tokens.json
├── [ 394]  checkpoint-signature
├── [  36]  node-id
├── [4.0K]  raft
│   ├── [2.3K]  peers.info
│   ├── [ 24M]  raft.db
│   └── [4.0K]  snapshots
│       ├── [4.0K]  2-16536-1650095072556
│       │   ├── [ 283]  meta.json
│       │   └── [1.1M]  state.bin
│       └── [4.0K]  3-33531-1650096955978
│           ├── [ 283]  meta.json
│           └── [2.5M]  state.bin
└── [4.0K]  serf
    ├── [ 100]  local.snapshot
    └── [  78]  remote.snapshot

5 directories, 11 files

(ls -lhR /var/lib/consul/data would also work, though less readably.)

In the absence of specific data about your environment, I'll say a few things about how Consul manages data in general:

Consul is primarily an in-memory data store. The working version of your entire data set is kept in RAM.

The data on disk is composed of full snapshots of previous versions of the entire data set (raft/snapshots/*) and a record of changes since the last snapshot (raft/raft.db).

Consul is hardcoded to retain 2 full snapshots on disk. In addition, it needs space to write out another full snapshot before deleting an old one. So, your data directory needs to be capable of storing 3 times the size of one complete directory under raft/snapshots/, plus a bit more for the rest of the working files.
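
A rough way to check that budget against your actual data (the path below is only an example; point it at your own data-dir):

# Report the size of each snapshot; budget at least 3x the largest,
# plus headroom for raft.db and the other working files.
du -sh /var/lib/consul/data/raft/snapshots/*/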

Whilst a new snapshot directory is being written, it will have a .tmp suffix - e.g. 3-33531-1650096955978.tmp.

Consul has a cascading failure mode, where, if it is repeatedly interrupted whilst trying to write a snapshot, it fills up its raft/snapshots/ directory with lots of .tmp-suffixed directories, which it never automatically cleans up. In normal operation there should only be zero or one .tmp-suffixed directory in raft/snapshots/, depending on whether a snapshot is currently being written.
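
An easy way to check whether a node has fallen into this mode (again, the path is only an example):

# Lists leftover .tmp snapshot directories. Seeing more than one,
# or one when no snapshot is currently in progress, means leftovers.
find /var/lib/consul/data/raft/snapshots -maxdepth 1 -type d -name '*.tmp'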

aldy505 commented 1 year ago

Hi, will this issue be fixed? I have 7 Consul nodes on VMs, and 3 of them are out of space (I created VMs that only have 20 GB of storage; each runs Vault and Consul together). I have so many *.tmp directories that are never cleaned up by Consul.

Or at least, if we (as users) need to clean them up manually, what's the safest command we can execute?

(screenshot: directory listing showing the accumulated *.tmp snapshot directories)

maxb commented 1 year ago

Hi @aldy505 ,

This particular GitHub issue appears to be an unconfirmed user report, in which the original reporter has never responded to requests for additional information. Therefore, I don't think it's even possible to know what the issue is, let alone consider fixing it.

On the other hand, if you're looking for comments about what I said:

Consul has a cascading failure mode, where, if it is repeatedly interrupted whilst trying to write a snapshot, it fills up its raft/snapshots/ directory with lots of .tmp-suffixed directories, which it never automatically cleans up.

then you are probably better off creating a new issue which is solely and clearly about that.

Personally, I (a community member only) have no idea whether HashiCorp have that on their roadmap.

I would recommend anyone running Vault with Consul storage these days to seriously consider migrating to Vault's built-in Raft storage, and eliminating Consul from the infrastructure. The migration is not simple, but eliminating Consul as a dependency of your Vault infrastructure is quite a payoff, and it's clearly the direction HashiCorp seem to be throwing most support behind long term.
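
For reference, that migration is driven by the vault operator migrate command and a small config file. A minimal sketch only (every address, path, and ID below is a placeholder; stop all Vault servers and back up both Consul and Vault before attempting it for real):

# Sketch: all values are placeholders -- do not run as-is.
cat > migrate.hcl <<'EOF'
storage_source "consul" {
  address = "127.0.0.1:8500"
  path    = "vault/"
}

storage_destination "raft" {
  path    = "/opt/vault/data"
  node_id = "node-1"
}

cluster_addr = "https://vault-1.example.com:8201"
EOF

vault operator migrate -config=migrate.hcl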

Consul only ever writes to one .tmp-suffixed snapshot directory at a time. Therefore, a .tmp directory is safe to delete whenever any newer snapshot directory exists, whether .tmp-suffixed or complete. I would implement a cron job scanning for such directories and deleting them on any production Consul cluster, as sketched below.
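
A minimal sketch of such a job (untested; the path is an example, and it deliberately keeps the newest entry untouched in case a snapshot is being written at that moment):

#!/bin/sh
# Sketch: delete stale .tmp snapshot directories, always keeping the
# newest entry, which may be a snapshot currently being written.
SNAPDIR=/var/lib/consul/data/raft/snapshots   # example path
ls -1dt "$SNAPDIR"/*/ | tail -n +2 | grep '\.tmp/$' | xargs -r rm -rf

Scheduled from cron (say, hourly), that keeps the snapshots directory bounded without racing an in-progress snapshot write.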

maxb commented 1 year ago

Oh, and the Discourse discussion board at https://discuss.hashicorp.com is a good place for asking for advice about HashiCorp product operations, when the questions don't fit as direct bug reports.

aldy505 commented 1 year ago

Hi @maxb, thanks for the reply.

This particular GitHub issue appears to be an unconfirmed user report, in which the original reporter has never responded to requests for additional information. Therefore, I don't think it's even possible to know what the issue is, let alone consider fixing it.

I'll consider making a separate issue to tackle *.tmp directory cleanup by Consul. But considering what you said here...

I would recommend anyone running Vault with Consul storage these days to seriously consider migrating to Vault's built-in Raft storage, and eliminating Consul from the infrastructure. The migration is not simple, but eliminating Consul as a dependency of your Vault infrastructure is quite a payoff, and it's clearly the direction HashiCorp seem to be throwing most support behind long term.

I'll do some research on migrating away from Consul as the storage backend. Thanks for the tip.