hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
https://www.consul.io

Consul KV Fingerprinting #5549

Open alkalinecoffee opened 5 years ago

alkalinecoffee commented 5 years ago

Feature Description

When querying the KV store, Consul should return a signature/fingerprint/md5/etc of some sort for the path. For individual values, the fingerprint of the value should be returned. If a folder is queried, some sort of signature that accounts for all keys/values under that folder should be returned.

$ echo hello | md5
b1946ac92492d2347c6235b4d2611184

$ consul kv get /my/key
hello

$ consul kv get -detailed /my/key
MD5              b1946ac92492d2347c6235b4d2611184
Key              /my/key
...

Use Case(s)

This functionality is useful when diffing key/value paths between datacenters where KV replication is unreliable or not performed at all. In our case, we have multiple datacenters that are not WAN-joined (migrations to newer clusters are actively in progress). We would still like to periodically replicate KVs across the non-joined datacenters until the migration is complete. For that to happen, it seems we need to export and import keys across datacenters. Traversing the entire tree for each export/import operation seems like overkill.

Being able to check the fingerprint of a particular path would let us quickly skip replication when the KV tree signatures match between datacenters. Without it, we must traverse the entire KV tree and import keys individually.

This could also be used to selectively back up a KV store by first checking whether the KVs have been updated before performing the backup operation.
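Until something like this exists server-side, a similar check can be approximated on the client. The sketch below is only an illustration, not a Consul feature: it assumes the input is the decoded JSON list returned by the KV HTTP endpoint with ?recurse, and the function name kv_fingerprint and the choice of SHA-256 are assumptions made for the example. It hashes only keys, flags, and values, because indexes and lock state differ between datacenters.

```python
import base64
import hashlib


def kv_fingerprint(entries):
    """Return a SHA-256 hex digest over the keys, flags, and values of KV entries.

    `entries` is assumed to be the decoded JSON list from
    GET /v1/kv/<prefix>?recurse. CreateIndex, ModifyIndex, and LockIndex are
    deliberately ignored because they are specific to each datacenter.
    """
    digest = hashlib.sha256()
    # Sort by key so the result does not depend on the order of the listing.
    for entry in sorted(entries, key=lambda e: e["Key"]):
        # Value is base64-encoded in the response and may be null when empty.
        value = base64.b64decode(entry["Value"] or "")
        fields = (
            entry["Key"].encode("utf-8"),
            str(entry["Flags"]).encode("ascii"),
            value,
        )
        for field in fields:
            # Length-prefix each field so adjacent fields cannot be confused.
            digest.update(len(field).to_bytes(8, "big"))
            digest.update(field)
    return digest.hexdigest()
```

Two datacenters hold the same data under a prefix when kv_fingerprint gives the same digest for both listings. Note that this still reads every key under the prefix on each side, so it saves the import step rather than the traversal.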

pierresouchay commented 5 years ago

@alkalinecoffee I thought about this as well, but it is actually a bit complex, because the KV store is not only about data but also about locks/semaphores. I am sure it is doable, but it could be complex.

In the meantime, you can get a similar result with indexes: store the ModifyIndex of each key under the path once it has been saved, then compare it later to check whether the value has been modified.

Consider this:

$ consul kv put my/backup/path/value1 v1
$ consul kv put my/backup/path/value2 v2

At that time, you have this:

$ curl 'localhost:8500/v1/kv/my/backup/path?recurse'
[
    {
        "LockIndex": 0,
        "Key": "my/backup/path/value1",
        "Flags": 0,
        "Value": "djE=",
        "CreateIndex": 13,
        "ModifyIndex": 13
    },
    {
        "LockIndex": 0,
        "Key": "my/backup/path/value2",
        "Flags": 0,
        "Value": "djI=",
        "CreateIndex": 14,
        "ModifyIndex": 14
    }
]

If you rewrite the value value2:

$ consul kv put my/backup/path/value2 v3

You'll end up with:

$ curl 'localhost:8500/v1/kv/my/backup/path?recurse'
[
    {
        "LockIndex": 0,
        "Key": "my/backup/path/value1",
        "Flags": 0,
        "Value": "djE=",
        "CreateIndex": 13,
        "ModifyIndex": 13
    },
    {
        "LockIndex": 0,
        "Key": "my/backup/path/value2",
        "Flags": 0,
        "Value": "djM=",
        "CreateIndex": 14,
        "ModifyIndex": 18
    }
]

So, by comparing the ModifyIndex, you know exactly which key to update.

The downside is that you have to store the ModifyIndex of each entry after every backup, and ModifyIndex values are specific to each DC. You can, however, easily retrieve all of them at once:

$ curl 'localhost:8500/v1/kv/my/backup?recurse' | jq '.[] | {key: .Key, index: .ModifyIndex}'
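
The comparison described above could be sketched roughly as follows. This is a minimal illustration, assuming the input is the decoded JSON list from the ?recurse query; the helper names snapshot and changed_keys are hypothetical, and both snapshots must come from the same DC since indexes are not comparable across DCs.

```python
def snapshot(entries):
    """Map each key to its ModifyIndex, given a decoded ?recurse response."""
    return {entry["Key"]: entry["ModifyIndex"] for entry in entries}


def changed_keys(saved, current):
    """Compare two snapshots taken from the same datacenter.

    Returns a tuple (modified_or_new, deleted) of sorted key lists.
    """
    modified = sorted(k for k, idx in current.items() if saved.get(k) != idx)
    deleted = sorted(k for k in saved if k not in current)
    return modified, deleted


# Indexes taken from the two listings shown earlier in this comment.
before = snapshot([
    {"Key": "my/backup/path/value1", "ModifyIndex": 13},
    {"Key": "my/backup/path/value2", "ModifyIndex": 14},
])
after = snapshot([
    {"Key": "my/backup/path/value1", "ModifyIndex": 13},
    {"Key": "my/backup/path/value2", "ModifyIndex": 18},
])

print(changed_keys(before, after))  # (['my/backup/path/value2'], [])
```

Only the keys in the first list need to be exported again, and the keys in the second list can be removed from the backup.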

pierresouchay commented 4 years ago

Might be a duplicate of #4142