hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
https://www.consul.io
Other
28.39k stars 4.43k forks source link

Watch handler not being invoked when the value is updated in Consul KV #13428

Open ptry2024 opened 2 years ago

ptry2024 commented 2 years ago

Overview of the Issue

Hey Folks! I'm configuring watch to see if there are changes in Consul KV then watch script handler will being invoked to write the changes into some files. I can see the expected behavior by using consul watch -type=keyprefix ... but when I adding those config into file that use by consul agent to run as service, I noticed watch script handler is invoked once time at starting or restarting consul agent service. This problem didn't happen with consul watch -type=keyprefix ...

Reproduction Steps

Steps to reproduce this issue: With consul watch command

  1. Run consul watch -type=keyprefix -prefix=algorand/ /tmp/consul_watch_handler.sh algorand
  2. Update the value under algorand/ key
  3. Output
    (alpha-hexsafe-aws) root@hx-sg-al-hexsafe-alpha-1:~$ consul watch -type=keyprefix -prefix=algorand/ /etc/consul.d/consul_watch_handler.sh algorand
    [master c56669b] Update kvs.txt
    1 file changed, 3 insertions(+), 3 deletions(-)
    [master f829bd7] Update kvs.txt
    1 file changed, 2 insertions(+), 2 deletions(-)
    [master 617c74c] Update kvs.txt
    1 file changed, 2 insertions(+), 2 deletions(-
    
    (alpha-hexsafe-aws) root@hx-sg-al-hexsafe-alpha-1:/opt/consul/log$ tail -f consul_watch.log 
    algorand/LOG_LEVEL
    Old Value: algorand
    New Value: LOG_DEBUG

Mon 13 Jun 2022 03:42:11 PM UTC Change detected for key: algorand/APPLICATION algorand/LOG_LEVEL Old Value: algorand New Value: LOG_INFO

Mon 13 Jun 2022 03:43:14 PM UTC Change detected for key: algorand/LOG_LEVEL Old Value: LOG_INFO New Value: LOG_DEBUG

Mon 13 Jun 2022 03:44:28 PM UTC Change detected for key: algorand/LOG_LEVEL Old Value: LOG_DEBUG New Value: LOG_INFO


**With configuration file**
1. The `consul.hcl` file is used by consul agent service

(alpha-hexsafe-aws) root@hx-sg-al-hexsafe-alpha-1:/etc/consul.d$ cat consul.hcl bind_addr = "10.xxx" datacenter = "alpha" data_dir = "/opt/consul" encrypt = "xxx" ca_file = "xxx" cert_file = "xxx" key_file = "xxx" verify_incoming = false verify_incoming_rpc = true verify_outgoing = true verify_server_hostname = true recursors = [ "1.1.1.1", "8.8.4.4" ] retry_join = [ "10.xxx" ]

acl = { enabled = true default_policy = "deny" enable_token_persistence = true tokens { master = xxx" agent = "xxx" default = "xxx" } }

watches = [ { type = "keyprefix" prefix = "algorand/" handler_type = "script" args = ["/tmp/consul_watch_handler.sh", "algorand"] } ]

connect { enabled = true }

performance { raft_multiplier = 1 }

addresses { https = "10.xxx grpc = "10.xxx" dns = "10.xxx" }

ports { dns = 53 http = -1 https = 8501 grpc = 8502 }

telemetry { statsd_address = "xxx" }

limits { http_max_conns_per_client = 10000 }

2. Consul agent service

(alpha-hexsafe-aws) root@hx-sg-al-hexsafe-alpha-1:/etc/consul.d$ cat /etc/systemd/system/consul.service [Unit] Description="HashiCorp Consul - A service mesh solution" Documentation=https://www.consul.io/ Requires=network-online.target After=network-online.target ConditionFileNotEmpty=/etc/consul.d/consul.hcl

[Service] Type=notify User=consul Group=consul ExecStart=/usr/local/bin/consul agent -config-dir=/etc/consul.d/ -log-level=debug ExecReload=/bin/kill --signal HUP $MAINPID KillMode=process KillSignal=SIGTERM Restart=on-failure LimitNOFILE=65536

[Install] WantedBy=multi-user.target (alpha-hexsafe-aws) root@hx

3. My watch script handler

!/usr/bin/env sh

export CONSUL_HTTP_SSL=true export CONSUL_HTTP_ADDR=xxx export CONSUL_CACERT=xxx export CONSUL_CLIENT_CERT=xxx export CONSUL_CLIENT_KEY=xxx export CONSUL_HTTP_TOKEN=xxx

if folder does not exist create it

and initialize a git repo for the first run

if [ ! -d "/opt/consul/kvs/${1}/" ]; then mkdir -p /opt/consul/kvs/${1}/ cd /opt/consul/kvs/${1}/ touch kvs.txt consul watch -type keyprefix -prefix=${1} > kvs.txt git init git config user.email "xxx" git config user.name "xxx" git add kvs.txt && git commit -m 'init kvs.txt' fi

if [ ! -d "/opt/consul/log/" ]; then mkdir -p /opt/consul/log/ fi

FLAG=0 cd /opt/consul/kvs/${1}/ touch "$(date)".txt consul watch -type keyprefix -prefix=${1} > kvs.txt git diff --no-ext-diff --quiet --exit-code || FLAG=1

if [ $FLAG = 0 ]; then echo "$(date) : No Change" >> "/opt/consul/log/consul_watch.log" exit 0 fi

if [ $FLAG = 1 ]; then

first we trim quotes and commas then grep for Key and get what it is

    KEY=`git diff | sed -e 's/\"//g' | sed -e 's/\,//g' | grep Key | awk '{ print $2 }'`

    # to get the old and new values we do the same and decode base64 values
    OLD=`git diff | sed -e 's/\"//g' | sed -e 's/\,//g' | grep Value | awk -F'Value: ' '{print $2}' | head -1 | base64 -d`
    NEW=`git diff | sed -e 's/\"//g' | sed -e 's/\,//g' | grep Value | awk -F'Value: ' '{print $2}' | tail -1 | base64 -d`

    if [ "$KEY" != "Key:" ]; then
        echo $(date) >> "/opt/consul/log/consul_watch.log"
        echo "Change detected for key: ${KEY}" >> "/opt/consul/log/consul_watch.log"
        echo "Old Value: ${OLD}" >> "/opt/consul/log/consul_watch.log"
        echo "New Value: ${NEW}" >> "/opt/consul/log/consul_watch.log"
        echo " " >> "/opt/consul/log/consul_watch.log"
    fi
    git add kvs.txt
    git commit -m "Update kvs.txt"
exit 0

fi

### Consul info for both Client and Server

<details>
  <summary>Client info</summary>

Client version: 1.11.3

(alpha-hexsafe-aws) root@hx-sg-al-hexsafe-alpha-1:/etc/consul.d$ consul info agent: check_monitors = 0 check_ttls = 1 checks = 161 services = 172 build: prerelease = revision = e319d7ed version = 1.11.3 consul: acl = enabled bootstrap = true known_datacenters = 1 leader = true leader_addr = 10.xxx:8300 server = true raft: applied_index = 639684 commit_index = 639684 fsm_pending = 0 last_contact = 0 last_log_index = 639684 last_log_term = 54 last_snapshot_index = 639465 last_snapshot_term = 54 latest_configuration = [{Suffrage:Voter ID:12114d1e-0b75-aa2d-e116-788422328c0c Address:10.xxx:8300}] latest_configuration_index = 0 num_peers = 0 protocol_version = 3 protocol_version_max = 3 protocol_version_min = 0 snapshot_version_max = 1 snapshot_version_min = 0 state = Leader term = 54 runtime: arch = amd64 cpu_count = 32 goroutines = 5578 max_procs = 32 os = linux version = go1.17.5 serf_lan: coordinate_resets = 0 encrypted = true event_queue = 1 event_time = 53 failed = 0 health_score = 0 intent_queue = 1 left = 0 member_time = 55 members = 1 query_queue = 0 query_time = 1 serf_wan: coordinate_resets = 0 encrypted = true event_queue = 0 event_time = 1 failed = 0 health_score = 0 intent_queue = 0 left = 0 member_time = 1 members = 1 query_queue = 0 query_time = 1


</details>

<details>
  <summary>Server info</summary>

Server version: 1.11.3

Number of server node: 1

(alpha-hexsafe-aws) root@hx-sg-al-hexsafe-alpha-1:/etc/consul.d$ consul info agent: check_monitors = 0 check_ttls = 1 checks = 161 services = 172 build: prerelease = revision = e319d7ed version = 1.11.3 consul: acl = enabled bootstrap = true known_datacenters = 1 leader = true leader_addr = 10.xxx:8300 server = true raft: applied_index = 639684 commit_index = 639684 fsm_pending = 0 last_contact = 0 last_log_index = 639684 last_log_term = 54 last_snapshot_index = 639465 last_snapshot_term = 54 latest_configuration = [{Suffrage:Voter ID:12114d1e-0b75-aa2d-e116-788422328c0c Address:10.xxx:8300}] latest_configuration_index = 0 num_peers = 0 protocol_version = 3 protocol_version_max = 3 protocol_version_min = 0 snapshot_version_max = 1 snapshot_version_min = 0 state = Leader term = 54 runtime: arch = amd64 cpu_count = 32 goroutines = 5578 max_procs = 32 os = linux version = go1.17.5 serf_lan: coordinate_resets = 0 encrypted = true event_queue = 1 event_time = 53 failed = 0 health_score = 0 intent_queue = 1 left = 0 member_time = 55 members = 1 query_queue = 0 query_time = 1 serf_wan: coordinate_resets = 0 encrypted = true event_queue = 0 event_time = 1 failed = 0 health_score = 0 intent_queue = 0 left = 0 member_time = 1 members = 1 query_queue = 0 query_time = 1


</details>

### Operating system and Environment details

Debian 10

### Log Fragments
I can see the below log of consul watch at starting or restarting consul agent service. But I didn't see any logs when I update the value of a key under `algorand/`

Jun 13 10:07:26 localhost consul[8909]: 2022-06-13T10:07:26.461Z [DEBUG] agent: watch handler output: watch_handler=[/etc/consul.d/consul_watch_handler.sh, algorand] output="Initialized empty Git repository in /opt/consul/kvs/algorand/.git/


### The expect behavior
The handler script can be invoked with configuration file that consul agent service use.

### Additional Information
Please let me know if you need any additional information!

Many thanks
Amier3 commented 2 years ago

Hey @hoangphuc1662

Thanks for the detailed bug report! I'll try my hand at replicating this and see if I can figure out what's going on 👍

huikang commented 2 years ago

I couldn't reproduce the error using consul v1.11.3. To narrow down the problem space (TLS, acl), @hoangphuc1662 , could you help simplify the deployment, e.g.,

  1. create a single node cluster using consul agent -dev with only the watch stanza
  2. update the kv to see if the handler can be triggered.

Thanks

RallyHereMattSmith commented 1 year ago

Heya - I'm just weighing in on this old bug in the hopes of helping someone else like me who ran into this EXACT same issue. The root cause, in my case anyway, was a missing KV Read-enabled ACL token in the individual watches config stanza. It appears that the initial startup execution leverages the agent's token and then subsequent watch checks use an anonymous token. Enabling TRACE log_level enabled me to see the 403 from the watches query to the KV store.

Hope this helps someone else stuck in the weeds here!