sensu / sensu-go

Simple. Scalable. Multi-cloud monitoring.
https://sensu.io
MIT License

Agent unable to start due to "bucket already exists" #5044

Open andersonvaf opened 8 months ago

andersonvaf commented 8 months ago

Expected Behavior

Current Behavior

The sensu-agent fails to start with the error "bucket already exists". journalctl shows the following output:

Jan 23 15:27:44 test systemd[1]: Starting The Sensu Agent process....
Jan 23 15:27:44 test systemd[1]: Started The Sensu Agent process..
Jan 23 15:27:44 test sensu-agent[22032]: {"component":"agent","level":"info","msg":"compacting api queue","time":"2024-01-23T15:27:44-05:00"}
Jan 23 15:27:44 test sensu-agent[22032]: {"component":"agent","level":"info","msg":"finished api queue compaction","time":"2024-01-23T15:27:44-05:00"}
Jan 23 15:27:44 test sensu-agent[22032]: {"component":"agent","error":"error creating agent: error compacting queue: bucket already exists","level":"fatal","msg":"error executing sensu-agent","time":"2024-01-23T15:27:44-05:00"}
Jan 23 15:27:44 test systemd[1]: sensu-agent.service: main process exited, code=exited, status=1/FAILURE
Jan 23 15:27:44 test systemd[1]: Unit sensu-agent.service entered failed state.
Jan 23 15:27:44 test systemd[1]: sensu-agent.service holdoff time over, scheduling restart.
Jan 23 15:27:44 test systemd[1]: Stopping The Sensu Agent process....
Jan 23 15:27:44 test systemd[1]: Starting The Sensu Agent process....
Jan 23 15:27:44 test systemd[1]: sensu-agent.service start request repeated too quickly, refusing to start.
Jan 23 15:27:44 test systemd[1]: Failed to start The Sensu Agent process..
Jan 23 15:27:44 test systemd[1]: Unit sensu-agent.service entered failed state.

Possible Solution

I know for a fact that removing the file /var/cache/sensu/sensu-agent/.lasr.temp.db lets sensu-agent start successfully.
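A minimal sketch of that workaround as a shell helper, in case it is useful to others hitting this. The function name `quarantine_lasr_temp` and the `.stale.<timestamp>` suffix are my own choices for illustration, not anything sensu-agent provides. It moves the file aside instead of deleting it, so the leftover database stays available for debugging.

```shell
#!/bin/sh
# Hypothetical helper (not part of sensu-agent): move a leftover
# .lasr.temp.db out of the way so the agent can start again.
# The default directory is the one reported in this issue.
quarantine_lasr_temp() {
    cache_dir="${1:-/var/cache/sensu/sensu-agent}"
    temp_db="$cache_dir/.lasr.temp.db"
    if [ -e "$temp_db" ]; then
        # Rename rather than delete, to keep the file for later analysis.
        backup="$temp_db.stale.$(date +%s)"
        mv -- "$temp_db" "$backup" && echo "moved $temp_db to $backup"
    else
        echo "no stale temp file in $cache_dir"
    fi
}
```

Assuming the agent runs under the systemd unit shown in the log above, I would stop it first (`systemctl stop sensu-agent`), call `quarantine_lasr_temp`, and then start it again. Because systemd refused further restarts ("start request repeated too quickly"), `systemctl reset-failed sensu-agent` may be needed before the start succeeds.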

Steps to Reproduce (for bugs)

Unfortunately, I'm unable to reproduce the issue. However, we've already seen this issue happen a few times in production environments.

I've included the .lasr.temp.db file in case it helps: lasr.temp.zip

Context

I've captured the following inotify events:

/var/cache/sensu/sensu-agent/ OPEN queue.db
/var/cache/sensu/sensu-agent/ ACCESS queue.db
/var/cache/sensu/sensu-agent/ MODIFY queue.db
/var/cache/sensu/sensu-agent/ MODIFY queue.db
/var/cache/sensu/sensu-agent/ OPEN .lasr.temp.db
/var/cache/sensu/sensu-agent/ ACCESS .lasr.temp.db
/var/cache/sensu/sensu-agent/ CLOSE_WRITE,CLOSE .lasr.temp.db
/var/cache/sensu/sensu-agent/ CLOSE_WRITE,CLOSE queue.db
/var/cache/sensu/sensu-agent/ OPEN queue.db
/var/cache/sensu/sensu-agent/ ACCESS queue.db
/var/cache/sensu/sensu-agent/ MODIFY queue.db
/var/cache/sensu/sensu-agent/ MODIFY queue.db
/var/cache/sensu/sensu-agent/ OPEN .lasr.temp.db
/var/cache/sensu/sensu-agent/ ACCESS .lasr.temp.db
/var/cache/sensu/sensu-agent/ CLOSE_WRITE,CLOSE .lasr.temp.db
/var/cache/sensu/sensu-agent/ CLOSE_WRITE,CLOSE queue.db
/var/cache/sensu/sensu-agent/ OPEN queue.db
/var/cache/sensu/sensu-agent/ ACCESS queue.db
/var/cache/sensu/sensu-agent/ MODIFY queue.db
/var/cache/sensu/sensu-agent/ MODIFY queue.db
/var/cache/sensu/sensu-agent/ OPEN .lasr.temp.db
/var/cache/sensu/sensu-agent/ ACCESS .lasr.temp.db
/var/cache/sensu/sensu-agent/ CLOSE_WRITE,CLOSE .lasr.temp.db
/var/cache/sensu/sensu-agent/ CLOSE_WRITE,CLOSE queue.db
/var/cache/sensu/sensu-agent/ OPEN queue.db
/var/cache/sensu/sensu-agent/ ACCESS queue.db
/var/cache/sensu/sensu-agent/ MODIFY queue.db
/var/cache/sensu/sensu-agent/ MODIFY queue.db
/var/cache/sensu/sensu-agent/ OPEN .lasr.temp.db
/var/cache/sensu/sensu-agent/ ACCESS .lasr.temp.db
/var/cache/sensu/sensu-agent/ CLOSE_WRITE,CLOSE .lasr.temp.db
/var/cache/sensu/sensu-agent/ CLOSE_WRITE,CLOSE queue.db
/var/cache/sensu/sensu-agent/ OPEN queue.db
/var/cache/sensu/sensu-agent/ ACCESS queue.db
/var/cache/sensu/sensu-agent/ MODIFY queue.db
/var/cache/sensu/sensu-agent/ MODIFY queue.db
/var/cache/sensu/sensu-agent/ OPEN .lasr.temp.db
/var/cache/sensu/sensu-agent/ ACCESS .lasr.temp.db
/var/cache/sensu/sensu-agent/ CLOSE_WRITE,CLOSE .lasr.temp.db
/var/cache/sensu/sensu-agent/ CLOSE_WRITE,CLOSE queue.db

I've also attached the strace output from running sensu-agent in the foreground: strace-lasr.txt

Your Environment

# sensu-agent version is 6.10.0 (compiled from source)
sensu-agent version (devel)+ce, community edition, built with go1.18.3

# cat /etc/os-release 
PRETTY_NAME="Debian GNU/Linux 8 (jessie)"
NAME="Debian GNU/Linux"
VERSION_ID="8"
VERSION="8 (jessie)"
ID=debian
HOME_URL="http://www.debian.org/"
SUPPORT_URL="http://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"