sensu / sensu-go

Bug: cron checks are executed on backend startup regardless of schedule #5008

Closed by echlebek 10 months ago

echlebek commented 1 year ago

Expected Behavior

Cron checks only execute when they are scheduled to do so.

Current Behavior

On backend startup, the cron check will execute once immediately, before resuming its scheduled operation.

Possible Solution

Suppress the initial execution by some mechanism.
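
One possible mechanism, sketched here in Go with only the standard library, is to compute the next instant the schedule matches and wait until then, rather than executing as soon as the scheduler starts. The names `nextMonthlyRun`, `initialDelay`, and `mustParse` are hypothetical and are not taken from sensu-go. The schedule is hard-coded to the `0 19 5 * *` example from this issue instead of being parsed from a cron string, and the sketch ignores round-robin agent selection and coordination between multiple backends.

```go
package main

import (
	"fmt"
	"time"
)

// mustParse parses an RFC 3339 timestamp and panics on malformed input.
// Hypothetical helper used only to keep the example short.
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

// nextMonthlyRun returns the first instant strictly after now that falls on
// the given day of the month at hour:minute. For "0 19 5 * *" that is
// day=5, hour=19, minute=0. Hypothetical helper; only safe for days 1..28,
// because time.Date normalizes out-of-range days into the next month.
func nextMonthlyRun(now time.Time, day, hour, minute int) time.Time {
	next := time.Date(now.Year(), now.Month(), day, hour, minute, 0, 0, now.Location())
	if !next.After(now) {
		// This month's slot has already passed. time.Date normalizes
		// month 13 to January of the following year.
		next = time.Date(now.Year(), now.Month()+1, day, hour, minute, 0, 0, now.Location())
	}
	return next
}

// initialDelay is how long a freshly started scheduler would wait before its
// first execution. It is always positive, so nothing runs at startup.
func initialDelay(now time.Time) time.Duration {
	return nextMonthlyRun(now, 5, 19, 0).Sub(now)
}

func main() {
	// Backend restart time taken from the log excerpt in this issue.
	start := mustParse("2023-05-11T19:08:44Z")
	fmt.Println("next run:", nextMonthlyRun(start, 5, 19, 0).Format(time.RFC3339))
	fmt.Println("initial delay:", initialDelay(start))
	// A scheduler built this way would arm time.NewTimer(initialDelay(time.Now()))
	// and execute the check only when that timer fires.
}
```

The point of the design is that the first wait is derived from the schedule itself, so a restart between two scheduled runs produces no execution.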

Steps to Reproduce (for bugs)

Create any cron check, and restart the backend. Observe that the check executes immediately.

For example, create the following check:

---
type: CheckConfig
api_version: core/v2
metadata:
  created_by: sensuadmin
  labels:
    sensu.io/managed_by: sensuctl
  name: cron_check_test
  namespace: default
spec:
  command: echo "Hello World"
  env_vars: null
  handlers: null
  round_robin: true
  subscriptions:
  - system
  publish: true
  interval: 0
  cron: 0 19 5 * *
  secrets:
  timeout: 0

Restart the backend(s) and observe that an event is created at a time that does not match the cron expression:

[Three screenshots (CleanShot, 2023-05-11 at 15:13:01, 15:13:13, and 15:13:30); images not available]

It also appears that the backends do indeed run round-robin checks on startup:

May 11 19:07:19 sensu02.sachshaus.net sensu-backend[1505593]: {"agents":["lb01.sachshaus.net"],"component":"schedulerd","cron":"0 19 5 * *","level":"info","msg":"executing round robin check on agents","name":"cron_check_test","namespace":"default","scheduler_type":"round-robin cron","time":"2023-05-11T19:07:19Z"}
May 11 19:07:19 sensu02.sachshaus.net sensu-backend[1505593]: {"check_name":"cron_check_test","check_namespace":"default","component":"eventd","entity_name":"lb01.sachshaus.net","entity_namespace":"default","event_id":"58dd4057-b39d-4272-9787-0c3a7340bfe3","level":"info","msg":"eventd received event","time":"2023-05-11T19:07:19Z"}
May 11 19:07:19 sensu02.sachshaus.net sensu-backend[1505593]: {"check_name":"cron_check_test","check_namespace":"default","component":"pipelined","entity_name":"lb01.sachshaus.net","entity_namespace":"default","event_id":"58dd4057-b39d-4272-9787-0c3a7340bfe3","level":"info","msg":"no pipelines defined in resource","time":"2023-05-11T19:07:19Z"}
May 11 19:08:40 sensu02.sachshaus.net sensu-backend[1505593]: {"component":"schedulerd","cron":"0 19 5 * *","level":"info","msg":"stopping scheduler","name":"cron_check_test","namespace":"default","scheduler_type":"round-robin cron","time":"2023-05-11T19:08:40Z"}
May 11 19:08:44 sensu02.sachshaus.net sensu-backend[1509762]: {"check":"cron_check_test","component":"store-providers","level":"info","msg":"check scheduler restarting","scheduler":"postgres","time":"2023-05-11T19:08:44Z"}
May 11 19:08:44 sensu02.sachshaus.net sensu-backend[1509762]: {"component":"schedulerd","cron":"0 19 5 * *","level":"info","msg":"starting new round-robin cron scheduler","name":"cron_check_test","namespace":"default","scheduler_type":"round-robin cron","time":"2023-05-11T19:08:44Z"}
May 11 19:08:44 sensu02.sachshaus.net sensu-backend[1509762]: {"agents":["lb01.sachshaus.net"],"component":"schedulerd","cron":"0 19 5 * *","level":"info","msg":"executing round robin check on agents","name":"cron_check_test","namespace":"default","scheduler_type":"round-robin cron","time":"2023-05-11T19:08:44Z"}
May 11 19:30:41 sensu02.sachshaus.net sensu-backend[1509762]: {"component":"schedulerd","cron":"0 19 5 * *","level":"info","msg":"stopping scheduler","name":"cron_check_test","namespace":"default","scheduler_type":"round-robin cron","time":"2023-05-11T19:30:41Z"}
May 11 19:30:45 sensu02.sachshaus.net sensu-backend[1520345]: {"check":"cron_check_test","component":"store-providers","level":"info","msg":"check scheduler not restarted","scheduler":"postgres","time":"2023-05-11T19:30:45Z"}
May 11 19:30:45 sensu02.sachshaus.net sensu-backend[1520345]: {"component":"schedulerd","cron":"0 19 5 * *","level":"info","msg":"starting new round-robin cron scheduler","name":"cron_check_test","namespace":"default","scheduler_type":"round-robin cron","time":"2023-05-11T19:30:45Z"}
May 11 19:30:45 sensu02.sachshaus.net sensu-backend[1520345]: {"agents":["lb01.sachshaus.net"],"component":"schedulerd","cron":"0 19 5 * *","level":"info","msg":"executing round robin check on agents","name":"cron_check_test","namespace":"default","scheduler_type":"round-robin cron","time":"2023-05-11T19:30:45Z"}
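
To confirm from entries like these that an execution happened off-schedule, the `time` field of each "executing round robin check on agents" entry can be compared against its `cron` field. The following Go sketch does that under some stated assumptions: `matchesCron` and `offSchedule` are hypothetical helpers, not sensu-go functions; the matcher understands only `*` and plain integers in the five standard fields; timestamps are evaluated in UTC as logged; and the JSON payload has already been separated from the syslog prefix.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
	"time"
)

// logEntry holds the schedulerd log fields this sketch needs.
type logEntry struct {
	Cron string `json:"cron"`
	Msg  string `json:"msg"`
	Time string `json:"time"`
}

// matchesCron reports whether t satisfies a five-field cron expression
// (minute, hour, day of month, month, day of week). Simplified on purpose:
// ranges, lists, and steps are rejected, and all restricted fields must
// match (standard cron ORs day-of-month and day-of-week when both are set).
func matchesCron(expr string, t time.Time) (bool, error) {
	fields := strings.Fields(expr)
	if len(fields) != 5 {
		return false, fmt.Errorf("expected 5 fields, got %d", len(fields))
	}
	actual := []int{t.Minute(), t.Hour(), t.Day(), int(t.Month()), int(t.Weekday())}
	for i, f := range fields {
		if f == "*" {
			continue
		}
		want, err := strconv.Atoi(f)
		if err != nil {
			return false, fmt.Errorf("unsupported cron field %q", f)
		}
		if want != actual[i] {
			return false, nil
		}
	}
	return true, nil
}

// offSchedule reports whether a JSON log payload records a check execution
// at a time its own cron expression does not match.
func offSchedule(payload string) (bool, error) {
	var e logEntry
	if err := json.Unmarshal([]byte(payload), &e); err != nil {
		return false, err
	}
	if !strings.HasPrefix(e.Msg, "executing") {
		return false, nil // not an execution entry
	}
	t, err := time.Parse(time.RFC3339, e.Time)
	if err != nil {
		return false, err
	}
	ok, err := matchesCron(e.Cron, t)
	if err != nil {
		return false, err
	}
	return !ok, nil
}

func main() {
	// Payload of the 19:08:44 execution entry from the excerpt above.
	payload := `{"agents":["lb01.sachshaus.net"],"component":"schedulerd","cron":"0 19 5 * *","level":"info","msg":"executing round robin check on agents","name":"cron_check_test","namespace":"default","scheduler_type":"round-robin cron","time":"2023-05-11T19:08:44Z"}`
	off, err := offSchedule(payload)
	fmt.Println("off schedule:", off, "error:", err)
}
```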

Context

Discovered by a Sensu user in production. See https://secure.helpscout.net/conversation/2219824869/31348?folderId=1211661 for context.

ccressent commented 1 year ago

A few quick notes regarding reproducing this:

echlebek commented 10 months ago

Unfortunately, this problem is a design flaw. Alleviating it would require an external central queue such as RabbitMQ, with agents connecting directly to it. We can't fix this in the 6.x version of Sensu.