sensu / sensu-go

Simple. Scalable. Multi-cloud monitoring.
https://sensu.io
MIT License

Sensu checks are not executing on scheduled time #5051

Open Shivani3351 opened 7 months ago

Shivani3351 commented 7 months ago

Sensu checks occasionally fail to execute at their scheduled time.

Expected Behavior

Checks should never stop being scheduled unless they are no longer published.

Current Behavior

Some Sensu checks occasionally fail to execute at their scheduled time.

Sensu logs:

influxdb handler asset configured
{"component":"schedulerd","cron":"0 /2 ","error":"error while starting ring watcher: context canceled","level":"error","msg":"error scheduling check","name":"tech-elkes-data-backup-validation","namespace":"default","scheduler_type":"round-robin cron","time":"2024-02-07T16:27:21Z"}
{"component":"schedulerd","cron":"/35 ","error":"error while starting ring watcher: context canceled","level":"error","msg":"error scheduling check","name":"tech-netty-thread-status-alert","namespace":"default","scheduler_type":"round-robin cron","time":"2024-02-07T16:27:21Z"}
{"component":"schedulerd","cron":"/4 ","error":"error while starting ring watcher: context canceled","level":"error","msg":"error scheduling check","name":"tech-nginx-basic-status","namespace":"default","scheduler_type":"round-robin cron","time":"2024-02-07T16:27:22Z"}
{"component":"schedulerd","cron":"/4 ","error":"error while starting ring watcher: context canceled","level":"error","msg":"error scheduling check","name":"tech-nginx-connection-alert","namespace":"default","scheduler_type":"round-robin cron","time":"2024-02-07T16:27:22Z"}
{"component":"schedulerd","cron":"/4 ","error":"error while starting ring watcher: context canceled","level":"error","msg":"error scheduling check","name":"tech-nginx-mdm-ui-server-status","namespace":"default","scheduler_type":"round-robin cron","time":"2024-02-07T16:27:22Z"}
{"component":"schedulerd","cron":"/4 ","error":"error while starting ring watcher: context canceled","level":"error","msg":"error scheduling check","name":"tech-nginx-rdp-api-server-status","namespace":"default","scheduler_type":"round-robin cron","time":"2024-02-07T16:27:22Z"}
{"component":"schedulerd","cron":"/4 ","error":"error while starting ring watcher: context canceled","level":"error","msg":"error scheduling check","name":"tech-nginx-status-alert","namespace":"default","scheduler_type":"round-robin cron","time":"2024-02-07T16:27:22Z"}
{"component":"schedulerd","cron":"0 /12 ","error":"error while starting ring watcher: context canceled","level":"error","msg":"error scheduling check","name":"tenant-rdp-validation-alert","namespace":"default","scheduler_type":"round-robin cron","time":"2024-02-07T16:27:23Z"}
{"component":"schedulerd","cron":"0 /12 ","error":"error while starting ring watcher: context canceled","level":"error","msg":"error scheduling check","name":"tenant-system-user-validation-alert","namespace":"default","scheduler_type":"round-robin cron","time":"2024-02-07T16:27:23Z"}
{"component":"schedulerd","cron":"0 0 *","error":"error while starting ring watcher: context canceled","level":"error","msg":"error scheduling check","name":"topology-apm-stats-alert","namespace":"default","scheduler_type":"round-robin cron","time":"2024-02-07T16:27:23Z"}
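
For triage, it can help to see which checks are hit by these schedulerd errors and how often. The following is only a rough sketch, not part of Sensu: the function names iter_json_objects and summarize_scheduler_errors are made up for illustration, and it assumes the backend log is available as text containing JSON objects like the ones above, possibly mixed with plain-text lines.

```python
import json
from collections import Counter


def iter_json_objects(text):
    """Yield every JSON object found in text, skipping non-JSON text in between."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        start = text.find("{", pos)
        if start == -1:
            return
        try:
            # raw_decode parses one JSON value starting at `start` and
            # returns it together with the index where parsing stopped.
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pos = start + 1  # not valid JSON here, keep scanning
            continue
        yield obj
        pos = end


def summarize_scheduler_errors(text):
    """Count error-level schedulerd entries per (namespace, check name)."""
    counts = Counter()
    for entry in iter_json_objects(text):
        if entry.get("component") == "schedulerd" and entry.get("level") == "error":
            counts[(entry.get("namespace"), entry.get("name"))] += 1
    return counts


# Two entries copied from the log excerpt in this issue, plus the plain-text prefix.
SAMPLE = (
    'influxdb handler asset configured '
    '{"component":"schedulerd","cron":"/4 ","error":"error while starting ring watcher: '
    'context canceled","level":"error","msg":"error scheduling check",'
    '"name":"tech-nginx-basic-status","namespace":"default",'
    '"scheduler_type":"round-robin cron","time":"2024-02-07T16:27:22Z"} '
    '{"component":"schedulerd","cron":"/4 ","error":"error while starting ring watcher: '
    'context canceled","level":"error","msg":"error scheduling check",'
    '"name":"tech-nginx-status-alert","namespace":"default",'
    '"scheduler_type":"round-robin cron","time":"2024-02-07T16:27:22Z"}'
)

if __name__ == "__main__":
    for (namespace, name), n in summarize_scheduler_errors(SAMPLE).most_common():
        print(f"{namespace}/{name}: {n}")
```

Running this over a longer log window would show whether the failures cluster around the same timestamps for all round-robin cron checks, as they do in the excerpt above, or affect only a few of them.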


Context

Some Sensu checks occasionally fail to execute at their scheduled time.



elfranne commented 3 months ago

I had a similar issue and solved it by building a cluster. We have around 19,000 checks spread across 600 hosts. I set up a cluster of 3 backend nodes and a separate 3-node etcd cluster. No issues since.