Closed raihanchdy closed 1 year ago
How do you expect the Sensu guys to help if you don't provide enough details? You did not even follow the template, which gives guidance on what details to fill in...
Anyway, we experienced this several times as well (no issue submitted from our side yet).
@raihanchdy we will need more information to determine if this is a bug or a simple configuration issue. Can you first of all check to see if there are any agent entities configured with the roundrobin:worker subscription? The following command should help:
sensuctl entity list --field-selector='"roundrobin:worker" IN entity.subscriptions'
NOTE: in Sensu Go it is no longer necessary to prefix subscriptions with roundrobin:, so your roundrobin:worker subscription is not being parsed or having any other special handling applied to it; I'll assume this is a vestige remaining after migrating configs from Sensu Core.
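For anyone who prefers to check this offline, the same membership test can be sketched in Python against exported entity data. This is only an illustration: it assumes the entities were exported as JSON (for example with `sensuctl entity list --format json`) and that each entity carries a top-level `subscriptions` list and a `metadata.name` field. The sample entities and the helper name `entities_with_subscription` are hypothetical.

```python
import json


def entities_with_subscription(entities, subscription):
    """Return the names of entities whose subscriptions include `subscription`."""
    return [
        entity["metadata"]["name"]
        for entity in entities
        if subscription in entity.get("subscriptions", [])
    ]


# Hypothetical sample data, shaped like an exported entity list.
sample = json.loads("""
[
  {"metadata": {"name": "agent-01"}, "subscriptions": ["roundrobin:worker", "entity:agent-01"]},
  {"metadata": {"name": "agent-02"}, "subscriptions": ["worker", "entity:agent-02"]},
  {"metadata": {"name": "proxyclient"}, "subscriptions": []}
]
""")

# The literal string is matched, prefix included, because the prefix gets no special handling.
print(entities_with_subscription(sample, "roundrobin:worker"))
print(entities_with_subscription(sample, "worker"))
```

If the first list comes back empty while the second does not, the check and the agents are probably not using the same subscription name.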
@calebhailey Please find the output of the command sensuctl entity list --field-selector='"roundrobin:worker" IN entity.subscriptions'
All of a sudden, most of the proxyclient checks stopped executing. The check I shared was running, but all of a sudden it's not even executing.
@calebhailey More info: A check is scheduled to execute every 2 hours, but all of a sudden the check stopped executing. No error is logged when a check is not executing. Without an error, it's very tough to know why a check is not getting executed.
Workaround: Changed the time interval to 2 minutes and recreated the check, and then the check started executing.
Recreated the check with the sensuctl command and the check started executing. My concern is that as long as the Sensu cluster is running, the check should be running.
No idea how this can be possible.
@calebhailey I installed Sensu 6.4.0 and the issue still persists: checks are not executing at their specified interval. Sensu is mostly used for executing checks, and if the checks themselves aren't executing then it's a huge issue. My kind request: can you please look into it and let me know if you need more details? The above check is configured to execute every 15 minutes, but it has already been 32 minutes without the check executing.
Please let me know if you need additional details.
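Since nothing is logged when a check silently stops, one way to notice it is to compare the last execution time with the expected schedule. The following is a minimal sketch, not part of Sensu: it assumes the last execution time is available as a Unix timestamp (for instance the `check.executed` field of an event) and that the expected interval is supplied by the caller. The function name `is_overdue` and the `grace` factor are hypothetical.

```python
import time


def is_overdue(last_executed, interval_seconds, now=None, grace=0.5):
    """Return True if more than interval * (1 + grace) seconds have passed
    since the last execution.

    `grace` is a hypothetical tolerance to allow for splay and scheduling jitter.
    """
    if now is None:
        now = time.time()
    return (now - last_executed) > interval_seconds * (1 + grace)


now = 1_700_000_000  # arbitrary fixed timestamp for the example

# A 15-minute check that last ran 32 minutes ago, as described above.
print(is_overdue(now - 32 * 60, 15 * 60, now=now))  # True

# The same check, last run 10 minutes ago.
print(is_overdue(now - 10 * 60, 15 * 60, now=now))  # False
```

Running something like this over all events on a schedule would at least surface stalled checks even when the backend logs stay quiet.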
@calebhailey The issue persists even in the latest release, 6.4.2. Could you please let me know if this is a bug and when the fix will come? With this issue, Sensu has become unusable.
The issue is with roundrobin checks with proxyclient attribute.
@raihanchdy I can't see the entity attributes definition for the proxy check. You are using a proxy client, so I assume you have entity attributes defined in the check. Can you share that?
@mcbsd please find the check definition:
{
  "api_version": "core/v2",
  "type": "Check",
  "metadata": {
    "namespace": "default",
    "name": "vm-diskusage-check-mnt-eslog-critical",
    "labels": {},
    "annotations": {
      "sensu.io.json_attributes": "{\"type\":\"standard\",\"refresh\":7200}",
      "fatigue_check/interval": "7200"
    }
  },
  "spec": {
    "command": "python3.6 /etc/sensu/plugins/vm-alerts.py disk disk_data_elk --critical 85 --check vm-diskusage-check-mnt-eslog-critical",
    "subscriptions": [ "worker" ],
    "round_robin": true,
    "publish": true,
    "cron": "/15 *",
    "handlers": [
      "alert_handler_no_host",
      "resolve_handler_no_host",
      "ops_alert_handler_no_host",
      "tester_handler"
    ],
    "proxy_entity_name": "proxyclient",
    "timeout": 890
  }
}
In the Sensu server log there is an entry for sending the check request, but the check is not executing on the agent.
@raihanchdy I don't think this is a Sensu issue. It seems like your check definition is incomplete. A proxy check needs to have entity attributes, for instance (including splay, which is optional):
... "entity_attributes": [ "entity.entity_class == 'proxy'", "entity.name.indexOf('firewall') >= 0" ], "splay": true, "splay_coverage": 90 ...
I suggest reading https://docs.sensu.io/sensu-go/latest/observability-pipeline/observe-schedule/checks/#use-a-proxy-check-to-monitor-multiple-proxy-entities and then considering whether what you have fits the concept.
What actually is a Sensu issue is that you can execute proxy checks on demand; that's the real failure here. An issue has been opened for it, hopefully one day...
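To show where the fragment suggested above would sit in a check definition, here is a rough Python sketch that adds it to a check dictionary. It assumes the three keys belong under the check's `spec.proxy_requests` attribute, as described in the linked Sensu Go documentation section. The helper name `with_proxy_requests` and the trimmed-down check are hypothetical, and the sketch leaves every other field untouched.

```python
import copy
import json


def with_proxy_requests(check, entity_attributes, splay=True, splay_coverage=90):
    """Return a copy of `check` with a proxy_requests block added to its spec."""
    updated = copy.deepcopy(check)  # leave the caller's definition unmodified
    updated["spec"]["proxy_requests"] = {
        "entity_attributes": list(entity_attributes),
        "splay": splay,
        "splay_coverage": splay_coverage,
    }
    return updated


# Trimmed-down, hypothetical version of the check definition shared in this thread.
check = {
    "api_version": "core/v2",
    "type": "Check",
    "metadata": {"namespace": "default", "name": "vm-diskusage-check-mnt-eslog-critical"},
    "spec": {"subscriptions": ["worker"], "round_robin": True, "publish": True},
}

updated = with_proxy_requests(
    check,
    ["entity.entity_class == 'proxy'", "entity.name.indexOf('firewall') >= 0"],
)
print(json.dumps(updated["spec"]["proxy_requests"], indent=2))
```

Whether this block is the right fit for a check that targets a single named proxy entity is a separate question; the sketch only illustrates the shape of the attribute.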
@raihanchdy following up here since @mcbsd replied on the issue. If this is still happening on a recent Sensu release, let's move the conversation over to Sensu Community Slack or to the Sensu Discourse Forums
I have more than 200 checks running on a 3-node Sensu cluster. Randomly, some of the checks are not getting executed.
This check is scheduled to run every 5 minutes, but the events show that the check last executed 2 hours ago, when I ran it manually from the UI.
Please help to fix the issue.