sensu / catalog

Monitoring as code for Sensu Go. "There's a template for that!"

Check generating errors after installed from catalog #261

Closed asachs01 closed 2 years ago

asachs01 commented 2 years ago

When installing the network interface integration, the check generates the following error:

{"check":"network-interface-health","component":"agent","error":"unable to extract metric from check output","level":"error","msg":"text format parsing error in line 1: expected float as value, got \"\"","namespace":"monitoring","time":"2022-05-26T14:53:48Z"}

The check definition is exactly what's been generated through the catalog integration:

Network Interface Check Definition

```json
{
  "api_version": "core/v2",
  "metadata": {
    "annotations": {},
    "name": "network-interface-health"
  },
  "spec": {
    "command": "network-interface-checks --state-file {{ .annotations.network_interface_monitoring_state_file | default \"/var/cache/sensu/sensu-agent/network-interface-checks\" }} --exclude-interfaces {{ .annotations.excluded_network_interfaces | default \"\\\"\\\"\" }}",
    "env_vars": [],
    "interval": 30,
    "output_metric_format": "prometheus_text",
    "output_metric_tags": [
      { "name": "entity", "value": "{{ .name }}" },
      { "name": "host.name", "value": "{{ .name }}" },
      { "name": "namespace", "value": "{{ .namespace }}" },
      { "name": "os", "value": "{{ .system.os }}" }
    ],
    "output_metric_thresholds": [
      {
        "name": "drop_in_rate",
        "tags": [
          { "name": "interface", "value": "{{ .annotations.default_network_interface | default \"eth0\" }}" }
        ],
        "thresholds": [
          { "max": "0.0", "status": 1 }
        ]
      },
      {
        "name": "drop_out_rate",
        "tags": [
          { "name": "interface", "value": "{{ .annotations.default_network_interface | default \"eth0\" }}" }
        ],
        "thresholds": [
          { "max": "0.0", "status": 1 }
        ]
      },
      {
        "name": "err_in_rate",
        "tags": [
          { "name": "interface", "value": "{{ .annotations.default_network_interface | default \"eth0\" }}" }
        ],
        "thresholds": [
          { "max": "0.0", "status": 1 }
        ]
      },
      {
        "name": "err_out_rate",
        "tags": [
          { "name": "interface", "value": "{{ .annotations.default_network_interface | default \"eth0\" }}" }
        ],
        "thresholds": [
          { "max": "0.0", "status": 1 }
        ]
      }
    ],
    "pipelines": [],
    "publish": true,
    "runtime_assets": [
      "sensu/network-interface-checks:0.2.0"
    ],
    "subscriptions": [
      "system",
      "system/network",
      "darwin",
      "darwin/network",
      "linux",
      "linux/network",
      "windows",
      "windows/network"
    ],
    "timeout": 10,
    "ttl": 0
  },
  "type": "CheckConfig"
}
```

Environmental info

OS: Ubuntu 20.04

Sensu Go Version: 6.7.2

FWIW, when executed manually, the check works just fine:

root@sensu02:/var/cache/sensu/sensu-agent/b574a724ab265db9d65b14831912d316897a036ef6edc3d29c456562dc10da582868d433b8f7c508d3078fdb5394ba033033bf861105c96d68f6dd3f0976c1df/bin# ./network-interface-checks
# HELP drop_in incoming packets dropped
# TYPE drop_in counter
drop_in{interface="tailscale0"} 0 1653588074582
drop_in{interface="ens18"} 128025 1653588074582
# HELP bytes_sent bytes sent
# TYPE bytes_sent counter
bytes_sent{interface="tailscale0"} 478226 1653588074582
bytes_sent{interface="ens18"} 2.3930173018e+10 1653588074582
# HELP packets_sent packets sent
# TYPE packets_sent counter
packets_sent{interface="tailscale0"} 5124 1653588074582
packets_sent{interface="ens18"} 3.0679821e+07 1653588074582
# HELP err_out outbound errors
# TYPE err_out counter
err_out{interface="tailscale0"} 0 1653588074582
err_out{interface="ens18"} 0 1653588074582
# HELP drop_out outbound packets dropped
# TYPE drop_out counter
drop_out{interface="ens18"} 0 1653588074582
drop_out{interface="tailscale0"} 0 1653588074582
# HELP bytes_recv bytes received
# TYPE bytes_recv counter
bytes_recv{interface="tailscale0"} 220248 1653588074582
bytes_recv{interface="ens18"} 2.5974517276e+10 1653588074582
# HELP packets_recv packets received
# TYPE packets_recv counter
packets_recv{interface="tailscale0"} 2610 1653588074582
packets_recv{interface="ens18"} 3.5482245e+07 1653588074582
# HELP err_in inbound errors
# TYPE err_in counter
err_in{interface="tailscale0"} 0 1653588074582
err_in{interface="ens18"} 0 1653588074582

There does seem to be a difference between the check command as presented in the catalog and the check command as rendered, so I'm not sure whether this is an error in the plugin itself or in the catalog integration.

Catalog check command

"spec": {
        "command": "network-interface-checks --state-file {{ .annotations.network_interface_monitoring_state_file | default \"/var/cache/sensu/sensu-agent/network-interface-checks\" }} --exclude-interfaces {{ .annotations.excluded_network_interfaces | default \"\\\"\\\"\" }}",

Rendered check command

 "spec": {
        "check_hooks": null,
        "command": "network-interface-checks --state-file {{ .annotations.network_interface_monitoring_state_file | default \"/var/cache/sensu/sensu-agent/network-interface-checks\" }} --exclude-interfaces {{ .annotations.excluded_network_interfaces | default \"\" }}",

However, I'm not entirely sure if that's the problem or not.
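To make the difference between the two commands easier to see, here is a minimal Python sketch that undoes the JSON escaping on each `default` argument. The two fragments are copied from the command strings above. The helper name `decode` is hypothetical, and using `json.loads` a second time as a stand-in for Go string-literal unescaping is an assumption that only holds for the simple `\"` escape used here.

```python
import json


def decode(json_fragment):
    """Return (template literal, default value) for one JSON-escaped fragment."""
    # Step 1: undo the JSON escaping to get the literal the template engine sees.
    template_literal = json.loads('"' + json_fragment + '"')
    # Step 2 (assumption): json.loads stands in for Go string-literal
    # unescaping, which agrees with JSON for the \" escape used here.
    default_value = json.loads(template_literal)
    return template_literal, default_value


# Fragment after `default` in the catalog check command.
catalog = decode(r'\"\\\"\\\"\"')
# Fragment after `default` in the rendered check command.
rendered = decode(r'\"\"')

print("catalog: ", catalog)
print("rendered:", rendered)
```

Under these assumptions, the catalog default is a string made of two double-quote characters, while the rendered default is an empty string, so the two commands would not expand to the same text when the annotation is unset.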

jspaleta commented 2 years ago

Looks like an integration bug. The default value for the exclude interfaces argument looks malformed to me, so the integration needs to be fixed to emit a well-formed default.

Having the check output from the event would help.

jspaleta commented 2 years ago

So we've tried to reproduce the command line as rendered in this issue, and we were unsuccessful. Not sure how you produced that rendered command line. We had to make some guesses about how you filled out the forms, and we weren't able to reproduce the rendered command.

Even though we couldn't reproduce your rendered check, we did noodle around with the form inputs enough to figure out that the integration templating is fragile when user input contains stray whitespace, and we have an integration enhancement in progress to fix what we did find. Hopefully this also fixes the issue presented here. Fingers crossed.

But based on our testing, this appears to be a problem with the value passed to the exclude argument by the Go templating used in the command line. The Go template renders to an empty string and adds nothing to the command line string. When the shell executes the command line, the argument processor in the plugin SDK throws an error because it expects a value for the exclude argument but there isn't one. If you go back and look at the event.check.output for the check, you will probably see a Usage message indicating a command-line argument processing error.
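A rough way to see why the empty rendering matters is to look at how a POSIX-style shell would tokenize the two variants. The sketch below uses Python's `shlex.split` as an approximation of shell word splitting. The `/tmp/state` path and the `value_after` helper are hypothetical, and this does not reproduce the plugin SDK's actual argument parser.

```python
import shlex

# Hypothetical command prefix; /tmp/state stands in for the real state file path.
BASE = "network-interface-checks --state-file /tmp/state --exclude-interfaces"


def value_after(argv, flag):
    """Return the token following `flag`, or None if the flag is last."""
    i = argv.index(flag)
    return argv[i + 1] if i + 1 < len(argv) else None


# Template rendered to two quote characters: the shell passes an empty argument.
with_quotes = shlex.split(BASE + ' ""')
# Template rendered to an empty string: nothing follows the flag at all.
without_value = shlex.split(BASE + " ")

print(with_quotes)
print(without_value)
print(repr(value_after(with_quotes, "--exclude-interfaces")))
print(repr(value_after(without_value, "--exclude-interfaces")))
```

In the first case the flag receives an empty string as its value. In the second the flag is the last token, which is the situation where an argument parser that requires a value would typically stop with a usage error.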