grafana / synthetic-monitoring-agent

Synthetic Monitoring Agent
https://grafana.com/docs/grafana-cloud/how-do-i/synthetic-monitoring/
Apache License 2.0
160 stars 24 forks source link

Verbose logging is required for adhoc checks to work #586

Open peterschretlen opened 8 months ago

peterschretlen commented 8 months ago

Context

When creating a new check in Synthetic Monitoring, you can use the "test" button to try a check before saving it.

This test button creates a "adhoc" check. How an adhoc check works:

  1. Click the test button. The app sends an API request to synthetic monitoring to run an adhoc check
  2. The API will select probes to run the check and push them to the probes
  3. The probes will run the check, and push their results to Loki ( you can filter for these results by using the LogQL query {type="adhoc"} in your logs explore view.
  4. The synthetic monitoring app then queries Loki with {type="adhoc"} as well as a filter for a request ID to determine if the test worked (if you open the network tab in the browser when running the test, you'll see the app making this query periodically). There will be a log message "ad-hoc check done" that signals the check worked. If we don't get a log match, eventually the test will time out.

Problem

When you have a private probe, the {type="adhoc"} logs are not pushed to Loki properly unless you have verbose logging enabled. Why does this happen?

Should the default logging level be info perhaps?

The-9880 commented 7 months ago

The code for logging ad-hoc check completion writes the JSON-formatted log to a bytes.Buffer object and publishes its contents to Loki right after (publisher.Publish). Since the log write uses a zerolog logger, it's impacted by the global log level and won't write unless run in --verbose.

I'm against changing the default log level as verbose is usually a flag to provide info logs for detailed debugging, while warn+ are usually the logs of interest.

Looking at the publisher code, we use kitlog to write structured/contextual logs for SM checks, while the zerolog logger writes to the OS stdout file on the probe instance (for local debugging purposes, I guess). I'd go with using the kitlog to report on the check success as it's unaffected by the zerolog global level.

The-9880 commented 7 months ago

Closing since the fix is merged! 🚀

peterschretlen commented 6 days ago

I am reopening this issue. Testing on v0.26.0-0-g0bccfddd I can reproduce the problem again. Unless I add the --debug=true or --verbose=true flag, adhoc checks send by the test button don't work. The logs are not sent to Loki. So something regressed with the logging since the original fix in https://github.com/grafana/synthetic-monitoring-agent/pull/631

peterschretlen commented 6 days ago

I think this is the issue: https://github.com/grafana/synthetic-monitoring-agent/blob/8d4865ff96f6067605882780c8e9e21c6254a8ad/cmd/synthetic-monitoring-agent/main.go#L176

We set the default level to error (level 3) but the adhoc logging level is warning (level 2): https://github.com/grafana/synthetic-monitoring-agent/blob/8d4865ff96f6067605882780c8e9e21c6254a8ad/internal/adhoc/adhoc.go#L494

Ref: zerolog log levels

peterschretlen commented 6 days ago

Since we changed the log message level to warning, but kept the default level to error, my guess is the fix in https://github.com/grafana/synthetic-monitoring-agent/pull/631 probably had no effect.