Open peterschretlen opened 8 months ago
The code for logging ad-hoc check completion writes the JSON-formatted log to a bytes.Buffer
object and publishes its contents to Loki right after (publisher.Publish
). Since the log write uses a zerolog
logger, it's impacted by the global log level and won't write unless run in --verbose
.
I'm against changing the default log level as verbose
is usually a flag to provide info
logs for detailed debugging, while warn+
are usually the logs of interest.
Looking at the publisher code, we use kitlog
to write structured/contextual logs for SM checks, while the zerolog
logger writes to the OS stdout file on the probe instance (for local debugging purposes, I guess). I'd go with using the kitlog
to report on the check success as it's unaffected by the zerolog
global level.
Closing since the fix is merged! 🚀
I am reopening this issue. Testing on v0.26.0-0-g0bccfddd I can reproduce the problem again. Unless I add the --debug=true
or --verbose=true
flag, adhoc checks send by the test button don't work. The logs are not sent to Loki. So something regressed with the logging since the original fix in https://github.com/grafana/synthetic-monitoring-agent/pull/631
I think this is the issue: https://github.com/grafana/synthetic-monitoring-agent/blob/8d4865ff96f6067605882780c8e9e21c6254a8ad/cmd/synthetic-monitoring-agent/main.go#L176
We set the default level to error
(level 3) but the adhoc logging level is warning
(level 2):
https://github.com/grafana/synthetic-monitoring-agent/blob/8d4865ff96f6067605882780c8e9e21c6254a8ad/internal/adhoc/adhoc.go#L494
Ref: zerolog log levels
Since we changed the log message level to warning, but kept the default level to error, my guess is the fix in https://github.com/grafana/synthetic-monitoring-agent/pull/631 probably had no effect.
Context
When creating a new check in Synthetic Monitoring, you can use the "test" button to try a check before saving it.
This test button creates a "adhoc" check. How an adhoc check works:
{type="adhoc"}
in your logs explore view.{type="adhoc"}
as well as a filter for a request ID to determine if the test worked (if you open the network tab in the browser when running the test, you'll see the app making this query periodically). There will be a log message "ad-hoc check done" that signals the check worked. If we don't get a log match, eventually the test will time out.Problem
When you have a private probe, the
{type="adhoc"}
logs are not pushed to Loki properly unless you have verbose logging enabled. Why does this happen?info
levelerror
. Setting-verbose=true
will configure the level toinfo
.Should the default logging level be
info
perhaps?