grafana / agent

Vendor-neutral programmable observability pipelines.
https://grafana.com/docs/agent/
Apache License 2.0

Grafana Agent Operator: error during reconciling additional scrape configs #1788

Closed: daper closed this issue 2 years ago

daper commented 2 years ago

Grafana Agent Operator: v0.24.1

When I add the following job to the additional scrape configs:

- job_name: ec2-exporter
  ec2_sd_configs:
    - region: eu-west-1
      port: 9100

the operator produces this error:

level=error
ts=2022-06-10T10:44:09.819032876Z
component=controller.grafanaagent
name=grafana-agent
namespace=monitoring
msg="error during reconciling"
err="unable to build config: RUNTIME ERROR: Not a json type: 9100
  ext/marshal.libsonnet:6:20-53 function <anonymous>
  metrics.libsonnet:129:9-34  thunk from <thunk from <object <anonymous>>>
  utils/k8s.libsonnet:48:19-22  function <anonymous>
  metrics.libsonnet:(126:5)-(131:6) thunk from <object <anonymous>>
  ext/optionals.libsonnet:33:8-13 function <anonymous>
  metrics.libsonnet:(58:19)-(132:4) object <anonymous>
  Field \"scrape_configs\"  
  Array element 0 
  Field \"configs\" 
  Field \"metrics\" 
  ext/optionals.libsonnet:40:17-50  function <anonymous>
  agent-metrics.libsonnet:(27:28)-(64:3)  thunk from <function <anonymous>>
  ext/marshal.libsonnet:3:44-50 thunk from <function <anonymous>>
  ext/marshal.libsonnet:3:18-51 function <anonymous>
  agent-metrics.libsonnet:(27:15)-(64:4)  function <anonymous>
  Top-level function call 
"

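The message "Not a json type: 9100" matches the error go-jsonnet raises when a Go value handed to Jsonnet is not one of the JSON-native types (float64, string, bool, nil, slices, string-keyed maps). A Go YAML decoder turns an unquoted 9100 into an int rather than a float64, which would trip that check. The Python sketch below mimics the behaviour under those assumptions: `to_jsonnet_value` and `NotAJsonType` are hypothetical names, Python `float` stands in for Go `float64`, and Python `int` stands in for the Go `int` the YAML decoder produces.

```python
from typing import Any


class NotAJsonType(Exception):
    """Hypothetical error type; the message mirrors the one in the log."""


def to_jsonnet_value(v: Any) -> Any:
    """Accept only JSON-native types, recursing into lists and dicts.

    bool is listed explicitly so True/False pass even though Python's bool
    subclasses int; a plain int is deliberately NOT accepted."""
    if v is None or isinstance(v, (bool, float, str)):
        return v
    if isinstance(v, list):
        return [to_jsonnet_value(x) for x in v]
    if isinstance(v, dict):
        return {k: to_jsonnet_value(x) for k, x in v.items()}
    # Anything else, including an integer port, is rejected.
    raise NotAJsonType(f"Not a json type: {v!r}")


# The job from the issue as a YAML decoder would hand it over:
# the unquoted port arrives as an integer, not a float.
job = {
    "job_name": "ec2-exporter",
    "ec2_sd_configs": [{"region": "eu-west-1", "port": 9100}],
}

try:
    to_jsonnet_value(job)
except NotAJsonType as e:
    print(e)  # Not a json type: 9100
```

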
If I instead quote the port (port: "9100"), the operator reconciles the secret, but Grafana Agent then fails with "cannot unmarshal !!str into int".
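
Quoting sidesteps the operator error because a string is a JSON-native type, but the agent then decodes the rendered config into typed structs in which the EC2 service-discovery `port` is an integer, so a YAML string no longer fits. A minimal Python sketch of that strict decode follows; `EC2SDConfig` and `decode_ec2_sd_config` are hypothetical stand-ins for the agent's Go config types, not its real API.

```python
from dataclasses import dataclass


@dataclass
class EC2SDConfig:
    # Hypothetical stand-in for the agent's typed EC2 SD config,
    # where port is declared as an integer.
    region: str
    port: int


def decode_ec2_sd_config(raw: dict) -> EC2SDConfig:
    """Strictly decode one ec2_sd_configs entry into the typed config."""
    port = raw["port"]
    if isinstance(port, str):
        # The quoted "9100" arrives as a YAML string (!!str).
        raise TypeError("cannot unmarshal !!str into int")
    if isinstance(port, bool) or not isinstance(port, int):
        # This sketch accepts only integers for the port.
        raise TypeError(f"unsupported port type: {type(port).__name__}")
    return EC2SDConfig(region=raw["region"], port=port)


print(decode_ec2_sd_config({"region": "eu-west-1", "port": 9100}))
# EC2SDConfig(region='eu-west-1', port=9100)

try:
    decode_ec2_sd_config({"region": "eu-west-1", "port": "9100"})
except TypeError as e:
    print(e)  # cannot unmarshal !!str into int
```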

rlankfo commented 2 years ago

Hi @daper, I'll see if I can reproduce this. In the meantime, can you try upgrading to version v0.25.0 of the operator? Thanks!

daper commented 2 years ago

@rlankfo I've upgraded the agent, but when I upgrade the operator to v0.25.0, it produces the following error and restarts:

grafana-agent-operator-dc85dc7b6-p844l grafana-agent-operator level=info ts=2022-06-13T15:03:30.325949224Z component=controller.grafanaagent msg="Starting Controller"
grafana-agent-operator-dc85dc7b6-p844l grafana-agent-operator level=error ts=2022-06-13T15:05:30.326233587Z component=controller.node msg="Could not wait for Cache to sync" err="failed to wait for node caches to sync: timed out waiting for cache to be synced"
grafana-agent-operator-dc85dc7b6-p844l grafana-agent-operator level=info ts=2022-06-13T15:05:30.326284402Z msg="Stopping and waiting for non leader election runnables"
grafana-agent-operator-dc85dc7b6-p844l grafana-agent-operator level=info ts=2022-06-13T15:05:30.326326107Z msg="Stopping and waiting for leader election runnables"
grafana-agent-operator-dc85dc7b6-p844l grafana-agent-operator level=error ts=2022-06-13T15:05:30.326497924Z component=controller.grafanaagent msg="Could not wait for Cache to sync" err="failed to wait for grafanaagent caches to sync: timed out waiting for cache to be synced"
grafana-agent-operator-dc85dc7b6-p844l grafana-agent-operator level=info ts=2022-06-13T15:05:30.326509037Z msg="Stopping and waiting for caches"
grafana-agent-operator-dc85dc7b6-p844l grafana-agent-operator level=error ts=2022-06-13T15:05:30.3265645Z msg="error received after stop sequence was engaged" err="failed to wait for grafanaagent caches to sync: timed out waiting for cache to be synced"
grafana-agent-operator-dc85dc7b6-p844l grafana-agent-operator level=info ts=2022-06-13T15:05:30.327016741Z msg="Stopping and waiting for webhooks"
grafana-agent-operator-dc85dc7b6-p844l grafana-agent-operator level=info ts=2022-06-13T15:05:30.327031636Z msg="Wait completed, proceeding to shutdown the manager"
grafana-agent-operator-dc85dc7b6-p844l grafana-agent-operator level=error ts=2022-06-13T15:05:30.327070765Z msg="problem running manager" err="failed to wait for node caches to sync: timed out waiting for cache to be synced"
- grafana-agent-operator-dc85dc7b6-p844l › grafana-agent-operator

Doesn't seem to be related...

rlankfo commented 2 years ago

Hi @daper, the errors you're seeing in v0.25.0 look unrelated; however, I was able to track down the main issue and create a PR to fix it.

Until we're able to get a patch release out, you can work around the problem by appending a period to the port number, which forces it to be evaluated as a float:

- job_name: ec2-exporter
  ec2_sd_configs:
    - region: eu-west-1
      port: 9100.
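
A sketch of why the trailing period helps, under stated assumptions: YAML resolves the plain scalar `9100.` to a float, which passes the JSON-type check, and Jsonnet stores every number as a double with no distinction between 9100 and 9100.0, so the rendered config presumably carries a plain 9100 that the agent decodes as an int. The number patterns are simplified from the YAML 1.2 core schema (hex, octal, .inf and .nan are left out), and `resolve_number` and `manifest_number` are hypothetical helpers, not operator or Jsonnet APIs.

```python
import re

# Simplified YAML 1.2 core-schema patterns for plain numeric scalars.
INT_RE = re.compile(r"[-+]?[0-9]+")
FLOAT_RE = re.compile(r"[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?")


def resolve_number(scalar: str):
    """Return an int or float for a plain scalar, else the string unchanged.

    The int pattern is tried first, so "9100" is an int while "9100." is a float."""
    if INT_RE.fullmatch(scalar):
        return int(scalar)
    if FLOAT_RE.fullmatch(scalar):
        return float(scalar)
    return scalar


def manifest_number(x: float) -> str:
    """Render a double with no fractional part when it is a whole number."""
    return str(int(x)) if x.is_integer() else repr(x)


for raw in ["9100", "9100."]:
    value = resolve_number(raw)
    print(raw, "->", type(value).__name__, value)
# 9100 -> int 9100
# 9100. -> float 9100.0

print(manifest_number(resolve_number("9100.")))  # 9100
```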

daper commented 2 years ago

Great! It worked :D Thank you @rlankfo!