sensu / sensu-go

Simple. Scalable. Multi-cloud monitoring.
https://sensu.io
MIT License

Error running round-robin proxy check with a single agent #2926

Closed apaskulin closed 4 years ago

apaskulin commented 5 years ago

Expected Behavior

The guide to monitoring external resources (https://docs.sensu.io/sensu-go/5.6/guides/monitor-external-resources/) works as documented.

Current Behavior

The check-http check (proxy requests with round-robin) shown in the guide never executes. The logs show that Sensu is matching the entities as expected, but then the scheduler shuts down and the check doesn't get executed.

May 07 19:19:43 sensu-centos sensu-backend[4124]: {"component":"schedulerd","entity":"github-site","expression":"entity.entity_class == 'proxy'","level":"debug","msg":"expression matches entity","namespace":"default","time":"2019-05-07T19:19:43Z"}
May 07 19:19:43 sensu-centos sensu-backend[4124]: {"component":"schedulerd","entity":"github-site","expression":"entity.labels.proxy_type == 'website'","level":"debug","msg":"expression matches entity","namespace":"default","time":"2019-05-07T19:19:43Z"}
May 07 19:19:43 sensu-centos sensu-backend[4124]: {"component":"schedulerd","entity":"packagecloud-site","expression":"entity.entity_class == 'proxy'","level":"debug","msg":"expression matches entity","namespace":"default","time":"2019-05-07T19:19:43Z"}
May 07 19:19:43 sensu-centos sensu-backend[4124]: {"component":"schedulerd","entity":"packagecloud-site","expression":"entity.labels.proxy_type == 'website'","level":"debug","msg":"expression matches entity","namespace":"default","time":"2019-05-07T19:19:43Z"}
May 07 19:19:43 sensu-centos sensu-backend[4124]: {"component":"schedulerd","entity":"sensu-docs","expression":"entity.entity_class == 'proxy'","level":"debug","msg":"expression matches entity","namespace":"default","time":"2019-05-07T19:19:43Z"}
May 07 19:19:43 sensu-centos sensu-backend[4124]: {"component":"schedulerd","entity":"sensu-docs","expression":"entity.labels.proxy_type == 'website'","level":"debug","msg":"expression matches entity","namespace":"default","time":"2019-05-07T19:19:43Z"}
May 07 19:19:43 sensu-centos sensu-backend[4124]: {"component":"schedulerd","entity":"sensu-site","expression":"entity.entity_class == 'proxy'","level":"debug","msg":"expression matches entity","namespace":"default","time":"2019-05-07T19:19:43Z"}
May 07 19:19:43 sensu-centos sensu-backend[4124]: {"component":"schedulerd","entity":"sensu-site","error":"TypeError: Cannot access member 'proxy_type' of undefined","expression":"entity.labels.proxy_type == 'website'","level":"debug","msg":"skipping expression","namespace":"default","time":"2019-05-07T19:19:43Z"}
May 07 19:19:43 sensu-centos sensu-backend[4124]: {"component":"schedulerd","level":"info","msg":"shutting down scheduler","name":"check-http","namespace":"default","scheduler_type":"round-robin interval","time":"2019-05-07T19:19:43Z"}
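
For readers skimming the excerpt above, the per-entity outcome can be pulled out of the JSON payloads with a few lines of Python. This is only a sketch: the helper names (`parse_line`, `summarize`, `skipped_entities`) are hypothetical, and it assumes each journald line carries exactly one JSON object after the syslog prefix. The sample holds four lines copied from the excerpt.

```python
import json
from collections import defaultdict

# Four lines copied verbatim from the log excerpt above.
SAMPLE = """\
May 07 19:19:43 sensu-centos sensu-backend[4124]: {"component":"schedulerd","entity":"sensu-docs","expression":"entity.labels.proxy_type == 'website'","level":"debug","msg":"expression matches entity","namespace":"default","time":"2019-05-07T19:19:43Z"}
May 07 19:19:43 sensu-centos sensu-backend[4124]: {"component":"schedulerd","entity":"sensu-site","expression":"entity.entity_class == 'proxy'","level":"debug","msg":"expression matches entity","namespace":"default","time":"2019-05-07T19:19:43Z"}
May 07 19:19:43 sensu-centos sensu-backend[4124]: {"component":"schedulerd","entity":"sensu-site","error":"TypeError: Cannot access member 'proxy_type' of undefined","expression":"entity.labels.proxy_type == 'website'","level":"debug","msg":"skipping expression","namespace":"default","time":"2019-05-07T19:19:43Z"}
May 07 19:19:43 sensu-centos sensu-backend[4124]: {"component":"schedulerd","level":"info","msg":"shutting down scheduler","name":"check-http","namespace":"default","scheduler_type":"round-robin interval","time":"2019-05-07T19:19:43Z"}
"""


def parse_line(line):
    """Return the JSON payload of one log line, or None if there is none."""
    start = line.find("{")
    if start == -1:
        return None
    try:
        return json.loads(line[start:])
    except json.JSONDecodeError:
        return None


def summarize(text):
    """Map each entity name to a list of (expression, msg, error) tuples."""
    results = defaultdict(list)
    for line in text.splitlines():
        record = parse_line(line)
        # Lines without an "entity" field (e.g. the shutdown message) are ignored.
        if record is None or "entity" not in record:
            continue
        results[record["entity"]].append(
            (record["expression"], record["msg"], record.get("error"))
        )
    return dict(results)


def skipped_entities(summary):
    """Return the sorted names of entities with at least one skipped expression."""
    return sorted(
        entity
        for entity, rows in summary.items()
        if any(msg == "skipping expression" for _, msg, _ in rows)
    )


if __name__ == "__main__":
    summary = summarize(SAMPLE)
    for entity in skipped_entities(summary):
        for expression, msg, error in summary[entity]:
            print(entity, "|", expression, "|", msg, "|", error)
```

On this sample, only `sensu-site` is reported as skipped, and the recorded error shows that its `labels` attribute was undefined when the `proxy_type` expression was evaluated.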

Possible Solution

Steps to Reproduce (for bugs)

  1. Run through the guide
  2. Check for executions by the check-http check

Your Environment

echlebek commented 5 years ago

The proxy entities defined in the guide do not have any subscriptions. Entities that do not have any subscriptions cannot execute checks.

The scheduler notices this, and as a result shuts down.

The check is targeting a subscription called "proxy". Please try setting

subscriptions: ["proxy"]

on the entities, and try again! :smile_cat:

apaskulin commented 5 years ago

@echlebek I added the proxy subscription to the proxy entities and ran through the guide again. It looks like the check runs once, then I see the same "shutting down scheduler" message. This is now using Sensu 5.7.0. Thanks for your help!

Entity configuration:

type: Entity
api_version: core/v2
metadata:
  labels:
    proxy_type: website
    url: https://github.com
  name: github-site
  namespace: default
spec:
  entity_class: proxy
  subscriptions:
  - proxy
---
type: Entity
api_version: core/v2
metadata:
  labels:
    proxy_type: website
    url: https://packagecloud.io
  name: packagecloud-site
  namespace: default
spec:
  entity_class: proxy
  subscriptions:
  - proxy
---
type: Entity
api_version: core/v2
metadata:
  labels:
    proxy_type: website
    url: https://docs.sensu.io
  name: sensu-docs
  namespace: default
spec:
  entity_class: proxy
  subscriptions:
  - proxy

apaskulin commented 5 years ago

After pairing with Eric, we determined that this issue appears when attempting to schedule a round-robin proxy check using proxy request attributes with only a single agent running.

palourde commented 5 years ago

@echlebek @apaskulin I've been fixating on this issue all night and finally got some time to play with our rings mechanism, and it turns out I wasn't able to reproduce this issue at all.

Here are the steps I followed:

  1. Wipe etcd state-dir & start sensu-backend
  2. Create the following resources:
    type: Entity
    api_version: core/v2
    metadata:
      labels:
        url: http://google.com/
        has_http_server: "true"
        needs_fping: "true"
      name: proxy-entity
      namespace: default
    spec:
      deregister: false
      entity_class: proxy
    ---
    type: CheckConfig
    api_version: core/v2
    metadata:
      name: proxy-http
      namespace: default
    spec:
      command: echo {{ .labels.url }}
      interval: 20
      proxy_requests:
        entity_attributes:
        - entity.labels.has_http_server == 'true'
        splay: true
        splay_coverage: 90
      publish: true
      round_robin: true
      subscriptions:
      - proxy
  3. Start a sensu-agent with the proxy subscription:
    $ sensu-agent start --subscriptions proxy
  4. Wait ~20 seconds and observe the events:
    $ s event list
     Entity        Check            Output         Status   Silenced             Timestamp
    ────────────── ──────────── ──────────────────── ──────── ────────── ───────────────────────────────
    proxy-entity   proxy-http   http://google.com/        0   false      2019-07-12 23:25:40 -0400 EDT
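
For anyone following along, here is a rough Python model of what the two resources above are expected to produce. It is a simplified sketch, not Sensu's implementation: the `entity_attributes` expression is stood in for by a Python lambda, the token regex only handles `{{ .labels.<name> }}`, and treating an evaluation error as a non-match is an assumption made for illustration. All function names are hypothetical.

```python
import re

# The resources from step 2, reduced to the fields this sketch needs.
ENTITY = {
    "name": "proxy-entity",
    "entity_class": "proxy",
    "labels": {
        "url": "http://google.com/",
        "has_http_server": "true",
        "needs_fping": "true",
    },
}
CHECK = {
    "name": "proxy-http",
    "command": "echo {{ .labels.url }}",
    # Stand-in for: entity.labels.has_http_server == 'true'
    "entity_attributes": [lambda e: e["labels"]["has_http_server"] == "true"],
}

# Matches tokens of the form {{ .labels.<name> }} only.
TOKEN = re.compile(r"\{\{\s*\.labels\.(\w+)\s*\}\}")


def matches(entity, predicates):
    """True if every predicate holds; an evaluation error counts as no match."""
    for predicate in predicates:
        try:
            if not predicate(entity):
                return False
        except (KeyError, TypeError):
            # Assumption: an entity without the referenced attribute is skipped.
            return False
    return True


def render(command, entity):
    """Replace each label token in the command with the entity's label value."""
    return TOKEN.sub(lambda m: entity["labels"][m.group(1)], command)


def proxy_requests(check, entities):
    """Return (entity name, rendered command) for each matching entity."""
    return [
        (entity["name"], render(check["command"], entity))
        for entity in entities
        if matches(entity, check["entity_attributes"])
    ]


if __name__ == "__main__":
    unlabeled = {"name": "no-labels", "entity_class": "proxy"}
    for name, command in proxy_requests(CHECK, [ENTITY, unlabeled]):
        print(name, "->", command)
```

In this model, `proxy-entity` is selected and its command renders to `echo http://google.com/`, which is consistent with the event output listed in step 4; the hypothetical `no-labels` entity is left out.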

I would definitely be curious to know whether you can still reproduce this issue, or whether we somehow ended up fixing it since the 5.7.0 release.

daswars commented 5 years ago

I have the same problem: Docker sensu/sensu:5.11.0, cluster with 3 backends.

palourde commented 4 years ago

@daswars Could you let us know if you are able to reproduce this issue with a recent version of Sensu Go? Thanks

lk3687051 commented 4 years ago

@palourde Hi, I'm facing this issue too and don't know how to fix it. Can you help me understand which cases cause the "shutting down scheduler" message?

My check:

=== server-snmp-config
Name:              server-snmp-config
Interval:          60
Command:           python plugins/config_server_snmp.py -H {{ .annotations.address }} -c {{ .annotations.snmp_community }}
Cron:
Timeout:           0
TTL:               0
Subscriptions:     proxy
Handlers:
Runtime Assets:
Hooks:
Publish?:          true
Stdin?:            false
Proxy Entity Name:
Namespace:         default
Metric Format:
Metric Handlers:   config

But it stopped and never ran again.

palourde commented 4 years ago

@lk3687051 Could you give us more information about your setup? We would need details such as your sensu-backend version and logs from the backend and from the agent that's supposed to execute this check. It might be easier to open a new issue too.