Monitoring as code for Sensu Go. "There's a template for that!"

Sensu Integration Catalog

This repository contains monitoring-as-code templates for enabling monitoring integrations (e.g. Linux system monitoring checks or NGINX service health monitoring) and pipeline integrations (e.g. PagerDuty, Elasticsearch, Splunk, Ansible Tower).

Project Goals

The goal of this project is to provide reference implementations for effective monitoring with Sensu Go. The Sensu Catalog should (eventually) provide everything a new Sensu user needs to get up and running and rapidly deploy across a large fleet of systems.

Sensu Integration Specification

A Sensu Catalog is a collection of Sensu Integrations. The contents of this catalog are periodically published to the official Sensu Catalog API, which is hosted at https://catalog.sensu.io.

See below for individual integration contents and API specification.

Integration directory structure

Sensu Integrations are defined as files on disk in the following structure:

integrations/
└── <namespace; e.g. "nginx">/
    └── <integration; e.g. "nginx-healthcheck">/
        ├── img/
        │   ├── dashboard-1.gif
        │   └── dashboard-2.png
        ├── CHANGELOG.md
        ├── README.md
        ├── logo.png
        ├── sensu-integration.yaml
        └── sensu-resources.yaml

Integration API specification

Sensu Integrations resemble Sensu Go API resources, but they are not processed by Sensu Go directly. See the [sensu/catalog-api] project for more information.

Example:

---
api_version: catalog/v1
type: Integration
metadata:
  namespace: nginx
  name: nginx-healthcheck
spec:
  class: "supported"
  provider: "monitoring"
  display_name: "NGINX monitoring"
  short_description: NGINX service health and performance monitoring
  supported_platforms:
    - linux
    - windows
    - darwin
  tags:
    - http
    - nginx
    - webserver
  contributors:
    - "@sensu"
    - "@calebhailey"
    - "@jspaleta"
    - "@thoward"
  prompts:
    - type: question
      name: url
      required: false
      input:
        type: string
        title: Default URL
        description: >-
          What is the default `nginx_status` endpoint URL that should be used?
        format: url
        default: http://127.0.0.1:80/nginx_status
    - type: question
      name: interval
      required: false
      input:
        type: integer
        title: Interval
        description: >-
          How often (in seconds) do you want to check the status of NGINX?
        format: duration
        default: 30
    - type: section
      title: Pipeline Configuration
    - type: markdown
      body: |
        Configure one or more [pipelines] for processing NGINX monitoring data.

        [pipelines]: https://docs.sensu.io/sensu-go/latest/observability-pipeline/
    - type: question
      name: metrics_pipeline
      required: false
      input:
        type: string
        title: Metrics Pipeline
        description: >-
          How do you want to process metrics collected by this integration?
        ref: core/v2/pipeline/metadata/name
        refFilter: .labels.provider == "metrics"
    - type: question
      name: alert_pipeline
      required: false
      input:
        type: string
        title: Alert Pipeline
        description: >-
          How do you want to be alerted for failures detected by this pipeline (e.g. Slack or Microsoft Teams)?
        ref: core/v2/pipeline/metadata/name
        refFilter: .labels.provider == "alerts"
    - type: question
      name: incident_pipeline
      required: false
      input:
        type: string
        title: Incident Management Pipeline
        description: >-
          How do you want to process incidents for failures detected by this pipeline (e.g. Atlassian JIRA/ServiceDesk, or Pagerduty)?
        ref: core/v2/pipeline/metadata/name
        refFilter: .labels.provider == "incidents"
  resource_patches:
    - resource:
        type: CheckConfig
        api_version: core/v2
        name: nginx-healthcheck
      patches:
        - path: /metadata/name
          op: replace
          value: nginx-healthcheck-[[unique_id]]
        - path: /spec/interval
          op: replace
          value: "[[interval]]"
        - path: /spec/command
          op: replace
          value: >-
            check-nginx-status.rb
            --url {{ .annotations.check_nginx_status_url | default "[[url]]" }}
        - path: /spec/pipelines/-
          op: add
          value:
            api_version: "core/v2"
            type: "Pipeline"
            name: "[[metrics_pipeline]]"
        - path: /spec/pipelines/-
          op: add
          value:
            api_version: "core/v2"
            type: "Pipeline"
            name: "[[alert_pipeline]]"
        - path: /spec/pipelines/-
          op: add
          value:
            api_version: "core/v2"
            type: "Pipeline"
            name: "[[incident_pipeline]]"

Sensu Integration guidelines

Please note the following guidelines for composing Sensu Integrations:

  1. YAML format. All integration metadata (sensu-integration.yaml) and resources (sensu-resources.yaml) must be in YAML format, for consistency and comment support. All YAML files should use the .yaml file extension (not .yml), because we're picky that way.

  2. Namespace templating. Resource definitions (sensu-resources.yaml) should not include a namespace.

  3. Linting. All integrations will be validated via super-linter. We recommend running it locally to streamline PR approval.

  4. Naming conflicts. CheckConfig, HookConfig, Filter, Mutator, and Handler resource names must be unique within the scope of this project.

    NOTE: at this time we do not wish to enforce strict naming conventions. We will resolve naming conflicts on a case-by-case basis, which means resource names will be subject to change.
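
As an illustration of guidelines 1 and 2, here is a minimal sketch of what a sensu-resources.yaml entry might look like. It is modeled on the nginx-healthcheck example above, not copied from a published integration: the timeout value and the empty pipelines list are assumptions, the latter chosen so that patches appending to /spec/pipelines/- have a list to append to.

```yaml
---
type: CheckConfig
api_version: core/v2
metadata:
  # No "namespace" field here (guideline 2).
  name: nginx-healthcheck
spec:
  command: >-
    check-nginx-status.rb
    --url {{ .annotations.check_nginx_status_url | default "http://127.0.0.1:80/nginx_status" }}
  subscriptions:
    - nginx
  interval: 30
  timeout: 10      # assumed value
  pipelines: []    # assumed empty; populated by resource_patches
```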

CheckConfig guidelines

  1. Check template resources should be defined in the following order (by resource type):

    • CheckConfig
    • HookConfig(s)
    • Secret(s)
    • Asset(s)

  2. Check resources must recommend one or more named subscriptions. At a minimum this should include the corresponding integration's "namespace" (sub-directory) as the default naming convention. For example, all PostgreSQL monitoring templates should include the "postgres" subscription. Check resources may optionally include additional/alternate subscription names (e.g. "pg" or "postgresql").

  3. The command field should preferably be wrapped using the YAML >- multiline "block scalar" syntax for readability.

    spec:
      command: >-
        check-disk-usage.rb
        -w {{ .annotations.disk_usage_warning | default 85 }}
        -c {{ .annotations.disk_usage_critical | default 95 }}

  4. As shown in the example above, check commands should include tunables using Sensu tokens, preferably sourced from Entity annotations (not labels) with explicitly configured defaults.

  5. Check resources should use the "interval" scheduler, with a minimum interval of 30 seconds.

  6. Check timeout should be set to a non-zero value and should not be greater than 50% of the interval.

  7. Check pipelines should reference one of the following generic pipelines:

    • alert (e.g. Slack, Mattermost, Microsoft Teams)
    • incident-management (e.g. PagerDuty, ServiceNow)
    • metrics (e.g. Sumo Logic, InfluxDB, TimescaleDB, Prometheus)
    • events (e.g. Sumo Logic, Elasticsearch, Splunk)
    • deregistration (e.g. Chef, Puppet, Ansible, EC2)
    • remediation (e.g. Ansible Tower, Rundeck, SaltStack)
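
Taken together, the CheckConfig guidelines above might produce a resource like the following sketch. The check name and the "system" subscription are hypothetical (they stand in for whatever namespace directory the integration lives in), and the timeout is one arbitrary value that satisfies the 50% rule. HookConfig, Secret, and Asset resources, if any, would follow in the same file.

```yaml
---
type: CheckConfig
api_version: core/v2
metadata:
  name: disk-usage               # hypothetical name
spec:
  # Tunables come from entity annotations, with explicit defaults.
  command: >-
    check-disk-usage.rb
    -w {{ .annotations.disk_usage_warning | default 85 }}
    -c {{ .annotations.disk_usage_critical | default 95 }}
  subscriptions:
    - system                     # assumed integration namespace
  publish: true
  interval: 30                   # interval scheduler, 30 second minimum
  timeout: 10                    # non-zero, at most 50% of the interval
  pipelines:
    - api_version: core/v2
      type: Pipeline
      name: alert
    - api_version: core/v2
      type: Pipeline
      name: incident-management
```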

Pipeline guidelines

  1. Pipeline template resources should be defined in the following order (by resource type):

    • Pipeline
    • Handler(s), SumoLogicMetricsHandler(s), and/or TCPStreamHandler(s)
    • Filter(s)
    • Mutator(s)
    • Secret(s)
    • Asset(s)

  2. For alert and incident-management handlers, avoid filters that have highly subjective configuration options. By default, use the built-in is_incident and not_silenced filters. However, we do encourage you to share your filters, as appropriate, in the shared directory.
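
A pipeline template that follows these guidelines could look roughly like the sketch below. The workflow and handler names are hypothetical, the handler's command-line options are deliberately omitted, and the provider label is an assumption based on the refFilter expressions in the integration example above. Secret and Asset resources would follow the Handler in the same file.

```yaml
---
type: Pipeline
api_version: core/v2
metadata:
  name: incident-management
  labels:
    provider: incidents          # assumed, to match the refFilter shown earlier
spec:
  workflows:
    - name: pagerduty            # hypothetical workflow name
      filters:
        # Built-in filters only; no subjective tuning.
        - api_version: core/v2
          type: EventFilter
          name: is_incident
        - api_version: core/v2
          type: EventFilter
          name: not_silenced
      handler:
        api_version: core/v2
        type: Handler
        name: pagerduty
---
type: Handler
api_version: core/v2
metadata:
  name: pagerduty                # hypothetical handler name
spec:
  type: pipe
  command: sensu-pagerduty-handler   # options omitted in this sketch
  runtime_assets:
    - sensu/sensu-pagerduty-handler:2.1.0
```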

Asset guidelines

  1. Asset resources and their corresponding runtime_assets references must include a version reference in their resource name. For example: sensu/system-check:0.5.0.

  2. Asset resources should include an organization or author in the resource name. For example, the official Sensu PagerDuty plugin, hosted in the "sensu" organization on GitHub (sensu/sensu-pagerduty-handler) and published under the "sensu" organization on Bonsai (sensu/sensu-pagerduty-handler), should be named sensu/sensu-pagerduty-handler:2.1.0.

  3. All Sensu Asset resources must refer to assets hosted on Bonsai.
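
For reference, an Asset resource named according to these guidelines might be sketched as follows. The url and sha512 values are placeholders rather than real values and must be taken from the asset's Bonsai listing. The build filters shown are only an example of targeting one platform.

```yaml
---
type: Asset
api_version: core/v2
metadata:
  # <organization>/<asset name>:<version>
  name: sensu/sensu-pagerduty-handler:2.1.0
spec:
  builds:
    - url: "<Bonsai-hosted build URL>"        # placeholder
      sha512: "<sha512 checksum of the build>" # placeholder
      filters:
        - entity.system.os == 'linux'
        - entity.system.arch == 'amd64'
```

Any check or handler that uses the asset would then list the same versioned name under runtime_assets.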

Contributing

There are three ways to contribute to this project:

  1. Use the integration templates provided in this catalog and share your feedback.

  2. Contribute "feature requests" to indicate interest in adding new integration templates.

  3. Contribute integration templates and/or modifications to existing templates.

How to do it:

Thanks in advance for your contributions!