sentenz / convention

General articles, conventions, and guides.
https://sentenz.github.io/convention/

Create an article about `Observability` #313

Open sentenz opened 7 months ago

sentenz commented 7 months ago

Observability

Observability refers to the ability to understand the internal state of a system by examining its external outputs. It involves collecting, aggregating, analyzing, and leveraging data from various sources like logs, metrics, traces, and events to gain insights into the system's behavior, performance, and health.

Observability has evolved into a key practice for IT operations, DevOps, and Site Reliability Engineering (SRE) teams.

1. Category

1.1. Metrics

Quantitative measurements of system behavior over time, such as CPU usage, memory consumption, or request latency, which are used for monitoring and alerting.
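
As a minimal sketch of how such metrics are typically collected (the node job name and the node-exporter:9100 target are assumptions, not part of this article), a Prometheus scrape configuration could look like:

  # prometheus.yml (excerpt): scrape a Node Exporter for host metrics
  global:
    scrape_interval: 15s

  scrape_configs:
    - job_name: node
      static_configs:
        - targets: ['node-exporter:9100']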

1.2. Logs

Records of events or actions occurring within the system, providing detailed information for troubleshooting and auditing.

1.2.1. Levels

Log levels such as TRACE, DEBUG, INFO, WARN, ERROR, and FATAL classify log entries by severity, allowing log output to be filtered by importance during troubleshooting and auditing.

1.3. Traces

Distributed tracing data that shows the flow of requests through different parts of a system, helping to identify bottlenecks and performance issues.
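
As an illustrative sketch (assuming an OpenTelemetry Collector in front of the tracing backend; the tempo:4317 endpoint is an assumption), a collector pipeline that receives spans and forwards them to Tempo could look like:

  # otel-collector config (sketch): receive OTLP spans and forward them to Tempo
  receivers:
    otlp:
      protocols:
        grpc:

  exporters:
    otlp:
      endpoint: tempo:4317
      tls:
        insecure: true

  service:
    pipelines:
      traces:
        receivers: [otlp]
        exporters: [otlp]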

1.4. Events

Notifications or signals emitted by the system to indicate specific occurrences, which can be consumed for real-time analysis or triggering other processes.
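
As a sketch of consuming such events (the http://event-consumer:8080/hooks URL is hypothetical), Alertmanager can forward alert notifications to an external process through a webhook receiver:

  # alertmanager.yml (excerpt): route all alerts to a webhook for further processing
  route:
    receiver: default

  receivers:
    - name: default
      webhook_configs:
        - url: http://event-consumer:8080/hooks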

1.5. Grafana

1.5.1. Provisioning

1.5.1.1. Datasources

  1. Files and Folders

    • datasources.yaml

      Grafana loads data source provisioning files such as datasources.yaml from its provisioning directory (by default /etc/grafana/provisioning/datasources/), so the data sources below are created automatically at startup instead of being added manually in the UI.

      apiVersion: 1
      
      datasources:
      - name: Prometheus
        type: prometheus
        uid: prometheus
        access: proxy
        orgId: 1
        url: http://prometheus:9090
        basicAuth: false
        version: 1
        isDefault: false
        editable: true
        jsonData:
          httpMethod: GET
      
      - name: Alertmanager
        type: alertmanager
        uid: alertmanager
        access: proxy
        orgId: 1
        url: http://alertmanager:9093
        version: 1
        isDefault: false
        editable: true
        jsonData:
          implementation: prometheus
          handleGrafanaManagedAlerts: false
      
      - name: Loki
        type: loki
        uid: loki
        access: proxy
        orgId: 1
        url: http://loki:3100
        version: 1
        isDefault: false
        editable: true
      
      - name: Tempo
        type: tempo
        uid: tempo
        access: proxy
        orgId: 1
        url: http://tempo:3200
        basicAuth: false
        version: 1
        isDefault: true
        editable: true
        jsonData:
          httpMethod: GET
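          # serviceMap links Tempo's service graph view to the Prometheus
          # data source provisioned above, matched by its uid.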
          serviceMap:
            datasourceUid: prometheus

1.5.1.2. Dashboards

  1. Files and Folders

    • dashboards.yaml

      The dashboard provider configured in dashboards.yaml, loaded from Grafana's provisioning directory (by default /etc/grafana/provisioning/dashboards/), tells Grafana to pick up dashboard JSON files from a path on disk.

      apiVersion: 1
      
      providers:
      - name: "Dashboard Provider"
        orgId: 1
        folder: ""
        type: file
        disableDeletion: false
        editable: true
        options:
          path: /var/lib/grafana/dashboards

    • /dashboards

      Directory containing the dashboard JSON definitions referenced by options.path of the provider above (see the docker-compose.yml sketch below for how it can be mounted).

      /dashboards
      ├── gitlab.json
      ├── loki.json
      └── node-exporter.json
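
    • docker-compose.yml

      As a sketch of wiring the provisioning files above into a containerized Grafana (the grafana/grafana image and the host-side paths are assumptions), a compose service could mount them as follows:

      services:
        grafana:
          image: grafana/grafana
          ports:
            - "3000:3000"
          volumes:
            - ./provisioning/datasources/datasources.yaml:/etc/grafana/provisioning/datasources/datasources.yaml
            - ./provisioning/dashboards/dashboards.yaml:/etc/grafana/provisioning/dashboards/dashboards.yaml
            - ./dashboards:/var/lib/grafana/dashboards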

1.6. Promtail

  1. Files and Folders

    • promtail-config.yml

      In Promtail, the log-shipping agent for Grafana Loki, the static_configs and pipeline_stages configuration keywords define how log entries are collected and processed.

      • static_configs

        Defines, within the scrape_configs section, a static list of targets and labels from which Promtail continuously collects logs.

      Examples and Explanations:

      scrape_configs:
        - job_name: system
          static_configs:
            - targets:
                - localhost
              labels:
                job: varlogs
                __path__: /var/log/*log

      • targets

        Specifies the scrape targets; for local file collection this is typically localhost, while the files or directories to read are given via the __path__ label (glob patterns such as *.log are supported).

      • labels

        Attaches metadata such as job: varlogs to the log sources, which can be used for querying and filtering logs in Loki.

      • pipeline_stages

        Defines the sequence of operations each log entry passes through after being collected by Promtail and before being sent to Loki for querying and visualization in Grafana.

        NOTE Pipeline stages are processed sequentially, with each stage performing a different function, such as parsing, labeling, or filtering log lines.

      Examples and Explanations:

      pipeline_stages:
        - regex:
            expression: '(?P<level>\w+): (?P<message>.*)'
        - labels:
            level:

      • regex

        The regex stage parses the incoming log line with a regular expression (expression). Named capture groups such as (?P<level>...) are added to the extracted map that is shared with later stages.

      • labels

        The labels stage promotes values from the extracted map (here level) to Loki labels on the log entry, making them available for querying and filtering in Grafana.
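
      Putting both keywords together, a minimal promtail-config.yml could look like the following sketch (the Loki push URL and the listen port are assumptions consistent with the Loki data source above):

      server:
        http_listen_port: 9080

      positions:
        filename: /tmp/positions.yaml

      clients:
        - url: http://loki:3100/loki/api/v1/push

      scrape_configs:
        - job_name: system
          static_configs:
            - targets:
                - localhost
              labels:
                job: varlogs
                __path__: /var/log/*log
          pipeline_stages:
            - regex:
                expression: '(?P<level>\w+): (?P<message>.*)'
            - labels:
                level: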

2. References

sentenz commented 6 months ago

Refer #267