Observability

Observability refers to the ability to understand the internal state of a system by examining its external outputs. It involves collecting, aggregating, analyzing, and leveraging data from various sources like logs, metrics, traces, and events to gain insights into the system's behavior, performance, and health.

Observability has evolved into a key practice for IT operations, DevOps, and Site Reliability Engineering (SRE) teams.

1. Category
2. References

1. Category

1.1. Metrics

Quantitative measurements of system behavior over time, such as CPU usage, memory consumption, and request latency, used for monitoring and alerting.
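The Python sketch below shows how a service might expose such measurements. It assumes the Prometheus text exposition format, which Prometheus scrapes from an HTTP endpoint. The code keeps a counter and a gauge in memory and renders them for scraping. The MetricsRegistry class and the metric names are hypothetical. A real service would typically use a client library such as prometheus_client instead.

```python
from collections import defaultdict


class MetricsRegistry:
    """Hypothetical in-memory registry holding counters and gauges."""

    def __init__(self):
        self.counters = defaultdict(float)  # monotonically increasing totals
        self.gauges = {}                    # values that can go up or down

    def inc(self, name, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only increase")
        self.counters[name] += amount

    def set_gauge(self, name, value):
        self.gauges[name] = value

    def render(self):
        """Render all metrics in the Prometheus text exposition format."""
        lines = []
        for name, value in sorted(self.counters.items()):
            lines.append(f"# TYPE {name} counter")
            lines.append(f"{name} {value}")
        for name, value in sorted(self.gauges.items()):
            lines.append(f"# TYPE {name} gauge")
            lines.append(f"{name} {value}")
        return "\n".join(lines) + "\n"


registry = MetricsRegistry()
registry.inc("http_requests_total")  # e.g. once per handled request
registry.inc("http_requests_total")
registry.set_gauge("process_memory_bytes", 52428800)
print(registry.render())
```

Counters only go up, so alerting rules usually look at their rate of change. Gauges such as memory usage are compared against thresholds directly.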

1.2. Logs

Records of events or actions occurring within the system, providing detailed information for troubleshooting and auditing.

1.2.1. Levels

Log levels

DEBUG

Fine-grained informational events that are most useful to debug an application. These are usually of value to developers and are very verbose.
INFO

Informational messages that highlight the progress of the application at a coarse-grained level.
WARN

Potentially harmful situations that indicate a risk to an application. These can trigger an alarm in an application.
ERROR

Error events that might still allow the application to continue running. These are likely to trigger an alarm that requires attention.
FATAL

Very severe error events that will presumably cause an application to abort.
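CRITICAL

Severe error events that require immediate attention. In some frameworks, for example Python's logging module, CRITICAL is the highest severity and is used in place of FATAL.
NONE

Disables logging entirely; no messages are emitted.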
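Most logging frameworks treat these levels as an ordered threshold: records below the configured level are discarded. The sketch below uses Python's standard logging module. That module names the levels DEBUG, INFO, WARNING, ERROR, and CRITICAL. It has no NONE level, but setting the threshold above CRITICAL has the same effect. The logger name and the messages are hypothetical.

```python
import io
import logging

# Capture output in memory so the effect of the threshold is easy to inspect.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

logger = logging.getLogger("example.app")  # hypothetical logger name
logger.addHandler(handler)
logger.propagate = False  # keep records away from the root logger's handlers

# With the threshold at WARNING, DEBUG and INFO records are discarded.
logger.setLevel(logging.WARNING)
logger.debug("cache miss for key user:42")
logger.info("request served in 12 ms")
logger.warning("disk usage at 85%")
logger.error("payment service returned HTTP 503")
logger.critical("database unreachable, shutting down")

print(stream.getvalue())

# Python has no separate FATAL level: logging.FATAL is an alias of CRITICAL.
assert logging.FATAL == logging.CRITICAL
```

Raising the threshold in production reduces volume without removing the DEBUG calls from the code.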

1.3. Traces

Distributed tracing data that shows the flow of requests through different parts of a system, helping to identify bottlenecks and performance issues.
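A trace is a tree of spans linked by parent IDs. A common way to locate a bottleneck is to compute each span's self time, which is its duration minus the time spent in its direct children. The Python sketch below assumes the spans are already collected in memory and that child spans run sequentially. The Span class and the sample trace are hypothetical and do not follow any particular tracing SDK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None for the root span
    name: str
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def self_times(spans):
    """Time spent in each span excluding its direct children.

    Assumes children do not overlap in time (sequential calls).
    """
    result = {s.span_id: s.duration_ms for s in spans}
    for s in spans:
        if s.parent_id is not None:
            result[s.parent_id] -= s.duration_ms
    return result


# One hypothetical trace: an API request calling two downstream services.
trace = [
    Span("a", None, "GET /checkout", 0, 250),
    Span("b", "a", "auth-service.verify", 5, 25),
    Span("c", "a", "inventory-db.query", 30, 230),
]

times = self_times(trace)
names = {s.span_id: s.name for s in trace}
bottleneck = max(times, key=times.get)
print(names[bottleneck], times[bottleneck])
```

Tracing backends visualize the same idea as a waterfall. In this example the database query dominates, not the request handler itself.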

1.4. Events

Notifications or signals emitted by the system to indicate specific occurrences, which can be consumed for real-time analysis or triggering other processes.
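The consume-and-react pattern can be sketched as an in-process publish/subscribe bus. This Python example is a hypothetical illustration: EventBus, the event names, and the threshold do not come from any specific product. Production systems usually place a message broker between producers and consumers.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class EventBus:
    """Hypothetical in-process event bus: producers emit, subscribers react."""

    def __init__(self):
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[event_type].append(handler)

    def emit(self, event_type: str, payload: dict) -> None:
        # Deliver synchronously to every handler registered for this type.
        for handler in self._subscribers[event_type]:
            handler(payload)


bus = EventBus()
audit_log = []
scale_requests = []


def maybe_scale(event: dict) -> None:
    # Trigger a follow-up process only when a (hypothetical) threshold is crossed.
    if event["depth"] > 1000:
        scale_requests.append(event["queue"])


bus.subscribe("deployment.finished", audit_log.append)  # record for analysis
bus.subscribe("queue.depth", maybe_scale)               # react to a condition

bus.emit("deployment.finished", {"service": "checkout", "version": "1.4.2"})
bus.emit("queue.depth", {"queue": "orders", "depth": 1500})
bus.emit("queue.depth", {"queue": "emails", "depth": 10})
print(audit_log, scale_requests)
```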

1.5. Grafana

1.5.1. Provisioning

Environment Variables

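Any option in the Grafana configuration file (grafana.ini) can be overridden with an environment variable named GF_<SECTION_NAME>_<KEY_NAME>. The section and key names are written in uppercase, and any . or - is replaced with _.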

| Setting | Default |
| --- | --- |
| GF_PATHS_CONFIG | /etc/grafana/grafana.ini |
| GF_PATHS_DATA | /var/lib/grafana |
| GF_PATHS_HOME | /usr/share/grafana |
| GF_PATHS_LOGS | /var/log/grafana |
| GF_PATHS_PLUGINS | /var/lib/grafana/plugins |
| GF_PATHS_PROVISIONING | /etc/grafana/provisioning |
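The naming rule can be expressed as a small Python helper. The function name grafana_env_var is hypothetical. The transformation follows Grafana's convention: the [section] name and the key are uppercased, and . and - become _.

```python
def grafana_env_var(section: str, key: str) -> str:
    """Build the environment variable that overrides a grafana.ini option."""

    def normalize(part: str) -> str:
        # Uppercase and replace '.' and '-' with '_'.
        return part.upper().replace(".", "_").replace("-", "_")

    return f"GF_{normalize(section)}_{normalize(key)}"


print(grafana_env_var("paths", "provisioning"))         # GF_PATHS_PROVISIONING
print(grafana_env_var("auth.google", "client_secret"))  # GF_AUTH_GOOGLE_CLIENT_SECRET
```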

1.5.1.1. Datasources

Prometheus Data Source

Files and Folders

datasources.yaml

TODO

```
apiVersion: 1

datasources:
- name: Prometheus
  type: prometheus
  uid: prometheus
  access: proxy
  orgId: 1
  url: http://prometheus:9090
  basicAuth: false
  version: 1
  isDefault: false
  editable: true
  jsonData:
    httpMethod: GET

- name: Alertmanager
  type: alertmanager
  uid: alertmanager
  access: proxy
  orgId: 1
  url: http://alertmanager:9093
  version: 1
  isDefault: false
  editable: true
  jsonData:
    implementation: prometheus
    handleGrafanaManagedAlerts: false

- name: Loki
  type: loki
  uid: loki
  access: proxy
  orgId: 1
  url: http://loki:3100
  version: 1
  isDefault: false
  editable: true

- name: Tempo
  type: tempo
  uid: tempo
  access: proxy
  orgId: 1
  url: http://tempo:3200
  basicAuth: false
  version: 1
  apiVersion: 1
  isDefault: true
  editable: true
  jsonData:
    httpMethod: GET
    serviceMap:
      datasourceUid: prometheus
```
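Grafana refuses to provision more than one default datasource per organization. Cross-references such as Tempo's serviceMap.datasourceUid only work if they name an existing uid. The Python sketch below checks these two invariants, plus uid uniqueness, before deployment. It works on the datasources above transcribed into dictionaries, because parsing the YAML file itself would need a third-party library such as PyYAML. The validate helper is hypothetical and not part of Grafana.

```python
from collections import Counter

# The provisioned datasources, reduced to the fields checked here.
datasources = [
    {"name": "Prometheus", "uid": "prometheus", "orgId": 1, "isDefault": False},
    {"name": "Alertmanager", "uid": "alertmanager", "orgId": 1, "isDefault": False},
    {"name": "Loki", "uid": "loki", "orgId": 1, "isDefault": False},
    {"name": "Tempo", "uid": "tempo", "orgId": 1, "isDefault": True,
     "jsonData": {"serviceMap": {"datasourceUid": "prometheus"}}},
]


def validate(datasources):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    uids = Counter(ds["uid"] for ds in datasources)
    problems += [f"duplicate uid: {uid}" for uid, n in uids.items() if n > 1]

    defaults = Counter(ds["orgId"] for ds in datasources if ds.get("isDefault"))
    problems += [f"org {org} has {n} default datasources"
                 for org, n in defaults.items() if n > 1]

    # Cross-references such as Tempo's service map must point at an existing uid.
    for ds in datasources:
        ref = ds.get("jsonData", {}).get("serviceMap", {}).get("datasourceUid")
        if ref is not None and ref not in uids:
            problems.append(f"{ds['name']}: unknown datasourceUid {ref}")
    return problems


print(validate(datasources))
```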

1.5.1.2. Dashboards

OpenTelemetry Collector

Grafana dashboard to visualize metrics collected by the OpenTelemetry Collector. The OpenTelemetry Collector is a vendor-agnostic implementation that receives, processes, and exports telemetry data.
DORA Metrics

Grafana dashboard to visualize DORA (DevOps Research and Assessment) metrics. DORA metrics are a standard set of DevOps metrics used for evaluating process performance and maturity.
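The four key DORA metrics are deployment frequency, lead time for changes, change failure rate, and time to restore service. The Python sketch below shows how they could be computed from deployment records. The record fields and sample data are hypothetical, and the median is only one reasonable aggregate among several. A dashboard would normally derive the same values from CI/CD and incident data.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical deployment records for one service over one week.
deployments = [
    {"committed": datetime(2024, 5, 1, 9), "deployed": datetime(2024, 5, 1, 15), "failed": False},
    {"committed": datetime(2024, 5, 2, 10), "deployed": datetime(2024, 5, 3, 10), "failed": True,
     "restored": datetime(2024, 5, 3, 11, 30)},
    {"committed": datetime(2024, 5, 6, 8), "deployed": datetime(2024, 5, 6, 12), "failed": False},
    {"committed": datetime(2024, 5, 7, 14), "deployed": datetime(2024, 5, 8, 9), "failed": False},
]
period = timedelta(days=7)

# Deployment frequency: deployments per day over the observed period.
deployment_frequency = len(deployments) / period.days

# Lead time for changes: median time from commit to production.
lead_time = median(d["deployed"] - d["committed"] for d in deployments)

# Change failure rate: share of deployments that caused a failure.
failures = [d for d in deployments if d["failed"]]
change_failure_rate = len(failures) / len(deployments)

# Time to restore service, approximated as failed deployment -> recovery.
time_to_restore = median(d["restored"] - d["deployed"] for d in failures) if failures else None

print(f"{deployment_frequency:.2f} deploys/day, lead time {lead_time}, "
      f"CFR {change_failure_rate:.0%}, restore {time_to_restore}")
```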

Files and Folders

dashboards.yaml

TODO

```
apiVersion: 1

providers:
- name: "Dashboard Provider"
  orgId: 1
  folder: ""
  type: file
  disableDeletion: false
  editable: true
  options:
    path: /var/lib/grafana/dashboards
```

/dashboards

TODO

```
/dashboards
├── gitlab.json
├── loki.json
└── node-exporter.json
```
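The file provider loads dashboard JSON files from its configured path, /var/lib/grafana/dashboards. The /dashboards folder is presumably mounted at that location; this is an assumption, since the mount is not shown. A quick pre-deployment check is to confirm that each file parses and carries a uid and a title. The Python sketch below is a hypothetical helper, not Grafana's own loader, and the dashboard contents it writes are made up.

```python
import json
import tempfile
from pathlib import Path


def discover_dashboards(root):
    """Return (file name, uid, title) for every dashboard JSON file under root.

    Raises json.JSONDecodeError on a malformed file, so the problem surfaces
    before Grafana tries to load it.
    """
    found = []
    for path in sorted(Path(root).rglob("*.json")):
        model = json.loads(path.read_text(encoding="utf-8"))
        found.append((path.name, model.get("uid"), model.get("title")))
    return found


# Usage against a throwaway copy of the /dashboards layout.
with tempfile.TemporaryDirectory() as tmp:
    for name, uid, title in [
        ("gitlab.json", "gitlab", "GitLab"),
        ("loki.json", "loki", "Loki"),
        ("node-exporter.json", "node-exporter", "Node Exporter"),
    ]:
        Path(tmp, name).write_text(json.dumps({"uid": uid, "title": title}), encoding="utf-8")
    result = discover_dashboards(tmp)

print(result)
```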

1.6. Promtail

Files and Folders
- promtail-config.yml
  
  In Grafana Promtail, the static_configs and pipeline_stages configuration blocks define how log entries are collected and how they are processed.
  - static_configs
    
    Defines, inside a scrape_configs job, a static list of targets together with the labels attached to everything collected from them. Promtail continuously tails the log files these entries select.
  Examples and Explanations:
```
scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          __path__: /var/log/*log
```
  - targets
    
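    Required by the Prometheus-style service discovery that Promtail reuses. Promtail can only read files on the local machine, so this is normally just localhost. The files to tail are selected by the special __path__ label, which accepts glob patterns such as /var/log/*log.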
  - labels
    
    Attaches metadata such as job: varlogs to the collected log streams, which can be used for querying and filtering logs in Loki.
  - pipeline_stages
    
    Defines the sequence of operations applied to each log entry after Promtail collects it and before it is sent to Loki, where it can be queried and visualized in Grafana.
    
    NOTE Pipeline stages are processed sequentially, with each stage performing a different function, such as parsing, labeling, or filtering log lines.
  Examples and Explanations:
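```
pipeline_stages:
  - regex:
      expression: '(?P<log_level>\w+)\s+(?P<pid>\d+)\s+(?P<component>[^:]+): (?P<content>.*)'
  - labels:
      log_level:
```
  - regex
    
    The regex stage matches each log line against a regular expression (expression, in RE2 syntax). Every named capture group (?P<name>...) stores the text it matched in the extracted data under that name, so the first group becomes log_level (the names pid, component, and content are illustrative). By default the stage parses the log line itself; the optional source field names an already extracted value to parse instead.
  - labels
    
    The labels stage then promotes values from the extracted data to labels on the log entry. Each key is the label name, and its value names the extracted field to use. An empty value, as in log_level:, uses the field with the same name. The resulting log_level label can then be used to filter logs in Loki.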
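The Python sketch below shows how collection and processing interact. It approximates a Promtail job in two steps. First, a glob shaped like __path__ selects files; here a single * does not cross directory boundaries. Second, each line passes through a regex stage whose named capture groups start with log_level, followed by a labels stage that promotes log_level to a label. It is an illustration under these assumptions, not Promtail's implementation. Python's re accepts the same (?P<name>...) group syntax as Go's RE2 for this expression. The static job label and the sample log lines are hypothetical.

```python
import glob
import os
import re
import tempfile

# regex stage: named groups populate the extracted data.
EXPRESSION = re.compile(
    r"(?P<log_level>\w+)\s+(?P<pid>\d+)\s+(?P<component>[^:]+): (?P<content>.*)"
)
STATIC_LABELS = {"job": "varlogs"}     # labels from static_configs
PROMOTED = {"log_level": "log_level"}  # labels stage: label name -> extracted field


def run_pipeline(line):
    """Return (labels, line) for one log line, mimicking regex + labels stages."""
    extracted = {}
    match = EXPRESSION.search(line)
    if match:  # a non-matching line passes through without extracted fields
        extracted.update(match.groupdict())
    labels = dict(STATIC_LABELS)
    for label, field in PROMOTED.items():
        if field in extracted:
            labels[label] = extracted[field]
    return labels, line


def collect(log_dir, pattern="*log"):
    """Read every file matching a __path__-style glob and run the pipeline."""
    entries = []
    for path in sorted(glob.glob(os.path.join(log_dir, pattern))):
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                labels, text = run_pipeline(line.rstrip("\n"))
                entries.append({**labels, "line": text})
    return entries


with tempfile.TemporaryDirectory() as log_dir:
    with open(os.path.join(log_dir, "app.log"), "w", encoding="utf-8") as f:
        f.write("ERROR 4242 checkout-api: payment declined\n")
        f.write("INFO 4242 checkout-api: order created\n")
    # Rotated file: its name does not end in "log", so *log skips it.
    with open(os.path.join(log_dir, "app.log.1"), "w", encoding="utf-8") as f:
        f.write("WARN 17 worker: old entry\n")
    entries = collect(log_dir)

for entry in entries:
    print(entry)
```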

2. References

Grafana Play page.
