Observability refers to the ability to understand the internal state of a system by examining its external outputs. It involves collecting, aggregating, analyzing, and leveraging data from various sources like logs, metrics, traces, and events to gain insights into the system's behavior, performance, and health.
Observability has evolved into a key practice for IT operations, DevOps, and Site Reliability Engineering (SRE) teams.
DEBUG
Fine-grained informational events that are most useful for debugging an application. These are usually of value to developers and are very verbose.
INFO
Informational messages that highlight the progress of the application at a coarse-grained level.
WARN
Potentially harmful situations that indicate a risk to an application. These can trigger an alarm in an application.
ERROR
Error events that might still allow the application to continue running. These are likely to trigger an alarm that requires attention.
FATAL
Very severe error events that will presumably cause an application to abort.
CRITICAL
NONE
1.3. Traces
Distributed tracing data that shows the flow of requests through different parts of a system, helping to identify bottlenecks and performance issues.
1.4. Events
Notifications or signals emitted by the system to indicate specific occurrences, which can be consumed for real-time analysis or triggering other processes.
Grafana dashboard to visualize metrics collected by the OpenTelemetry Collector. The OpenTelemetry Collector is a vendor-agnostic implementation that receives, processes, and exports telemetry data.
Grafana dashboard to visualize DORA (DevOps Research and Assessment) metrics. DORA metrics are a standard set of DevOps metrics used for evaluating process performance and maturity.
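Dashboards like these are typically loaded through Grafana's file-based provisioning. The following is a minimal sketch of a dashboard provider file, assuming the dashboard JSON files are mounted at /var/lib/grafana/dashboards; the provider name, refresh interval, and path are illustrative and not taken from the original setup.

```yaml
# dashboards.yaml: hypothetical dashboard provider definition
apiVersion: 1

providers:
  - name: default              # illustrative provider name
    type: file                 # load dashboards from JSON files on disk
    disableDeletion: false     # allow dashboards to be deleted from the UI
    updateIntervalSeconds: 30  # how often Grafana rescans the folder
    options:
      # Assumed mount point for the dashboard JSON files
      path: /var/lib/grafana/dashboards
```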
In Promtail's configuration, the static_configs and pipeline_stages keys define how log entries are collected and processed.
static_configs
Defines a statically configured list of targets and labels within the scrape_configs section, describing the files from which Promtail will continuously collect logs.
targets
Identifies the host to collect from; Promtail only reads local files, so this is normally localhost. The files or directories themselves are selected with glob patterns like *.log, given through the __path__ label.
labels
Attaches metadata like job: app_logs to log sources, which can be used for querying and filtering logs.
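A static_configs entry might look like the sketch below. It follows the shape used in the upstream Promtail examples, where targets is set to localhost and the file glob is supplied through the special __path__ label; the job name and log directory are assumptions chosen for illustration.

```yaml
# promtail-config.yml (fragment): illustrative values only
scrape_configs:
  - job_name: app                       # hypothetical job name
    static_configs:
      - targets:
          - localhost                   # Promtail reads files on the local machine
        labels:
          job: app_logs                 # metadata label for querying and filtering in Loki
          __path__: /var/log/app/*.log  # assumed log location; glob of files to tail
```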
pipeline_stages
Defines the sequence of operations applied to each log entry after it is collected by Promtail and before it is sent to Loki for querying and visualization in Grafana.
NOTE Pipeline stages are processed sequentially, with each stage performing a different function, such as parsing, labeling, or filtering log lines.
regex
The regex stage uses a regular expression (expression) to parse the incoming log message (source: message). It captures specific parts of the log message and assigns them to named fields (destination: log_level).
labels
The labels stage then extracts the matched fields (in this case, {{.match_1}} from the regex) and attaches them as labels (log_level) to the log entry.
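The two stages can be sketched as follows. This is a minimal sketch based on the documented Promtail regex and labels stages, which pass values between stages through named capture groups. The original configuration is not shown here, so the expression, job name, and path are assumptions, and the sketch assumes log lines that start with a timestamp followed by an upper-case level.

```yaml
# promtail-config.yml (fragment): illustrative values only
scrape_configs:
  - job_name: app                       # hypothetical job name
    static_configs:
      - targets:
          - localhost
        labels:
          job: app_logs
          __path__: /var/log/app/*.log  # assumed log location
    pipeline_stages:
      # Stage 1: parse the line; the named group stores its match
      # under the key "log_level" in the extracted data.
      - regex:
          expression: '^\S+\s+(?P<log_level>[A-Z]+)\s'
      # Stage 2: promote the extracted "log_level" value to a label.
      - labels:
          log_level:
```

With a configuration along these lines, a LogQL selector such as {job="app_logs", log_level="ERROR"} would return only the error lines.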
Observability
1. Category
1.1. Metrics
Quantitative measurements of system behavior over time, such as CPU usage, memory consumption, and request latency, used for monitoring and alerting.
1.2. Logs
Records of events or actions occurring within the system, providing detailed information for troubleshooting and auditing.
1.2.1. Levels
Log levels
DEBUG
INFO
WARN
ERROR
FATAL
CRITICAL
NONE
1.5. Grafana
1.5.1. Provisioning
Environment Variables
1.5.1.1. Datasources
Files and Folders
datasources.yaml
1.5.1.2. Dashboards
OpenTelemetry Collector
DORA Metrics
Files and Folders
dashboards.yaml
/dashboards
1.6. Promtail
Files and Folders
promtail-config.yml
static_configs
Examples and Explanations:
targets
labels
pipeline_stages
Examples and Explanations:
regex
labels
2. References