Closed sentenz closed 1 year ago
Logging and Monitoring
- [ ] devopscube observability
For distributed systems and microservices, continuous monitoring in the form of centralized monitoring and logging provides the ability to detect problems and make informed decisions during operation and development.
Logging
Best practices for writing and storing logs
Once you’ve chosen a logging library, you’ll also want to plan where in your code to make calls to the logger, how to store your logs, and how to make sense of them. In this section, we recommend a series of best practices for organizing your Golang logs:
- Make calls to the logger from within your main application process, not within goroutines.
- Write logs from your application to a local file, even if you’ll ship them to a central platform later.
- Standardize your logs with a set of predefined messages.
- Send your logs to a central platform so you can analyze and aggregate them.
- Use HTTP headers and unique IDs to log user behavior across microservices.
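The first practice above, logging from the main process rather than from goroutines, can be sketched with a channel that funnels events from workers to a single logging loop. The event type and field names here are illustrative, not from the original:

```go
package main

import (
	"log"
	"sync"
)

// logEvent is a minimal event passed from worker goroutines to the
// main goroutine, which is the only place that calls the logger.
type logEvent struct {
	worker int
	msg    string
}

// runWorkers fans work out to goroutines; instead of logging directly,
// each goroutine sends events over the channel and the channel is
// closed once all workers are done.
func runWorkers(n int, events chan<- logEvent) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			events <- logEvent{worker: id, msg: "task finished"}
		}(i)
	}
	wg.Wait()
	close(events)
}

func main() {
	events := make(chan logEvent, 16)
	go runWorkers(4, events)
	// All logger calls happen here, in the main goroutine.
	for e := range events {
		log.Printf("worker=%d msg=%q", e.worker, e.msg)
	}
}
```

Keeping logger calls in one goroutine avoids interleaved writes and keeps the workers free of logging concerns.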
Logging levels
OFF
Nothing is produced.
FATAL
Only produces messages when the application fault will likely result in the application terminating. Out of Memory is an error that would fall in this category.
ERROR
Also includes errors which will result in an interruption or fault in processing, but the application will most likely continue processing. Invalid Request would fit in this category.
WARN and INFO
These can be separate levels, but are often not distinguished in practice. Examples would be recording which submodules are being called, and showing GUIDs assigned to specific data for traceability.
DEBUG
Usually includes all the log messages the application can produce. Used most often by developers to enable any and all logging statements they have placed in the code. These logs often include raw user input, the full result set from SQL statements, and statements that trace the application flow.
ALL and TRACE
Adds additional information, like every single call to a third-party system or application library, which can be excessive and is only really used when absolutely required.
Write logs to a file
Even if you’re shipping your logs to a central platform, we recommend writing them to a file on your local machine first. You will want to make sure your logs are always available locally and not lost in the network. In addition, writing to a file means that you can decouple the task of writing your logs from the task of sending them to a central platform. Your applications themselves will not need to establish connections or stream your logs, and you can leave these jobs to specialized software like the Datadog Agent. If you’re running your Go applications within a containerized infrastructure that does not already include persistent storage—e.g., containers running on AWS Fargate—you may want to configure your log management tool to collect logs directly from your containers’ STDOUT and STDERR streams (this is handled differently in Docker and Kubernetes).
Implement a standard logging interface
When writing calls to loggers from within their code, teams often use different attribute names to describe the same thing. Inconsistent attributes can confuse users and make it impossible to correlate logs that should form part of the same picture. For example, two developers might log the same error, a missing client name when handling an upload, in different ways.
Logs for the same error with different messages from different locations:
// TODO example
A good way to enforce standardization is to create an interface between your application code and the logging library. The interface contains predefined log messages that implement a certain format, making it easier to investigate issues by ensuring that log messages can be searched, grouped, and filtered.
Logs for an error using a standard interface to create a consistent message:
// TODO example
Centralize logs
If your application is deployed across a cluster of hosts, it’s not sustainable to SSH into each one in order to tail, grep, and investigate your logs. A more scalable alternative is to pass logs from local files to a central platform.
One solution is to use the Golang log/syslog package to forward logs from throughout your infrastructure to a single syslog server.
Track logs across microservices
When troubleshooting an error, it’s often helpful to see what pattern of behavior led to it, even if that behavior involves a number of microservices. You can achieve this with distributed tracing: visualizing the order in which your application executes functions, database queries, and other tasks, and following these execution steps as they make their way through a network. One way to implement distributed tracing within your logs is to pass contextual information as HTTP headers. Downstream microservices use the x-span headers of incoming requests to specify the parents of the spans they generate, and send that information as the x-parent header to the next microservice in the chain.
If an error occurs in one of our microservices, we can use the trace, parent, and span attributes to see the route that a request has taken, letting us know which hosts—and possibly which parts of the application code—to investigate.
In the first microservice:
{"appname":"go-logging","level":"debug","msg":"Hello from Microservice One","trace":"eUBrVfdw","time":"2017-03-02T15:29:26+01:00","span":"UzWHRihF"}
In the second:
{"appname":"go-logging","level":"debug","msg":"Hello from Microservice Two","parent":"UzWHRihF","trace":"eUBrVfdw","time":"2017-03-02T15:29:26+01:00","span":"DPRHBMuE"}
Use a monitoring platform that supports distributed tracing so you can follow application traces over time, and be notified of services with unusual request rates, error rates, or latency.
Monitoring
Tail your log files and forward logs to a central platform for processing and analysis.
You can use attributes to graph the values of certain log fields over time, sorted by group. For example, you could track the number of errors by service to let you know if there’s an incident in one of your services. Showing logs from only the go-logging-demo service, we can see how many error logs this service has produced in a given interval.
You can also use attributes to drill down into possible causes, for instance seeing if a spike in error logs belongs to a specific host. You can then create an automated alert based on the values of your logs.
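In miniature, the group-by that such a platform performs might look like this sketch, which tallies error-level entries per service from JSON log lines (the "service" and "level" field names are assumptions):

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// countErrorsByService parses newline-delimited JSON log entries and
// counts error-level records per service, skipping malformed lines.
func countErrorsByService(logs string) map[string]int {
	counts := make(map[string]int)
	for _, line := range strings.Split(strings.TrimSpace(logs), "\n") {
		var entry struct {
			Service string `json:"service"`
			Level   string `json:"level"`
		}
		if err := json.Unmarshal([]byte(line), &entry); err != nil {
			continue
		}
		if entry.Level == "error" {
			counts[entry.Service]++
		}
	}
	return counts
}

func main() {
	logs := `{"service":"go-logging-demo","level":"error","msg":"boom"}
{"service":"go-logging-demo","level":"info","msg":"ok"}
{"service":"go-logging-demo","level":"error","msg":"boom again"}`
	fmt.Println(countErrorsByService(logs)) // map[go-logging-demo:2]
}
```

A spike in one service’s count, or in one host’s once you group by a host attribute instead, is exactly the signal an automated alert would fire on.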
See also
- GitHub Prometheus monitoring system and time series database repository.
- GitHub Grafana composable observability and data visualization platform repository. Visualize metrics, logs, and traces from multiple sources like Prometheus, Loki, Elasticsearch, InfluxDB, and Postgres.
- Datadog logging interface article.
Closed by #215