sentenz / convention

General articles, conventions, and guides.
https://sentenz.github.io/convention/

Create an article about `logging and monitoring` #9

Closed: sentenz closed this issue 1 year ago

sentenz commented 2 years ago

Logging and Monitoring

For distributed systems and microservices, continuous monitoring in the form of centralized monitoring and logging provides the ability to detect problems and make informed decisions during operation and development.

Logging

Once you’ve chosen a logging library, you’ll also want to plan where in your code to make calls to the logger, how to store your logs, and how to make sense of them. In this section, we recommend a series of best practices for writing, storing, and organizing your Golang logs:

  • Make calls to the logger from within your main application process, not within goroutines.
  • Write logs from your application to a local file, even if you’ll ship them to a central platform later.
  • Standardize your logs with a set of predefined messages.
  • Send your logs to a central platform so you can analyze and aggregate them.
  • Use HTTP headers and unique IDs to log user behavior across microservices.

Logging levels

  • OFF

    Nothing is produced.

  • FATAL

    Produces messages only when a fault will likely result in the application terminating. An out-of-memory error would fall in this category.

  • ERROR

    Also includes errors that result in an interruption or fault in processing, although the application will most likely continue running. An invalid request would fit in this category.

  • WARN and INFO

    These can be separate levels, but are not often used. Examples include recording which submodules are being called and showing the GUIDs assigned to specific data for traceability.

  • DEBUG

    Usually includes all the log messages the application can produce. Used most often by developers to enable every logging statement they have placed in the code. These logs often include raw user input, the full result set from SQL statements, and statements that trace the application flow.

  • ALL and TRACE

    Adds additional information, such as every single call to a third-party system or application library, which can be excessive and is only really used when absolutely required.
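The exact set of levels is library-specific in practice. As a minimal sketch, assuming the logrus library (the JSON log examples later in this article resemble its output), the scale above might map onto a severity threshold like this:

package main

import (
	log "github.com/sirupsen/logrus"
)

func main() {
	// Only messages at or above the configured severity are emitted.
	// logrus has no explicit OFF or ALL level; PanicLevel and TraceLevel
	// are the two extremes of its scale.
	log.SetLevel(log.WarnLevel)

	log.Trace("call into a third-party library")    // suppressed at WarnLevel
	log.Debug("raw input and intermediate results") // suppressed at WarnLevel
	log.Info("submodule called, GUID assigned")     // suppressed at WarnLevel
	log.Warn("disk usage above threshold")          // emitted
	log.Error("invalid request, continuing")        // emitted
	// log.Fatal("out of memory") would log the message and then call os.Exit(1).
}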

Write logs to a file

Even if you’re shipping your logs to a central platform, we recommend writing them to a file on your local machine first. You will want to make sure your logs are always available locally and not lost in the network. In addition, writing to a file means that you can decouple the task of writing your logs from the task of sending them to a central platform. Your applications themselves will not need to establish connections or stream your logs, and you can leave these jobs to specialized software like the Datadog Agent. If you’re running your Go applications within a containerized infrastructure that does not already include persistent storage—e.g., containers running on AWS Fargate—you may want to configure your log management tool to collect logs directly from your containers’ STDOUT and STDERR streams (this is handled differently in Docker and Kubernetes).
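As a minimal sketch, assuming the logrus library and a placeholder file name, writing structured logs to a local file might look like this:

package main

import (
	"os"

	log "github.com/sirupsen/logrus"
)

func main() {
	// Append to a local log file, creating it if it does not exist.
	file, err := os.OpenFile("service.log", os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0644)
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	// Emit structured JSON and write it to the file instead of stderr.
	log.SetFormatter(&log.JSONFormatter{})
	log.SetOutput(file)

	log.WithField("appname", "go-logging").Info("service started")
}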

Implement a standard logging interface

When writing calls to loggers from within their code, teams often use different attribute names to describe the same thing. Inconsistent attributes can confuse users and make it impossible to correlate logs that should form part of the same picture. For example, two developers might log the same error, a missing client name when handling an upload, in different ways.

Logs for the same error, written with different messages and attribute names from different locations. The following is a purely illustrative sketch, assuming the logrus library; the handler names, fields, and messages are hypothetical:
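package main

import (
	log "github.com/sirupsen/logrus"
)

// handleUploadA logs the missing client name with one set of attributes.
func handleUploadA(clientName string) {
	if clientName == "" {
		log.WithFields(log.Fields{"error": "name_missing"}).Error("Client name is empty")
	}
}

// handleUploadB logs the same condition with a different level, message,
// and attribute names, which makes the two logs hard to correlate.
func handleUploadB(clientName string) {
	if clientName == "" {
		log.WithFields(log.Fields{"cause": "missing_name"}).Warn("no client name provided for upload")
	}
}

func main() {
	handleUploadA("")
	handleUploadB("")
}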

A good way to enforce standardization is to create an interface between your application code and the logging library. The interface contains predefined log messages that implement a certain format, making it easier to investigate issues by ensuring that log messages can be searched, grouped, and filtered.

Logs for the same error, produced through a standard interface that yields a consistent message. Again an illustrative sketch, assuming logrus; the Event type, StandardLogger wrapper, and MissingClientName method are hypothetical:
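package main

import (
	log "github.com/sirupsen/logrus"
)

// Event is a predefined log message with a fixed id and format string.
type Event struct {
	id      int
	message string
}

// StandardLogger wraps the logging library behind a small interface.
type StandardLogger struct {
	*log.Logger
}

// NewLogger initializes the standard logger with a JSON formatter.
func NewLogger() *StandardLogger {
	baseLogger := log.New()
	baseLogger.Formatter = &log.JSONFormatter{}
	return &StandardLogger{baseLogger}
}

// Predefined messages keep attribute names and wording consistent.
var missingClientName = Event{1, "Missing client name: %s"}

// MissingClientName logs the missing-name error the same way everywhere.
func (l *StandardLogger) MissingClientName(context string) {
	l.WithFields(log.Fields{"event_id": missingClientName.id}).Errorf(missingClientName.message, context)
}

func main() {
	logger := NewLogger()
	logger.MissingClientName("upload")
}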

Centralize logs

If your application is deployed across a cluster of hosts, it’s not sustainable to SSH into each one in order to tail, grep, and investigate your logs. A more scalable alternative is to pass logs from local files to a central platform.

One solution is to use the Golang syslog package to forward logs from throughout your infrastructure to a single syslog server.
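As a minimal sketch using the standard library’s log/syslog package (the server address and tag below are placeholders), forwarding logs to a central syslog server might look like this:

package main

import (
	"log"
	"log/syslog"
)

func main() {
	// Connect to a central syslog server; address and tag are placeholders.
	syslogWriter, err := syslog.Dial("udp", "logs.example.com:514",
		syslog.LOG_INFO|syslog.LOG_LOCAL0, "go-logging")
	if err != nil {
		log.Fatal(err)
	}
	defer syslogWriter.Close()

	// Route standard-library log output through the syslog connection.
	logger := log.New(syslogWriter, "", log.LstdFlags)
	logger.Println("forwarded to the central syslog server")
}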

Track logs across microservices

When troubleshooting an error, it’s often helpful to see what pattern of behavior led to it, even if that behavior involves a number of microservices. You can achieve this with distributed tracing: visualizing the order in which your application executes functions, database queries, and other tasks, and following these execution steps as they make their way through a network. One way to implement distributed tracing within your logs is to pass contextual information as HTTP headers.

Downstream microservices use the x-span headers of incoming requests to specify the parents of the spans they generate, and send that information as the x-parent header to the next microservice in the chain.

If an error occurs in one of our microservices, we can use the trace, parent, and span attributes to see the route that a request has taken, letting us know which hosts—and possibly which parts of the application code—to investigate.

In the first microservice:

{"appname":"go-logging","level":"debug","msg":"Hello from Microservice One","trace":"eUBrVfdw","time":"2017-03-02T15:29:26+01:00","span":"UzWHRihF"}

In the second:

{"appname":"go-logging","level":"debug","msg":"Hello from Microservice Two","parent":"UzWHRihF","trace":"eUBrVfdw","time":"2017-03-02T15:29:26+01:00","span":"DPRHBMuE"}

Use a monitoring platform that supports distributed tracing, so you can follow application traces over time and be alerted to services with unusual request rates, error rates, or latency.

Monitoring

Tail your log files and forward logs to a central platform for processing and analysis.

You can use attributes to graph the values of certain log fields over time, sorted by group. For example, you could track the number of errors by service to know whether there’s an incident in one of your services. By filtering to logs from only the go-logging-demo service, we can see how many error logs that service has produced in a given interval.

You can also use attributes to drill down into possible causes, for instance seeing if a spike in error logs belongs to a specific host. You can then create an automated alert based on the values of your logs.

See also

  • GitHub repository of Prometheus, a monitoring system and time series database.
  • GitHub repository of Grafana, a composable observability and data visualization platform that visualizes metrics, logs, and traces from sources such as Prometheus, Loki, Elasticsearch, InfluxDB, and Postgres.
  • Datadog article on implementing a standard logging interface.
sentenz commented 1 year ago

Closed by #215