
Instrumenting your application with Datadog #7


Dashboard

Monitors

Creating Web request based latency monitors

We will configure our latency SLA to alert us if average request latency exceeds 1.5 seconds.

  1. Click Monitors
  2. Click New Monitor
  3. Select trace.rack.request as the metric
  4. Filter by your web service
  5. Aggregate based on avg. We only want average request latency.
  6. Evaluate the average over the last 10 minutes. This calculates the average over a 10-minute window and alerts if it exceeds the threshold.

Image

  7. Set your thresholds for alerting

Image

  8. Notify your team for this specific monitor with details (the equivalent monitor query is shown after the screenshot below)

Image
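
If you prefer working with the query directly (for example, to manage the monitor via the API or Terraform), the equivalent metric monitor query would look roughly like the following; the service tag here is a placeholder for your own web service:

    avg(last_10m):avg:trace.rack.request{service:my-web-service} > 1.5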

Creating SLA based background job monitors in Sidekiq

If your application has Service-Level Agreements, you can monitor these and alert when the contract fails. For example, if your background job's queue is matched to an SLA (e.g. within_5_minutes), you can alert if the job latency is outside this threshold. For this example, we will be utilizing Sidekiq as our background job adapter.

  1. First, you'll need to instrument latency for Sidekiq. If you have Sidekiq Enterprise, the following wiki explains configuration. Otherwise, you'll need to implement this solution yourself; this article may be helpful. A minimal sketch is included after these steps.
  2. Once you have the latency metric, we can now build a new monitor
  3. Navigate to Monitors
  4. Click create new monitor
  5. Select the metric sidekiq.sidekiq.queue.latency
  6. Filter the "from" by the queue name of queue:within_5_minutes
  7. This metric can only be aggregated by Max
  8. Because this metric is a gauge reporting seconds, we need to convert it to minutes in order to set easier alert thresholds.
  9. Click "Add Formula"
  10. Our formula takes the gauge (in seconds) and divides it by 60 to calculate minutes
  11. Because our SLA for this queue is 5 minutes, we'll evaluate the maximum over the last 5 minutes and alert if the maximum exceeds 5 minutes within that window. The resulting monitor query is shown after these steps.

Image

  12. Now we can set our alert conditions to trigger if the value is greater than 5 (minutes, in this case)

Image

  13. Lastly, give this monitor a descriptive message

Image
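
For step 1 above, here is a minimal sketch of reporting queue latency yourself. It assumes the dogstatsd-ruby gem, a Datadog agent listening on localhost:8125, and some scheduler (cron, rufus-scheduler, or a recurring Sidekiq job) that runs it periodically:

    # Minimal sketch: report each Sidekiq queue's latency (in seconds) as a gauge.
    require "datadog/statsd"
    require "sidekiq/api"

    statsd = Datadog::Statsd.new("localhost", 8125, namespace: "sidekiq")

    Sidekiq::Queue.all.each do |queue|
      # Sidekiq::Queue#latency is the age (in seconds) of the oldest job in the queue
      statsd.gauge("sidekiq.queue.latency", queue.latency, tags: ["queue:#{queue.name}"])
    end

    # dogstatsd-ruby v5 buffers metrics, so flush before the process exits
    statsd.flush(sync: true)

With the sidekiq namespace, the emitted metric name becomes sidekiq.sidekiq.queue.latency, matching the metric selected in step 5.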
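
Putting steps 5 through 11 together, the resulting monitor query (the gauge divided by 60 to convert seconds to minutes, evaluated as a max over the last 5 minutes) would look roughly like this:

    max(last_5m):max:sidekiq.sidekiq.queue.latency{queue:within_5_minutes} / 60 > 5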

Tags

Set custom tags for additional Trace data

As your application runs, Datadog tracks incoming traces for the various layers of your application. One way you can improve these is by adding tags.

Tags give you the ability to inject additional information into your traces, such as user id, endpoint name, or outgoing request params. These can be utilized for debugging as well as observability.

The following shows a very simple example of tagging the user id:

    # Guard against environments where the Datadog tracer isn't loaded
    if defined?(Datadog::Tracing)
      Datadog::Tracing.active_span&.set_tag("app_context.user.id", current_user.id)
    end
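
A common place to set this tag is a controller-level callback so every traced request gets annotated. A minimal sketch, assuming a Rails app with ddtrace and a current_user helper (e.g. from Devise):

    # Sketch: tag the active APM span with the signed-in user's id on every request.
    class ApplicationController < ActionController::Base
      before_action :tag_trace_with_user

      private

      def tag_trace_with_user
        return unless defined?(Datadog::Tracing) && current_user

        Datadog::Tracing.active_span&.set_tag("app_context.user.id", current_user.id)
      end
    end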

Once this is configured, you'll need to wait for Datadog to ingest the new traces from incoming requests. Shortly after, you'll be able to filter your traces with @app_context.user.id.

Image

When filtering, it's also important to remember the layer of the trace's span. Unless you are tagging from the entry span (e.g. the endpoint), you'll need to filter traces by "All Spans". The screenshot below illustrates this concept:

Image

Use custom tags as facets within traces

  1. Navigate to Traces
  2. Click Add facet

Image

  3. Select your tag app_context.user.id and give it a Display name
  4. Now you can filter by the new facet in your traces

Find tagged trace data

  1. Click on the trace (this will show the drawer for the resource)
  2. Make sure you are on the "Info" tab
  3. Search through Span Tags to find your new section

Image

Configure new metrics based on filters

In Datadog, you can create new metrics based on your current dataset. This is done by adding filter rules under "Custom Span Metrics" within "Generate Metrics".

  1. Hover on APM
  2. Click Setup & Configuration

Image

  3. Click the "Generate Metrics" tab

Image

  4. Add your metric name. This will be what you can search for in your metrics.
  5. Define your query. In our example, we filter by the production environment for our API service.
  6. You can also group these by your custom tags. For this example, I've grouped by the user id from our previously configured @app_context.user.id (an example configuration is shown after this list).

Image

  7. You'll now be able to filter metrics by this value to see how users interact with the API.
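
As a concrete illustration, the configuration from these steps might look like the following; the metric name, environment, and service are placeholders for your own values:

    Metric name:  api.requests.by_user
    Filter query: env:production service:api
    Group by:     @app_context.user.id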