micrometer-metrics / micrometer

An application observability facade for the most popular observability tools. Think SLF4J, but for observability.
https://micrometer.io
Apache License 2.0

How to flush data to Datadog on AWS Lambda function completing #5460

Open DanielWaldenSanlam opened 3 months ago

DanielWaldenSanlam commented 3 months ago

Hi,

I'm not sure if this is an issue with Micrometer, Datadog or my usage.

I have custom metrics I'd like to see published to Datadog from my AWS Lambda function. The metric is a count. The Lambda function returns quickly, but may be invoked sporadically.

My expectation is that whenever the Lambda function completes, any metric data should be flushed to Datadog.

My findings are that the data is not always flushed when the Lambda function completes. Example:

  1. Fire 3 events rapidly at the function. Result: no count appears in Datadog
  2. After 10 seconds, fire 1 event at the function. Result: count = 3 appears in Datadog.
  3. After 10 seconds, fire 2 events at the function. Result: count = 1 appears in Datadog.

It appears that data from previous invocations is flushed only once the Lambda function is unfrozen, which could be after quite some delay (if it happens at all).

These delayed reports have the further downside that the timestamp shown in Datadog is the time at which they are eventually flushed, rather than the period in which they were recorded.

I don't see a flush() method on the StatsdMeterRegistry. Invoking close() is not really acceptable as that would presumably require re-instantiating the meter registry for every single invocation.

My Lambda function:


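import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.statsd.StatsdConfig;
import io.micrometer.statsd.StatsdFlavor;
import io.micrometer.statsd.StatsdMeterRegistry;
import java.util.Map;

public final class MyLambdaHandler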
    implements RequestHandler<Map<String, Object>, Void> {

  private final MeterRegistry registry;

  public MyLambdaHandler() { // invoked only when this handler is instantiated.
    registry = provideMeterRegistry();
  }

  @Override
  public Void handleRequest(Map<String, Object> map, Context context) { // invoked on every request
    registry.counter("test").increment();
    return null;
  }

  private MeterRegistry provideMeterRegistry() {
    final var configMap = Map.of(
        "statsd.enabled", "true"
    );
    return StatsdMeterRegistry.builder(new StatsdConfig() {
          @Override
          public String get(String key) {
            return configMap.get(key);
          }

          @Override
          public StatsdFlavor flavor() {
            return StatsdFlavor.DATADOG;
          }
        })
        .build();
  }
}

My Lambda function is provisioned with the Datadog Serverless macro using AWS CloudFormation:


Transform:
  - AWS::Serverless-2016-10-31
  - Name: DatadogServerless
  ...

If you could please advise how to achieve reliable sending of metrics from AWS Lambdas, or whether I'm misunderstanding something. Thanks!

shakuzen commented 3 months ago

See https://github.com/micrometer-metrics/micrometer/issues/1156. The problem, last I looked into it, is that there is no way to get a callback when the process is being suspended. If the process is being shut down, you can use a shutdown hook to close the registry, which will flush metrics. I would have to check on the latest to see if there is any more we can do around flushing on suspension.
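
The shutdown-hook approach mentioned above could be sketched roughly as follows. This is a minimal illustration, not code from Micrometer: the class `ShutdownFlush` and the method `closeOnShutdown` are hypothetical names, and the helper takes a plain `Runnable` so it stays independent of any particular registry type. As the comment says, it only helps when the JVM is actually shut down, not when the Lambda execution environment is frozen.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper (not part of Micrometer): runs a close action from a JVM shutdown hook.
final class ShutdownFlush {
  private ShutdownFlush() {}

  /**
   * Registers a JVM shutdown hook that runs closeAction at most once.
   * Returns the hook thread so the caller can remove it again if needed.
   */
  static Thread closeOnShutdown(Runnable closeAction) {
    AtomicBoolean done = new AtomicBoolean(false);
    Thread hook = new Thread(() -> {
      // Guard against the action running twice (e.g. manual run plus JVM shutdown).
      if (done.compareAndSet(false, true)) {
        closeAction.run();
      }
    }, "registry-close-hook");
    Runtime.getRuntime().addShutdownHook(hook);
    return hook;
  }
}
```

In the handler from the original post, this would presumably be wired up in the constructor as `ShutdownFlush.closeOnShutdown(registry::close);`.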

shakuzen commented 3 months ago

Since you are using the Statsd registry, you could try setting the buffered config to false. That may improve the situation, but it is not guaranteed, since publishing still happens asynchronously to recording the metric. The only way to force flushing would be to stop the registry after each invocation and start it at the beginning of the invocation, which would be rather inefficient.
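
Both suggestions in this comment could look roughly like the sketch below. It is an illustration under assumptions, not a verified fix: `PerInvocationFlush`, `unbufferedConfig`, and `aroundInvocation` are hypothetical names, and the lifecycle calls are passed in as plain `Runnable`s so the block has no Micrometer dependency. The `statsd.buffered` key follows the same `statsd.` prefix as the `statsd.enabled` key in the original post.

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper (not part of Micrometer).
final class PerInvocationFlush {
  private PerInvocationFlush() {}

  /** Config entries for the handler's configMap, with StatsD line buffering turned off. */
  static Map<String, String> unbufferedConfig() {
    return Map.of(
        "statsd.enabled", "true",
        "statsd.buffered", "false");
  }

  /**
   * Runs start, then the invocation body, then stop (even if the body throws).
   * Intended to be called with the registry's start/stop methods.
   */
  static <T> T aroundInvocation(Runnable start, Runnable stop, Supplier<T> body) {
    start.run();
    try {
      return body.get();
    } finally {
      stop.run();
    }
  }
}
```

Assuming the `registry` field is typed as `StatsdMeterRegistry` (which exposes `start()` and `stop()`), `handleRequest` might then call `PerInvocationFlush.aroundInvocation(registry::start, registry::stop, () -> { registry.counter("test").increment(); return null; });`. As noted in the comment, restarting the registry on every invocation is rather inefficient, so this trades throughput for timelier delivery.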