micrometer-metrics / micrometer

An application observability facade for the most popular observability tools. Think SLF4J, but for observability.
https://micrometer.io
Apache License 2.0
4.47k stars 990 forks source link

support Autocloseable with Timer #1425

Closed garfieldnate closed 4 years ago

garfieldnate commented 5 years ago

I'm migrating from the prometheus API and changing the Timer code is quite cumbersome. In Prometheus, I could do the following:

public void myTimedMethod() throws CheckedException {
    try(timer.startTimer()) {
        methodThatThrowsCheckedException();
    }
}

However, Micrometer's Timer objects are not Autocloseable. The methods I see in Timer, record() and recordCallable(), take lambda functions, but lambda functions do not allow throwing checked exceptions, leading to lots of annoying boilerplate (I'm following section 3.1 from this Baeldung article). recordCallable() explicitly declares the caller as throwing Exception, which is also not good because now the code I write has to handle the general Exception instead of my specific exception class.

Because I really prefer the autocloseable usage, I ended up writing a little wrapper class:

public static class TimerCloser implements AutoCloseable {
    private final Timer timer;
    private final Timer.Sample sample;

    public TimerCloser(Timer t) {
        this.timer = t;
        this.sample = Timer.start();
    }

    public void close(){
        sample.stop(timer);
    }
}

And I use it like this:

try(TimerCloser closer = new TimerCloser(myTimer)) {...

It would be much easier for me if the I could call myTimer.start() and that would return an autocloseable class similar to Timer.Sample, and then the close() method would call Timer.Sample.stop(). We can't just change Timer.Sample to be autocloseable, unfortunately, because it contains no reference to the timer metric to write the results to.

jkschneider commented 5 years ago

I could see adding an overload for Timer.start(Timer) that returns a subclass or specialization of Timer.Sample that is AutoCloseable. Not happy with the inability in Java to handle exceptions functionally with generics, and this does seem like a better approach.

jkschneider commented 5 years ago

I think I'm recalling why we didn't do this before when I looked at this kind of approach earlier. When you time blocks that can throw exceptions in almost every case you want to add some sort of outcome tag so that you can dimensionally drill down on failures vs. successes. I think the Prometheus API in this case encourages you down a path that neglects differentiating success from failure, which you will later regret. I'm open to hearing an argument for ignoring it though.

Things that don't work

  1. AutoCloseable just doesn't provide any means to retrieve the exception on close. If it did, something like this might be possible:
try (Timer.AutoCloseableSample s = Timer.start(registry, timer, (e, t) -> t.tag("error", e.getClass().getSimpleName()))) {
    // do some work that ultimately throws...
    throw new IllegalStateException("still timing");
} catch (IllegalStateException e) {
    // do nothing
}
  1. try-with-resources doesn't allow you to access the resource in catch blocks. If it did, something like this would be do-able:
try (Timer.AutoCloseableSample s = Timer.start(registry, timer)) {
    // do some work that ultimately throws...
    throw new IllegalStateException("still timing");
} catch (IllegalStateException e) {
    s.tag("error", e.getClass().getSimpleName());
}

What does work

In the end, this really isn't that much more verbose, and I think is the best we can do:

Timer.Sample sample = Timer.start();
Timer.Builder timerBuilder = Timer.builder("my.timer");
try {
    // do some work that ultimately throws...
    throw new IllegalStateException("still timing");

    sample.stop(timerBuilder.tag("outcome", "success").register(registry));
} catch (IllegalStateException e) {
    sample.stop(timerBuilder.tags("outcome", "failure", "error", e.getClass().getSimpleName()).register(registry));
}
RichMacDonald commented 4 years ago

Interesting. I'd never thought about error tags in the original Timer. I always add a second Counter metric to count the exceptions. A mindset difference caused by working with systems that do not support tags, e.g., graphite.

I use a hand-rolled AutoCloseableSample. And because I also have concerns about the GC ramifications of many short-lived Sample objects, I use a cache system (A ThreadLocalFactory) to provide them. So I have

try (AutoCloseableSample s = Helper.start(timer) { // do some work that ultimately throws... throw new IllegalStateException("still timing"); } catch (IllegalStateException e) { errorCounter.increment(); }

Separate Counters for exceptions makes some sense when you have alarm systems monitoring for them. But either way works

jkschneider commented 4 years ago

GC ramifications of many short-lived Sample objects

These should be really really cheap, and if the timing is happening at a reasonable level of code, I would think the heap impact of Sample is dwarfed by what is being timed around it. Metrics timers aren't really typically something done in tight loops.

Separate Counters for exceptions makes some sense

I think it's crucial to differentiate by tag on the timer because when you're monitoring something like 99th percentile latency you should consider how different failure modes lead to much different latencies.

When you don't distinguish these in timer tags, all these disparate latency characteristics get munged into one thing.

jkschneider commented 4 years ago

I really appreciated the discussion on this, and open to more ideas, but for now I don't see a way to make Timer.Sample auto-closeable and not discourage tagging. It may be different than the Prometheus client, but I guess that's not always bad?

RichMacDonald commented 4 years ago

GC ramifications of many short-lived Sample objects

I've since become more educated, specifically from https://vimeo.com/181925278 and testing with JITWatch. If your AutoCloseableObject is final and stays simple, the VM will not even allocate an extra object, so there is literally no GC penalty at all. The impact is the same as if the AutoCloseableObject fields were defined as local fields in your method.

RichMacDonald commented 4 years ago

Holding my nose:

try (
    Timer.Sample sample = Timer.start();
    Timer.Builder timerBuilder = Timer.builder("my.timer");) {
    try {
        // do some work that ultimately throws...
        throw new IllegalStateException("still timing");
        sample.stop(timerBuilder.tag("outcome", "success").register(registry));
    } catch (IllegalStateException e) {
        sample.stop(timerBuilder.tags("outcome", "failure", "error", e.getClass().getSimpleName()).register(registry));
    }
}

Not that this makes sense, so taking liberties:

try (
    //whatever code is required here to setup the Timer.Sample with all information required for the success case
    Timer.Sample sample = Timer.builder("my.timer").tag("outcome", "success").register(registry).start();) {
    try {
        // do some work that ultimately throws...
        throw new IllegalStateException("still timing");

        //no need to refer to the sample here anymore

    } catch (IllegalStateException e) {     
        sample.stopWithTags(timerBuilder.tags("outcome", "failure", "error", e.getClass().getSimpleName()));
    }
}
jkschneider commented 4 years ago

This last example has my interest I think. We'd need to do a little work on the Tags class to make tags overwrite existing ones, but frankly it should probably work that way anyway.

If your AutoCloseableObject is final and stays simple, the VM will not even allocate an extra object

An extra object for what?

RichMacDonald commented 4 years ago

An extra object for what?

The Timer.Sample instance. There won't be one.

If you declare the Timer.Sample class and fields as final, then even though the source code shows an instance being allocated in the try and deallocated via close in the finally, the VM will not actually create this instance. Instead, all the fields in the Timer.Sample will become local variables in the caller method, and the Timer.Sample methods will be inlined into the caller method.

There are caveats: The Timer.Sample methods need to be simple/small, and you need to run JITWatch to prove it worked.

P.S. Same optimization applies for the Timer.Builder. Write that using final code and it might also be inlined without allocating a Timer.Builder instance on the heap.

jkschneider commented 4 years ago

all the fields in the Timer.Sample will become local variables in the caller method

So fascinating, I've never heard this! Thanks for taking the time to explain. I guess let's figure out a way to get this done?