Closed garfieldnate closed 4 years ago
I could see adding an overload for Timer.start(Timer)
that returns a subclass or specialization of Timer.Sample
that is AutoCloseable
. Not happy with the inability in Java to handle exceptions functionally with generics, and this does seem like a better approach.
I think I'm recalling why we didn't do this before when I looked at this kind of approach earlier. When you time blocks that can throw exceptions in almost every case you want to add some sort of outcome
tag so that you can dimensionally drill down on failures vs. successes. I think the Prometheus API in this case encourages you down a path that neglects differentiating success from failure, which you will later regret. I'm open to hearing an argument for ignoring it though.
AutoCloseable
just doesn't provide any means to retrieve the exception on close. If it did, something like this might be possible:try (Timer.AutoCloseableSample s = Timer.start(registry, timer, (e, t) -> t.tag("error", e.getClass().getSimpleName()))) {
// do some work that ultimately throws...
throw new IllegalStateException("still timing");
} catch (IllegalStateException e) {
// do nothing
}
try (Timer.AutoCloseableSample s = Timer.start(registry, timer)) {
// do some work that ultimately throws...
throw new IllegalStateException("still timing");
} catch (IllegalStateException e) {
s.tag("error", e.getClass().getSimpleName());
}
In the end, this really isn't that much more verbose, and I think is the best we can do:
Timer.Sample sample = Timer.start();
Timer.Builder timerBuilder = Timer.builder("my.timer");
try {
// do some work that ultimately throws...
throw new IllegalStateException("still timing");
sample.stop(timerBuilder.tag("outcome", "success").register(registry));
} catch (IllegalStateException e) {
sample.stop(timerBuilder.tags("outcome", "failure", "error", e.getClass().getSimpleName()).register(registry));
}
Interesting. I'd never thought about error tags in the original Timer. I always add a second Counter metric to count the exceptions. A mindset difference caused by working with systems that do not support tags, e.g., graphite.
I use a hand-rolled AutoCloseableSample. And because I also have concerns about the GC ramifications of many short-lived Sample objects, I use a cache system (A ThreadLocalFactory) to provide them. So I have
try (AutoCloseableSample s = Helper.start(timer) { // do some work that ultimately throws... throw new IllegalStateException("still timing"); } catch (IllegalStateException e) { errorCounter.increment(); }
Separate Counters for exceptions makes some sense when you have alarm systems monitoring for them. But either way works
GC ramifications of many short-lived Sample objects
These should be really really cheap, and if the timing is happening at a reasonable level of code, I would think the heap impact of Sample
is dwarfed by what is being timed around it. Metrics timers aren't really typically something done in tight loops.
Separate Counters for exceptions makes some sense
I think it's crucial to differentiate by tag on the timer because when you're monitoring something like 99th percentile latency you should consider how different failure modes lead to much different latencies.
When you don't distinguish these in timer tags, all these disparate latency characteristics get munged into one thing.
I really appreciated the discussion on this, and open to more ideas, but for now I don't see a way to make Timer.Sample
auto-closeable and not discourage tagging. It may be different than the Prometheus client, but I guess that's not always bad?
GC ramifications of many short-lived Sample objects
I've since become more educated, specifically from https://vimeo.com/181925278 and testing with JITWatch. If your AutoCloseableObject is final and stays simple, the VM will not even allocate an extra object, so there is literally no GC penalty at all. The impact is the same as if the AutoCloseableObject fields were defined as local fields in your method.
Holding my nose:
try (
Timer.Sample sample = Timer.start();
Timer.Builder timerBuilder = Timer.builder("my.timer");) {
try {
// do some work that ultimately throws...
throw new IllegalStateException("still timing");
sample.stop(timerBuilder.tag("outcome", "success").register(registry));
} catch (IllegalStateException e) {
sample.stop(timerBuilder.tags("outcome", "failure", "error", e.getClass().getSimpleName()).register(registry));
}
}
Not that this makes sense, so taking liberties:
try (
//whatever code is required here to setup the Timer.Sample with all information required for the success case
Timer.Sample sample = Timer.builder("my.timer").tag("outcome", "success").register(registry).start();) {
try {
// do some work that ultimately throws...
throw new IllegalStateException("still timing");
//no need to refer to the sample here anymore
} catch (IllegalStateException e) {
sample.stopWithTags(timerBuilder.tags("outcome", "failure", "error", e.getClass().getSimpleName()));
}
}
This last example has my interest I think. We'd need to do a little work on the Tags
class to make tags overwrite existing ones, but frankly it should probably work that way anyway.
If your AutoCloseableObject is final and stays simple, the VM will not even allocate an extra object
An extra object for what?
An extra object for what?
The Timer.Sample instance. There won't be one.
If you declare the Timer.Sample class and fields as final, then even though the source code shows an instance being allocated in the try and deallocated via close in the finally, the VM will not actually create this instance. Instead, all the fields in the Timer.Sample will become local variables in the caller method, and the Timer.Sample methods will be inlined into the caller method.
There are caveats: The Timer.Sample methods need to be simple/small, and you need to run JITWatch to prove it worked.
P.S. Same optimization applies for the Timer.Builder. Write that using final code and it might also be inlined without allocating a Timer.Builder instance on the heap.
all the fields in the Timer.Sample will become local variables in the caller method
So fascinating, I've never heard this! Thanks for taking the time to explain. I guess let's figure out a way to get this done?
I'm migrating from the prometheus API and changing the Timer code is quite cumbersome. In Prometheus, I could do the following:
However, Micrometer's Timer objects are not Autocloseable. The methods I see in Timer,
record()
andrecordCallable()
, take lambda functions, but lambda functions do not allow throwing checked exceptions, leading to lots of annoying boilerplate (I'm following section 3.1 from this Baeldung article).recordCallable()
explicitly declares the caller as throwingException
, which is also not good because now the code I write has to handle the generalException
instead of my specific exception class.Because I really prefer the autocloseable usage, I ended up writing a little wrapper class:
And I use it like this:
It would be much easier for me if the I could call
myTimer.start()
and that would return an autocloseable class similar to Timer.Sample, and then theclose()
method would callTimer.Sample.stop()
. We can't just change Timer.Sample to be autocloseable, unfortunately, because it contains no reference to the timer metric to write the results to.