Open coreyoconnor opened 1 week ago
@coreyoconnor Please sign the Contributor License Agreement!
Click here to manually synchronize the status of this Pull Request.
See the FAQ for frequently asked questions.
@coreyoconnor Thank you for signing the Contributor License Agreement!
I think there might be a misunderstanding since Timer
s does not really have a time unit from user's perspective.
(They have a unit under the hood: always nanoseconds but that is not available for the users, they don't need to know about this internal detail).
The expectation is that a recorded value is converted to the unit of the registry before publication. When you record a value using a timer and a unit, the unit you used for the recording has nothing to do with the unit used for publishing. (Btw we don't recommend measuring and recording duration like that, but we recommend using Timer.Sample
, see the docs.)
This means that if the unit of the registry is milliseconds, it does not matter if you record in seconds, millis or micros, everything will be converted and published in millis. StatsD happens to use milliseconds by default: https://github.com/micrometer-metrics/micrometer/blob/4d9efae6b7f3635be5e99fa089eb2851f9d9ca78/implementations/micrometer-registry-statsd/src/main/java/io/micrometer/statsd/StatsdMeterRegistry.java#L464-L467
So using your example, if you do this:
timer.record(2_000, TimeUnit.SECONDS); // about 30 min
It will be (recorded as nanos under the hood and) published in the unit of the registry. Since StatsD uses milliseconds as the unit, you will see 2,000,000ms
when published.
This is the expected behavior which is also guarded by tests and that's why that your change in AbstractTimer
fails the tests. If you want to override the time unit of a registry, you can override the getBaseTimeUnit
method I linked above but based on your diff it seems that millis is hardcoded in a few places in case of StatsD which we can fix but is there any backend that can use StatsD with a non-milliseconds unit?
Starting from the end as that will explain things:
If you want to override the time unit of a registry, you can override the getBaseTimeUnit method I linked above but based on your diff it seems that millis is hardcoded in a few places in case of StatsD which we can fix
indeed! I am using with getBaseTimeUnit
overriden in the registery. Which I would expect to override (ha!) the time unit used. As you can tell: no such luck.
But what backend is this for? Datadog
There are several other issues where users have encountered the same unexpected behavior from datadog. I can pull these up later.
Specifically: Datadog does not support units for timers. Timers are unit-less gauges.
So, setting the units has no effect - datadog is going to accept them as unit-less and presumes you've pre-scaled them to whatever you want or manually changed the metric config in the UI.
Obviously, changing the metric config in datadog for every timer after they've been submitted is a non-starter.
So what to do? Seems to me like overriding the unit should do what is documented as: Overrides the unit used. "The base time unit of the timer to which all published metrics will be scaled". Which, if it was true, would do what is expected for datadog.
So even if there is no room for adjusting micrometer to scale the values nicely: The documentation is a bit misleading no?
As I said above, it seems that "millis" is hardcoded in a few places in case of StatsD which we can fix. Please remove your change from AbstractTimer
and replace hardcoded TimeUnit.MILLISECONDS
to baseTimeUnit()
. I think doing that should make you able to override the base unit. (I can help to write a tests for this use-case.)
Could you please also run ./gradlew format
and ./gradlew build
to see if your changes break the build or not? (Right now they do.)
The documentation is a bit misleading no?
If I understand this correctly, I don't think the documentation is misleading, I think the docs are ok but we have a bug: I agree with the intention of the docs but the current behavior does not follow that. :) After fixing the issue, and you still think the docs are misleading, please open an issue/PR.
Yes, certainly: I will resolve the tests. As well as the other issues. Where would be good to document this aspect of datadog? At least enough to point somebody in the right direction. The statsd registry or config?
As you mentioned statsd works in milliseconds. So, Datadog is an exceptional case then? I'm not familiar with the standards but sounds like it! In that case, should this be a datadog specific config somehow? Or is the override of the base time unit sufficient to capture the exceptional aspect?
for reference, the related motivating tickets.
These are duplicates - just adding here for completeness
discussion about the the unitless of timers:
Draft: Opening for discussion with (incomplete) proposed changes.
There is a
baseTimeUnit
forTimer
s that appears to be inconsistently used. Given the definition "The base time unit of the timer to which all published metrics will be scaled" I'd expect that a value, eg 2000 sec, reported as the value 2000 with the unit seconds would be emitted to statsd as the value in the base time units, 2000 in this example. However, the actual value is 2,000,000. Which is the corresponding millisecond value for the seconds expected to be recorded.As that explainer above is a challenge to read: I've included some (untested or even compiled) changes to illustrate.