Open juergencodes opened 8 months ago
Thanks for the report. I'm surprised other users didn't report this before when I asked for verification of the fix in https://github.com/micrometer-metrics/micrometer/issues/4353. Regardless, this is the kind of thing that could have been caught if we had an integration test setup with Stackdriver. I'll look into the feasibility of getting that setup.
Close methods of PushMeterRegistry and StepMeterRegistry should be fixed in a way that the publish method will get called only once.
We probably would not change those classes to fix a limitation that only affects Stackdriver. The StackdriverMeterRegistry could have its own implementation of close
that does not delegate to those parent classes. I'm not immediately sure what it would look like, though.
Moreoever, there should be a new config parameter in StackdriverConfig to specify the minimum time between two calls to its publish method. In case there is a call before the configured amount of time has passed, the method should sleep before doing the publish.
I prefer to not make things configurable unless they need to be and there is enough benefit. It's a global Stackdriver limitation that reported time series can't be within a certain time of each other, right? If so, the minimum time is fixed and doesn't need to be configurable. That said, I don't especially like the idea of sleeping while something is waiting for the registry to close, but if it is short enough and there isn't a better way to solve this, it may be unavoidable. What is the minimum time difference required?
We probably would not change those classes to fix a limitation that only affects Stackdriver. The StackdriverMeterRegistry could have its own implementation of
close
that does not delegate to those parent classes. I'm not immediately sure what it would look like, though.
That's fair enough. My thought was most targeting towards if this double publish could potentially lead to issues for other technology adapters that I of course do not overview. I'm a bit skeptical with respect to an own implementation of close
in StackdriverMeterRegistry, because you would again have to deal with the things that the super classes do, including the stuff in MeterRegistry. To my understanding, it would then make sense to follow only the path that StackdriverMeterRegistry makes sure to avoid time series with too little distance - no matter if it resulted from the two publish
while closing, timing issues as described above or any other reason. I will change the title of the issue likewise.
Moreoever, there should be a new config parameter in StackdriverConfig to specify the minimum time between two calls to its publish method. In case there is a call before the configured amount of time has passed, the method should sleep before doing the publish.
I prefer to not make things configurable unless they need to be and there is enough benefit. It's a global Stackdriver limitation that reported time series can't be within a certain time of each other, right? If so, the minimum time is fixed and doesn't need to be configurable. That said, I don't especially like the idea of sleeping while something is waiting for the registry to close, but if it is short enough and there isn't a better way to solve this, it may be unavoidable. What is the minimum time difference required?
I totally second to avoid unnecessary config options. However, as far as I know, the minimum time is configurable in GCP. For our setup I got the value of 5 s. I will talk with the team later to see if we have more meaningful information on this.
PS: While trying workarounds for this issue, we of course thought of subclassing StackdriverMeterRegistry to add the sleep logic in an override implementation. However, this does not work for the fact that StackdriverMeterRegistry most important constructor is private and available to the builder only. Hence, we were not able to instantiate it with MetricServiceSettings. In case you do not see anything that contradicts making the second constructor protected - or provide another one with the MetricServiceSettings, this would simplify working around future issues.
Describe the bug This is a follow up of https://github.com/micrometer-metrics/micrometer/issues/4353. We tried the new version 1.12.3, for which the close now happens in a proper way. However, this revealed a new problem: When StackdriverMeterRegistry is closed, there is an error from GCP that complains about two or more time series with timestamps too close to each other.
We suspect that this is caused by following reasons:
Environment
To Reproduce Simply using StackdriverMeterRegistry will cause the issue.
Expected behavior Close methods of PushMeterRegistry and StepMeterRegistry should be fixed in a way that the publish method will get called only once. Moreoever, there should be a new config parameter in StackdriverConfig to specify the minimum time between two calls to its publish method. In case there is a call before the configured amount of time has passed, the method should sleep before doing the publish. Probably, this must happen before the local class Batch is instantiated, as this actually sets the timestamp of the time series.
Additional context This problem only appeared after https://github.com/micrometer-metrics/micrometer/issues/4353 was fixed.