Open mhalbritter opened 1 year ago
Hi 😃
We don't want metrics for the actuator endpoints
I'm not sure it matters a lot but I think the issue on the Boot tracker is slightly different: it is not only about not having metrics for the actuator endpoints but not having Observations.
This logic is brittle and we're not happy with that.
FWIW, as the author of that PR, I'm not happy with that logic either. :D
We already have logic in place where we know exactly that an actuator endpoint has been hit: when handling it in the Spring Boot actuator infrastructure. What we like to have is a way to cancel an already started observation.
I guess this means that the logic you mentioned can only detect that an actuator endpoint was hit after the Observation is started (after the filter). I assume it is not possible to move that logic to the filter and skipping the creation of the Observation in the first place.
What we like to have is a way to cancel an already started observation.
I need to think this through but right now I think it means asking for a lot of trouble and making things very hard (impossible?) for handler implementations (potentially for users).
Right now if you start
/stop
an Observation, the corresponding handler methods (onStart
/onStop
) are immediately called. This means that every handler need to handle the situation when onStart
is already called and then the Observation
is cancelled. I think it is not safe to assume that this is always possible, here are a few problematic examples:
onStart
creates Span A, Observation B's onStart
creates Span B. Then Observation A is cancelled. What should we do with Span A and what should be the parent of Span B?onStart
is called, I don't think it is possible to roll that back when cancel happens.DefaultMeterObservationHandler
we start a LongTaskTimer
in the onStart
method. The purpose of LongTaskTimer
(maybe ActiveTaskTimer
would have been a better name) is to track tasks that are active so the recording must start in the onStart
method and after it is started, this recording is reported. Cancelling this is not really possible since the recording can be already reported between onStart
and cancel.onStart
method that might not have been possible to rollback when a cancellation signal arrives.Hey Jonatan,
I'm not sure it matters a lot but I think the issue on the Boot tracker is slightly different: it is not only about not having metrics for the actuator endpoints but not having Observations.
Yes, you're right. It's not only about metrics.
I guess this means that the logic you mentioned can only detect that an actuator endpoint was hit after the Observation is started (after the filter). I assume it is not possible to move that logic to the filter and skipping the creation of the Observation in the first place.
Yes, you're right. The thing which creates the Observations is the ServerHttpObservationFilter
from the Spring Framework, which runs for all endpoints and it creates the Observation before we know that it's an actuator endpoint.
For the cancelling propagation: If there would be such a feature, I think the API should provide methods for both only cancelling the current observation and one for also propagating the cancellation to the parent. In our actuator case, we would need a propagating cancellation: it should not only cancel the current observation, but also the one maybe created by spring security in their filter, etc. etc.
I totally agree that cancellation is not always possible without effects. For example if I start an observation, call a remote service and then cancel the Observation. It's not possible to cancel the propagation of the trace to the remote service and there's no way to notify the remote service that it has been cancelled afterwards.
Maybe the cancellation can only be provided on a best-effort basis.
I didn't realize that the DefaultMeterObservationHandler
creates LongTaskTimer
which already report when onStart
is called - this could make the whole cancellation idea fail for our actuator problem.
Hi :)
How about rather ServerHttpObservationFilter
to take a predicate to determine whether to create an observation?
Then, Boot auto-configuration can add a predicate not to create observations for actuator endpoints.
Adding cancellation as best effort sounds fragile and may introduce a hard to debug intermediate state.
As @mhalbritter described above, using a predicate has already been considered but we feel that it's too brittle. Observations are created before any routing has been performed by DispatcherServlet
. As a result, any predicate that wants to prevent creation based on that routing has to try to reproduce the routing logic. It's this that's brittle as the potential for subtle differences in the logic is high.
@wilkinsona Is it possible to somehow make that routing logic (or part of it) available to the Filter so that it can find out if an actuator endpoint was hit? (I assume not really but I'm not familiar with that routing at all.)
There is one more thing I was missing from the list above; I think this could not really work for spans either (ignoring the parent-child issue): the span must be started in the onStart
method, if a span is created it must be ended, if it is ended, it will be reported. I'm not aware of any way a span can be "cancelled" on the fly in any tracing library (except through a similar "filtering" mechanism that the ObservationPredicate provides).
I think if we introduce a "best effort cancel" functionality, we will still have some metrics and spans even if the Observation is cancelled (+ all the things that users can create and not able to rollback).
@wilkinsona
using a predicate has already been considered but we feel that it's too brittle.
Do you mean ObservationPredicate
?
Sorry, I meant a generic predicate that works for ServerHttpObservationFilter
which is a servlet filter.
Here is the sudo code of what I meant:
@Bean
@ConditionalOnMissingFilterBean
public FilterRegistrationBean<ServerHttpObservationFilter> webMvcObservationFilter(
// if we could get the HandlerMappings for actuator or get all and filter
..., List<HandlerMapping> actuatorHandlerMappings) {
...
Predicate<HttpServletRequest> predicate = (request) -> {
HandlerExecutionChain chain = getFirstMatchingHandlerMapping(actuatorHandlerMappings, request);
// If DispatcherServlet could reuse the matched actuator mapping
// to avoid look up the mappings again.
request.setAttribute("MATCHED_HANDLER", chain);
return chain == null;
};
ServerHttpObservationFilter filter = new ServerHttpObservationFilter(registry, convention, predicate);
FilterRegistrationBean<ServerHttpObservationFilter> registration = new FilterRegistrationBean<>(filter);
...
return registration;
}
As a result, any predicate that wants to prevent creation based on that routing has to try to reproduce the routing logic.
So, yes, this idea is also the same, shifting the handler mapping matching logic to the predicate implementation performed by the ServerHttpObservationFilter
.
Since the creation of the observation happened in the servlet filter, the decision to create the observation needs to happen before that if we want to control the observation creation...
The problem is not only the ServerHttpObservationFilter
but every component which creates observations until it hits the actuator endpoint. The way the PR tries solves this issue with an ObservationPredicate
works for ServerHttpObservationFilter
, but it won't work for custom filters which users install (or even maybe some filters like the various Spring Security ones).
If we want to suppress observations on certain conditions, I think we need a more general solution. Right now, we can suppress them when they get started, but sometimes you don't have enough information at that point to decide that.
Do you mean
ObservationPredicate
? Sorry, I meant a generic predicate that works forServerHttpObservationFilter
which is a servlet filter.
The exact type of the predicate doesn't matter as the problem's the same.
Since the creation of the observation happened in the servlet filter, the decision to create the observation needs to happen before that if we want to control the observation creation...
Exactly, and therein lies the problem.
this idea is also the same, shifting the handler mapping matching logic to the predicate implementation performed by the ServerHttpObservationFilter
This is what we don't want to do. It's brittle as Spring MVC hasn't really been designed with this in mind. It's slow as it results in the mapping being performed twice. It can also potentially yield the wrong answer as a subsequent filter may complete already the routing. For example, Jersey can be configured as a filter that runs in the filter chain with MVC's dispatcher servlet at the end. Jersey's filter may intercept the request and prevent it from ever reaching MVC.
@mhalbritter summarises the problem nicely:
Right now, we can suppress them when they get started, but sometimes you don't have enough information at that point to decide that.
While I'm sure it's not easy, I think Micrometer needs to solve this problem. Without such a general purpose solution, problems like https://github.com/spring-projects/spring-boot/issues/34801 either won't be addressed or they will be addressed in a variety of different ways that are brittle and not completely reliable.
While I'm sure it's not easy, I think Micrometer needs to solve this problem. Without such a general purpose solution, problems like https://github.com/spring-projects/spring-boot/issues/34801 either won't be addressed or they will be addressed in a variety of different ways that are brittle and not completely reliable.
I see a bigger problem cause metrics can't be cancelled, nor can be an OpenTelemetry span. You can cancel only a Brave span.
Why can't we ask the users to opt-in for URLs that they want to observe? That would be much easier, the Observation filter would be registered only for the provided URL patterns.
Why can't we ask the users to opt-in for URLs that they want to observe?
Various reasons are described above. I'll try to summarise.
If you try to make the decision up front, you either have to write a predicate that matches what the underlying web framework will do or you have to run the web framework's matching logic twice. Both are complicated by the possibility that you don't know which Framework will handle the request if you're using two (such as Spring MVC and Jersey) in the same app.
The only option that we've thought of thus far to avoid these problems is to allow the opt-out to be done retrospectively.
Isn't the same thing true for logging? Once one emits a log event, it is out of the hands of the "instrumentation". Similarly for metrics, once one recorded a value, it's out of the hands of the instrumentation.
We can add a cancel
method to the Observation interface but I have no idea how to implement it so that it will not cause more harm than good: data will be recorded and reported once start
is called.
I don't think logging's similar. A log event occurs at a specific point in time and once it has been emitted it doesn't change. By contrast, observations and some metrics cover a period of time. During that time more can be learned about the nature of the things that's being observed, such as precisely where an HTTP request will be routed. There's no equivalent with logging as everything's captured when the event's created so the need to cancel it as the nature of the event becomes clear doesn't arise.
Hi Andy,
Sorry for the late reply. I think if you are talking about the Observation as a whole, you are right. From the ObservationHandler
perspective though, events are when you start an observation or stop it or signal an error on it. These events (unlike an Observation) are instantaneous (~like log events?). We also have an .event
signal where you can attach arbitrary and instantaneous events to an Observation. None of these can be taken back once you signaled them.
I get your point about learning more about Observations as we go that's why you can attach key-values and events after you started an Observation, though we recommend attaching the key-values you know as early as possible but as you cannot "cancel" log.info
, observation.start
is not "cancellable" either.
Let me show two examples that might demo the issue better than I explained above:
ObservationTextPublisher
is a handler that emits log messages when you start/stop/signal an error on an observation. If the observation would be cancelled and stop
never called, we would end-up with incomplete logs (I admit, this might not be a huge problem alone).
DefaultMeterObservationHandler
starts a LongTaskTimer
in the onStop
method which is meant to track operations that are in progress. If the observation will be cancelled and stop will be never called that means the operation is still in progress which is not the case since the operation in reality did not even start. It is also a memory leak since LongTaskTimer
keeps the record as long as you stop the task.
Hello,
The use case is this issue on the Boot tracker: https://github.com/spring-projects/spring-boot/issues/34801 - TLDR: We don't want metrics for the actuator endpoints.
The way the metrics for the actuator endpoint get created is:
ServerObservationFilter
->DispatcherServlet
-> MVC Handler mapping -> Spring Boot actuator infrastructure -> actuator methodWe tried to solve this by using an
ObservationPredicate
and then extract the URL from the observation context, and then check if this is an actuator endpoint by using string matching on the path (see https://github.com/spring-projects/spring-boot/pull/36455).This logic is brittle and we're not happy with that. We already have logic in place where we know exactly that an actuator endpoint has been hit: when handling it in the Spring Boot actuator infrastructure. What we like to have is a way to cancel an already started observation.
Ideally, there would be some way to signal that this observation is cancelled, so that it doesn't have any effects (don't create timers, cancel running ones, etc.).
With that in place, we could add a feature to the Spring Boot actuator infrastructure that would take the current observation and cancel it. We wouldn't have to fiddle around with context extraction, string matching and so forth.
We should also think about how that cancellation propagates to the parent observations.
What do you think?