spring-cloud / spring-cloud-netflix

Integration with Netflix OSS components
http://cloud.spring.io/spring-cloud-netflix/
Apache License 2.0
4.86k stars 2.44k forks source link

Client observations creating too many uri tag values #4290

Open sunruh opened 1 month ago

sunruh commented 1 month ago

After observability support was added to RestTemplateEurekaHttpClient in #4255 by using the RestTemplateBuilder metrics are being generated for Eureka client requests.

The ClientRequestObservationConvention adds the URI template of the requests as a low cardinality observability/metric key-value/tag.

The current implementations of EurekaHttpClient based on RestTemplate and WebClient assemble the full URL of the requsts and pass it to the client implementation. This causes a high number of values for the URI tag in the metrics, especially if the lastDirtyTimestamp changes often (due to e.g. flaky health of some component). By default only 100 values are allowed for the URI tag, additional values would be dropped and the metrics using these values are ignored.

Could be mitigated by passing at least the lastDirtyTimestamp as a uriVariable to the RestTemplate calls.

Version: spring-cloud-netflix-eureka-client 4.1.2

Sample of generated (prometheus) metrics

The following metrics where available after making the Eureka server unavailable twice and thus causing a status change to UNKNOWNand dirying of the InstanceInfo.

# HELP http_client_requests_seconds_max  
# TYPE http_client_requests_seconds_max gauge
http_client_requests_seconds_max{client_name="localhost",error="ResourceAccessException",exception="ResourceAccessException",method="GET",outcome="UNKNOWN",status="CLIENT_ERROR",uri="/eureka/apps/delta"} 0.0410048
http_client_requests_seconds_max{client_name="localhost",error="ResourceAccessException",exception="ResourceAccessException",method="PUT",outcome="UNKNOWN",status="CLIENT_ERROR",uri="/eureka/apps/<appName>/<instanceId>?status=UP&lastDirtyTimestamp=1720440338928"} 0.0
http_client_requests_seconds_max{client_name="localhost",error="ResourceAccessException",exception="ResourceAccessException",method="PUT",outcome="UNKNOWN",status="CLIENT_ERROR",uri="/eureka/apps/<appName>/<instanceId>?status=UP&lastDirtyTimestamp=1720441059188"} 0.0409643
http_client_requests_seconds_max{client_name="localhost",error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/eureka/apps/"} 0.0054987
http_client_requests_seconds_max{client_name="localhost",error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/eureka/apps/delta"} 0.0476062
http_client_requests_seconds_max{client_name="localhost",error="none",exception="none",method="POST",outcome="SUCCESS",status="204",uri="/eureka/apps/<appName>"} 0.0078605
http_client_requests_seconds_max{client_name="localhost",error="none",exception="none",method="PUT",outcome="CLIENT_ERROR",status="404",uri="/eureka/apps/<appName>/<instanceId>?status=UP&lastDirtyTimestamp=1720440338928"} 0.0
http_client_requests_seconds_max{client_name="localhost",error="none",exception="none",method="PUT",outcome="CLIENT_ERROR",status="404",uri="/eureka/apps/<appName>/<instanceId>?status=UP&lastDirtyTimestamp=1720441059188"} 0.0194635
http_client_requests_seconds_max{client_name="localhost",error="none",exception="none",method="PUT",outcome="SUCCESS",status="200",uri="/eureka/apps/<appName>/<instanceId>?status=DOWN&lastDirtyTimestamp=1720440308847"} 0.0
http_client_requests_seconds_max{client_name="localhost",error="none",exception="none",method="PUT",outcome="SUCCESS",status="200",uri="/eureka/apps/<appName>/<instanceId>?status=UP&lastDirtyTimestamp=1720440338928"} 0.0
http_client_requests_seconds_max{client_name="localhost",error="none",exception="none",method="PUT",outcome="SUCCESS",status="200",uri="/eureka/apps/<appName>/<instanceId>?status=UP&lastDirtyTimestamp=1720441059188"} 0.0066927
http_client_requests_seconds_max{client_name="localhost",error="none",exception="none",method="PUT",outcome="SUCCESS",status="200",uri="/eureka/apps/<appName>/<instanceId>?status=UP&lastDirtyTimestamp=1720442649723"} 0.0058572
OlgaMaciaszek commented 1 month ago

Hello @sunruh, thanks for reporting the issue. Makes sense. I don't think we want to modify Eureka's REST API at this point, but we may want to create our own implementation of ClientRequestObservationConvention that will disregard this query param and set it on the RestTemplate once it's been built. @jonatan-ivanov wdyt?

jonatan-ivanov commented 1 month ago

@sunruh As a quick workaround till this is fixed and released, you can create a MeterFilter or an ObservationFilter @Bean in your application and remove the high cardinality part of the uri. The uri should be templated and low cardinality as you stated above. I think if you can inject to your own RestTemplate to the Eureka client, then you should also be able to inject your own convention where you can modify this behavior (but I think the simplest thing to do is creating a MeterFilter).

@OlgaMaciaszek I think creating a ~EurekaClientRequestObservationConvention that extends ClientRequestObservationConvention and normalizes the value of the uri so it will be low cardinality is the proper solution. I think I would also make this convention extendable (non-final) and injectable by the users so that users can define their own convention if they want to.

OlgaMaciaszek commented 1 month ago

Thanks, @jonatan-ivanov, @sunruh. I've started working on it just to realise that this issue should already be resolved as a side effect of a different fix that I just merged yesterday. It switches from using exchange(String url, ...) to exchange(URI url, ...) in RestTemplateEurekaClient , hence removing the use of uriTemplate and the low cardinality uri key in favour of the high cardinality http.url key. Could you please verify against the latest snapshots?

sunruh commented 1 month ago

That fixes the issue. Of course all requests now end up with uri=UNKNOWN but it would be impossible to have an uri tag with any RestTemplate method that takes a URI as those are never templates:

# HELP http_client_requests_seconds_max  
# TYPE http_client_requests_seconds_max gauge
http_client_requests_seconds_max{client_name="localhost",error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="none"} 0.0037342
http_client_requests_seconds_max{client_name="localhost",error="none",exception="none",method="POST",outcome="SUCCESS",status="204",uri="none"} 0.0
http_client_requests_seconds_max{client_name="localhost",error="none",exception="none",method="PUT",outcome="SUCCESS",status="200",uri="none"} 0.0073595

Both issues (this one and the one regarding special characters in e.g. the instance id) could have been solved by using templating in the RestTemplateEurekaHttpClient, not changing the API but using e.g. in getApplication(String appName), restTemplate.exchange(serviceUrl + "apps/{appName}", HttpMethod.GET, null, Application.class, appName) but I would consider that an improvement.

OlgaMaciaszek commented 1 month ago

@sunruh would you like to create a PR with the improvement?

spring-cloud-issues commented 1 month ago

If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.

spring-cloud-issues commented 1 month ago

Closing due to lack of requested feedback. If you would like us to look at this issue, please provide the requested information and we will re-open the issue.

OlgaMaciaszek commented 1 month ago

Reopening as enhancement.

sunruh commented 3 weeks ago

Was my PR maybe missed as the status is now "help wanted"?

OlgaMaciaszek commented 3 weeks ago

Thanks @sunruh, I think the status was changed before your PR or roughly at the same time. Will look at it next week.

Chessray commented 2 weeks ago

Hi all, I stumbled on this issue just now; and I'm sorry if it is already resolved (quite frankly, it's a late Friday night for me, and I'm not fully able to follow the conversation anymore). In any case, while searching for solutions I stumbled on this article which presents a way of using WebClient that will still create a tag with the uriTemplate, but respect the parameters as such: https://medium.com/@nickjarzembowski_15160/controlling-webclient-metrics-cardinality-b96669dcc88d

Just thought it might be helpful in this situation. If this issue is already been taken care of or obsolete, please disregard this comment.