Open edoakes opened 1 week ago
+1 to this. Apart from worsening the UX, high-cardinality metrics can blow up costs for users who ingest the metrics into other providers rather than hosting a Prometheus server, e.g. https://docs.datadoghq.com/account_management/billing/custom_metrics/?tab=countrate
Ok, I did some prototyping here. We have metrics containing the route in two places: the proxy and the replica.
In the proxy, we don't have access to application-defined routes (by design) so we can't do anything too clever. We could try to do something like auto-detect the cardinality and cap the number of tags, but that seems excessively complex.
In the replica, we do have access to the underlying ASGI app, which we can use to identify the matched route string (e.g., `/path/{wildcard}`).
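To make the replica-side idea concrete, here is a minimal, stdlib-only sketch of tagging a request with the matched route template rather than the raw path. It is a stand-in rather than Serve's actual implementation: in a real ASGI app the framework's own router would do the matching, and `compile_template`, `route_tag`, and the fallback behaviour are hypothetical names and assumptions made for illustration.

```python
import re
from typing import Iterable


def compile_template(template: str) -> re.Pattern:
    """Compile a template such as '/path/{wildcard}' into a regex.

    Assumption: each '{name}' wildcard matches exactly one path segment.
    """
    # Split on the wildcards while keeping them in the resulting list.
    parts = re.split(r"(\{[^{}/]+\})", template)
    pattern = "".join(
        "[^/]+" if part.startswith("{") else re.escape(part)
        for part in parts
    )
    return re.compile(f"^{pattern}$")


def route_tag(path: str, templates: Iterable[str], route_prefix: str) -> str:
    """Return the first matching route template, else fall back to route_prefix."""
    for template in templates:
        if compile_template(template).match(path):
            return template
    # No application-defined route matched (hypothetical fallback).
    return route_prefix


templates = ["/path/{wildcard}", "/health"]
tag_a = route_tag("/path/user-123", templates, route_prefix="/")
tag_b = route_tag("/path/user-456", templates, route_prefix="/")
tag_c = route_tag("/unknown/route", templates, route_prefix="/")
```

With this shape, `/path/user-123` and `/path/user-456` share a single tag value, so the number of time series is bounded by the number of declared routes, not by the number of distinct request paths.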
So I'd propose that we:

- [...] `route_prefix` under the existing `route` tag.
- [...] `route_prefix` for applications that use the raw ASGI request.

We could consider changing the metric tag for the proxy metrics to `route_prefix` for clarity, but that introduces a migration for what seems to me like a very minor improvement.
In some of our metrics, we include the HTTP route as a tag. If users include high-cardinality data in their HTTP request paths, such as a per-user ID, this blows up the Prometheus metrics (and can render them unusable).
We need to reduce the cardinality here, perhaps by only exporting the Serve-level `route_prefix` instead of the full route.
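As a rough illustration of the difference in cardinality, the sketch below counts the distinct tag values produced by tagging with the full request path versus a Serve-level route prefix. The paths, the prefixes, and the `longest_prefix` helper are all made up for this example and are not taken from Serve's code.

```python
from typing import Callable, Iterable


def longest_prefix(path: str, route_prefixes: Iterable[str]) -> str:
    """Pick the longest prefix that contains the path (hypothetical helper)."""
    matches = [
        prefix
        for prefix in route_prefixes
        if path == prefix or path.startswith(prefix.rstrip("/") + "/")
    ]
    # Assumption: fall back to the root prefix when nothing matches.
    return max(matches, key=len, default="/")


def distinct_tag_values(paths: Iterable[str], tag_fn: Callable[[str], str]) -> int:
    """Count the distinct tag values, i.e. the time series per metric."""
    return len({tag_fn(path) for path in paths})


# Simulate requests that embed a per-user ID in the path.
paths = [f"/app/users/{user_id}" for user_id in range(1000)]
route_prefixes = ["/app", "/other"]

full_route_series = distinct_tag_values(paths, lambda path: path)
prefix_series = distinct_tag_values(
    paths, lambda path: longest_prefix(path, route_prefixes)
)
```

Here the full-path tag yields one series per user ID, while the prefix tag collapses all of them into a single series for `/app`.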