Confirmed: when requests to /metrics indicate that they accept compression, the response is still returned uncompressed.
This metric may cause a high cardinality problem, so I got two suggestions from Frank Rosner:
le="+Inf" That can be problematic. Prom histograms are known to be inefficient. You can read about it here: https://github.com/riptano/cndb/issues/8466 We're trying to use the victoriametrics specific histogram implementation which looks promising in terms of reducing storage cost and other things.
Please consider supporting gzip encoding for larger scrapes (>2 MB) in your endpoint. vmagent will send the corresponding Accept-Encoding header automatically. Maybe it's as simple as quarkus.http.enable-compression=true?
After investigating these two suggestions, I found:
Enabling gzip is not as simple as quarkus.http.enable-compression=true:
2.1 While Quarkus provides configuration options such as quarkus.http.enable-compression=true to enable response body compression, this setting only applies to application endpoints; it doesn’t appear to work with the Micrometer-registered custom route /metrics. I’ve tried various related configurations with no success. I also found a relevant discussion in the Quarkus repo (https://github.com/quarkusio/quarkus/issues/26112), where someone noted: “In Quarkus 2.9.2.Final, I also observed that the /q/metrics endpoint (from quarkus-smallrye-metrics) response is no longer compressed.” It seems that at some point, compression for the /metrics endpoint stopped working after an update.
2.2 Micrometer itself doesn’t provide built-in support for compressing metrics responses, so there’s no direct configuration available to handle this.
2.3 I also attempted to use HTTP filters to intercept requests to the /metrics endpoint, but they didn’t work either. Another discussion I found (https://github.com/quarkusio/quarkus/discussions/32671) mentioned that filters can’t intercept non-application endpoints. One proposed solution is to bypass the built-in /metrics endpoint and handle it manually by scraping the metrics, compressing them, and returning the compressed response, as sketched below.
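For the manual approach, here is a minimal sketch, assuming quarkus-micrometer-registry-prometheus is on the classpath so a PrometheusMeterRegistry can be injected; the /custom-metrics path and the class name are hypothetical, and the registry’s package may differ across Micrometer versions:

```java
// Sketch only: exposes a separate application endpoint that serves the
// Prometheus scrape output and gzips it when the scraper asks for it.
import io.micrometer.prometheus.PrometheusMeterRegistry; // package may differ in newer Micrometer versions
import jakarta.inject.Inject;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.HeaderParam;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;
import jakarta.ws.rs.core.Response;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

@Path("/custom-metrics") // hypothetical path; an application endpoint, unlike the built-in /metrics route
public class CompressedMetricsResource {

    @Inject
    PrometheusMeterRegistry registry;

    @GET
    @Produces(MediaType.TEXT_PLAIN)
    public Response scrape(@HeaderParam("Accept-Encoding") String acceptEncoding) throws IOException {
        // Render the current metrics in Prometheus exposition format.
        String body = registry.scrape();

        // Only compress when the scraper (e.g. vmagent) advertises gzip support.
        if (acceptEncoding == null || !acceptEncoding.contains("gzip")) {
            return Response.ok(body).build();
        }

        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(body.getBytes(StandardCharsets.UTF_8));
        }
        return Response.ok(buffer.toByteArray())
                .header("Content-Encoding", "gzip")
                .build();
    }
}
```

vmagent already sends the Accept-Encoding: gzip header automatically, so pointing it at a route like this instead of the built-in /metrics endpoint should yield compressed scrapes without touching the non-application endpoint.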
In PR #1029 we removed the histograms from the `command_processor_process` metrics because of the extremely high cardinality that was causing problems for Grafana dashboards and metrics forwarding. The high cardinality was caused by the multiplicity of tags we supported, including `command`, `tenant`, `sort type`, `error`, `error class`, `error_code`, and `vector_enabled`. Adding histogram buckets on top of these tags potentially results in a huge number of series. However, it would still be useful to be able to track latency by command for debugging purposes. We should add a new metric called `command_processor_latency` that is a histogram metric tagged by `command` and `tenant` only.
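A minimal Micrometer sketch of such a metric, with illustrative class and method names; the bucket configuration (for example publishPercentileHistogram versus explicit serviceLevelObjectives) would still need tuning to keep the series count reasonable:

```java
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;

import java.time.Duration;

// Illustrative only: records command-processing latency as a histogram
// tagged by command and tenant, and nothing else.
public class CommandLatencyMetrics {

    private final MeterRegistry registry;

    public CommandLatencyMetrics(MeterRegistry registry) {
        this.registry = registry;
    }

    public void recordLatency(String command, String tenant, Duration elapsed) {
        Timer.builder("command_processor_latency")
                .tag("command", command)      // low-cardinality tags only
                .tag("tenant", tenant)
                .publishPercentileHistogram() // emit histogram buckets for scraping
                .register(registry)           // Micrometer caches the meter per tag combination
                .record(elapsed);
    }
}
```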