Open bmatican opened 11 months ago
Okay, @bogdan wanted to know how many time series we have from handlerlatency metrics. I ran the following Prometheus query:
count by (quantile) ( label_replace( {savedname =~ "handler.", exported_instance = "yb-dev-mlillibridge-core2-3000-tbl-30pct-8123044812968211792-n1"}, "quantile", "$1", "savedname", ".*(sum|count)" ) )
This counts how many time series there are of this type for each quantile/sum/count. (The fancy label_replace part converts the sum and count parts into fake quantiles.)
Running this against a 3000-tablet box after some extensive sysbench stress tests gave:
Summing (just take out the "by (quantile)" part) gives 3,352 time series.
count ( {savedname =~ "(proxy|service)(request|response)_.*", exported_instance = "yb-dev-mlillibridge-core2-3000-tbl-30pct-8123044812968211792-n1"} ) gives 1,538 timeseries on this box...
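For anyone unfamiliar with the label_replace trick in the first query, here is a rough Python sketch of what it does to each series before the count by (quantile) step. The metric names and the six quantile names are hypothetical placeholders, not taken from the actual box; only the label name savedname and the regex come from the query above.

```python
import re
from collections import Counter

# Hypothetical quantile label values; the real set on the box may differ.
QUANTILES = ["p50", "p95", "p99", "mean", "min", "max"]


def label_replace(labels, dst, replacement, src, regex):
    """Mimic PromQL label_replace for a single series' label set."""
    # PromQL anchors the regex, so the whole label value must match.
    match = re.fullmatch(regex, labels.get(src, ""))
    if match is None:
        return dict(labels)  # no match: the series is returned unchanged
    # PromQL writes capture groups as $1; Python's re module uses \g<1>.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    return {**labels, dst: match.expand(template)}


def sample_series(method):
    """Build 8 made-up series for one RPC method: 6 quantiles, sum, count."""
    base = f"handler_latency_{method}"
    series = [{"savedname": base, "quantile": q} for q in QUANTILES]
    series += [{"savedname": f"{base}_{s}"} for s in ("sum", "count")]
    return series


def count_by_quantile(series):
    """Equivalent of: count by (quantile) (label_replace(...))."""
    replaced = [
        label_replace(s, "quantile", "$1", "savedname", ".*(sum|count)")
        for s in series
    ]
    return Counter(s.get("quantile", "") for s in replaced)


if __name__ == "__main__":
    all_series = sample_series("Read") + sample_series("Write")
    for bucket, n in sorted(count_by_quantile(all_series).items()):
        print(bucket, n)
```

Series whose savedname ends in sum or count get a synthetic quantile label, so they land in their own buckets instead of being lumped together under an empty quantile.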
Jira Link: DB-8860
Description
Discussed internally. Right now, all our auto-generated RPC metrics will generate metrics like the following
However, the vast majority of these (6 of 8, the quantiles) are not necessary for YBA or YBM. We should get rid of them, as they needlessly increase the total number of metrics each node exports. Currently, we would only need to retain them for the top-level YSQL/YCQL/YEDIS operations.
To keep the quantiles for the metrics we do want, it would be nice if we had a way to tag the relevant RPC methods. One interesting way could be a custom protobuf option (see an example in Kudu: https://github.com/apache/kudu/commit/cef7b10239a1cf860bfcb526d503b07503442a49). This would allow us to assume that only tagged RPC methods require quantiles, making it a very explicit dev choice that is cleanly documented in the .proto files themselves.
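As a rough illustration of the effect, here is a small Python sketch of the tagging idea. Everything in it is hypothetical: the export_quantiles flag only stands in for whatever proto option we end up defining, and the method and quantile names are placeholders. It just shows that an untagged method would export 2 series (sum and count) instead of 8.

```python
from dataclasses import dataclass

# Hypothetical quantile names; 6 of the 8 series per method are quantiles.
QUANTILES = ("p50", "p95", "p99", "mean", "min", "max")


@dataclass(frozen=True)
class RpcMethod:
    name: str
    # Stand-in for the proposed custom proto option; defaults to off so
    # that keeping quantiles is an explicit choice per method.
    export_quantiles: bool = False


def exported_series(method):
    """Return the series names a method would export under the proposal."""
    names = [f"{method.name}_sum", f"{method.name}_count"]
    if method.export_quantiles:
        names += [f"{method.name}_{q}" for q in QUANTILES]
    return names


def total_series(methods):
    """Total number of exported series across all methods."""
    return sum(len(exported_series(m)) for m in methods)


if __name__ == "__main__":
    methods = [
        RpcMethod("TopLevelQuery", export_quantiles=True),  # tagged
        RpcMethod("InternalHeartbeat"),                      # untagged
        RpcMethod("InternalFlush"),                          # untagged
    ]
    print(total_series(methods))
```

Defaulting the flag to off matches the intent above: a method only keeps its quantiles if someone deliberately tags it.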
cc @es1024 @yusong-yan
Issue Type
kind/bug