Open nyh opened 2 years ago
Histograms are another waste, of both space and the CPU cycles needed to render them.
cat metrics | fgrep scylla_storage_proxy_coordinator_write_latency_bucket | fgrep 'shard="0"' | fgrep 'scheduling_group_name="main"' | wc -c -l
65 7511
It takes about 7.5 KB (7511 bytes across 65 lines) to report a single histogram for one class on one shard.
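To put the `wc` output in perspective, here is a rough back-of-the-envelope calculation in Python. The 8-byte figure is only an assumption (each bucket count sent as a fixed-width 64-bit integer, ignoring bucket boundaries and labels); it is not a measurement of any existing format.

```python
# Figures taken from the `wc -c -l` output above: 65 lines, 7511 bytes.
total_bytes = 7511
line_count = 65

# Average cost of one histogram line in the Prometheus text format.
bytes_per_line = total_bytes / line_count

# Hypothetical lower bound: one 8-byte integer per line, nothing else.
# (Assumption for illustration only; a real format would also need
# bucket boundaries and some framing.)
binary_bytes = line_count * 8

print(f"text:   {bytes_per_line:.1f} bytes per line")
print(f"binary: {binary_bytes} bytes for all {line_count} values")
```

Even allowing generous overhead for framing, most of the text cost is repeated names and labels rather than data.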
Currently, Seastar exports metrics using the Prometheus protocol. This is an extremely wasteful textual protocol: it repeats long variable names and user-readable help strings again and again, and it also sends numbers as text rather than in a compact binary form. Here is a tiny excerpt from the Prometheus output of the Scylla project:
Note how this used 375 bytes for just two numbers.
Since Prometheus does not seem to have any plans to improve its protocol (and in fact dropped the more efficient protobuf protocol it used to have), I propose that we invent our own protocol (perhaps based on ideas we find in other similar projects). This protocol could, for example, send numbers in binary format (which is more efficient for our server) and "intern" (https://en.wikipedia.org/wiki/String_interning) the various help strings and variable names, so that each one needs to be sent only once over the same HTTP connection.
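A minimal sketch of what interning plus binary numbers could look like, in Python for brevity. Everything here is hypothetical: the record layout (a `D` record defining a string id, an `S` record carrying a sample), the class name `InterningEncoder`, and the choice of little-endian 64-bit floats are assumptions made for illustration, not a proposed wire format.

```python
import struct

class InterningEncoder:
    """Per-connection encoder (hypothetical format).

    'D' record: tag, u32 id, u16 length, UTF-8 bytes  -> defines a series name
    'S' record: tag, u32 id, f64 value                -> one sample for that id
    """

    def __init__(self):
        self._ids = {}  # series name -> small integer id, kept per connection

    def encode(self, samples):
        out = bytearray()
        for name, value in samples:
            sid = self._ids.get(name)
            if sid is None:
                # First time this name is seen on the connection: send it once.
                sid = len(self._ids)
                self._ids[name] = sid
                raw = name.encode("utf-8")
                out += struct.pack("<cIH", b"D", sid, len(raw)) + raw
            # Every sample costs a fixed 13 bytes: 1 tag + 4 id + 8 value.
            out += struct.pack("<cId", b"S", sid, float(value))
        return bytes(out)

# Series name built from the metric and labels used in the command above.
series = ('scylla_storage_proxy_coordinator_write_latency_bucket'
          '{shard="0",scheduling_group_name="main"}')

enc = InterningEncoder()
first = enc.encode([(series, 10)])   # carries the name plus the sample
second = enc.encode([(series, 11)])  # carries only the 13-byte sample
print(len(first), len(second))
```

The point of the sketch is that the long name is paid for once per connection; later scrapes of the same series cost a fixed 13 bytes each under these assumptions.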
Since Prometheus will not know how to read our new protocol, and many Seastar users will still want to keep using Prometheus, we can implement a Prometheus exporter (https://prometheus.io/docs/instrumenting/exporters/) that knows how to read it. If for some reason this isn't feasible (?), it can even be a simple reverse proxy on the Prometheus machine that reads our protocol and re-emits it (over an internal socket) as the old, inefficient Prometheus protocol.
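The translation step of such a proxy could be quite small. The sketch below assumes a purely hypothetical binary layout (a `D` record: tag, u32 id, u16 length, UTF-8 name; an `S` record: tag, u32 id, f64 value) and a made-up function name, `decode_to_prometheus_text`. It only shows the decoding direction and leaves out the HTTP and socket handling.

```python
import struct

def decode_to_prometheus_text(payload, names):
    """Turn hypothetical binary records back into Prometheus text lines.

    `names` maps id -> series name and must be kept for the lifetime of the
    connection, since a name is only sent the first time it is used.
    """
    lines = []
    pos = 0
    while pos < len(payload):
        tag = payload[pos:pos + 1]
        if tag == b"D":
            # Definition record: remember the interned name under its id.
            sid, length = struct.unpack_from("<IH", payload, pos + 1)
            start = pos + 7
            names[sid] = payload[start:start + length].decode("utf-8")
            pos = start + length
        elif tag == b"S":
            # Sample record: expand the id back into the full text line.
            sid, value = struct.unpack_from("<Id", payload, pos + 1)
            lines.append(f"{names[sid]} {value!r}")
            pos += 13
        else:
            raise ValueError(f"unknown record tag {tag!r} at offset {pos}")
    return "".join(line + "\n" for line in lines)

# Hand-built payloads standing in for what the server would send.
name = "example_metric_total"  # hypothetical series name
raw = name.encode("utf-8")
scrape1 = (struct.pack("<cIH", b"D", 0, len(raw)) + raw
           + struct.pack("<cId", b"S", 0, 42.0))
scrape2 = struct.pack("<cId", b"S", 0, 43.0)

names = {}  # per-connection state held by the proxy
text1 = decode_to_prometheus_text(scrape1, names)
text2 = decode_to_prometheus_text(scrape2, names)
print(text1, text2, sep="")
```

Because the proxy sits next to Prometheus, the verbose text form only ever travels over a local socket, while the compact form crosses the network.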