scylladb / seastar

High performance server-side application framework
http://seastar.io
Apache License 2.0

Propose our own metric-fetching protocol to replace Prometheus #1021

Open nyh opened 2 years ago

nyh commented 2 years ago

Currently, Seastar exports metrics using the Prometheus protocol. This is an extremely wasteful textual protocol: it repeats long variable names and human-readable help strings again and again, and it also sends numbers in a wasteful textual format. Here is a tiny excerpt from the Prometheus output of the Scylla project:

# HELP scylla_alien_receive_batch_queue_length Current receive batch queue length
# TYPE scylla_alien_receive_batch_queue_length gauge
scylla_alien_receive_batch_queue_length{shard="0"} 0.000000
# HELP scylla_alien_total_received_messages Total number of received messages
# TYPE scylla_alien_total_received_messages counter
scylla_alien_total_received_messages{shard="0"} 0

Note how this used 375 bytes for just two numbers.
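To make the overhead concrete, here is a rough sketch that measures the excerpt above and compares it with a bare binary encoding of the same two values. It assumes each of the six lines ends with a single newline, and the "binary payload" is purely hypothetical: two little-endian float64 values with no framing or identification at all.

```python
import struct

# The six lines of the excerpt above, each terminated by one newline.
excerpt_lines = [
    "# HELP scylla_alien_receive_batch_queue_length Current receive batch queue length",
    "# TYPE scylla_alien_receive_batch_queue_length gauge",
    'scylla_alien_receive_batch_queue_length{shard="0"} 0.000000',
    "# HELP scylla_alien_total_received_messages Total number of received messages",
    "# TYPE scylla_alien_total_received_messages counter",
    'scylla_alien_total_received_messages{shard="0"} 0',
]
text_payload = "".join(line + "\n" for line in excerpt_lines).encode("ascii")

# Hypothetical binary payload: only the two values, as little-endian float64.
binary_payload = struct.pack("<2d", 0.0, 0.0)

text_size = len(text_payload)
binary_size = len(binary_payload)
print(f"text: {text_size} bytes, binary values only: {binary_size} bytes")
```

A real protocol would still need some bytes to say which metric each value belongs to, so the binary figure is a lower bound, not a realistic size.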

Since Prometheus does not seem to have any plans to improve its protocol (and in fact dropped the more efficient protobuf protocol it used to have), I propose that we invent our own protocol (perhaps based on ideas we find in other similar projects). This protocol could, for example, send numbers in binary format (which is more efficient for our server) and "intern" (https://en.wikipedia.org/wiki/String_interning) the various help strings and variable names so that they don't need to be sent more than once over the same HTTP connection.
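As a minimal sketch of what "binary numbers plus interning" could look like, the snippet below sends a metric's name, help string, type and labels once per connection and refers to them by a small integer id afterwards. Everything here is an assumption made for illustration: the record tags `DEFINE` and `SAMPLE`, the field widths, and the class names are hypothetical and are not part of Seastar or Prometheus.

```python
import struct

DEFINE, SAMPLE = 0, 1            # hypothetical record tags
DEFINE_HDR = struct.Struct("<BIH")  # tag, metric id, length of text blob
SAMPLE_REC = struct.Struct("<BId")  # tag, metric id, float64 value


class InterningEncoder:
    """Per-connection encoder: a metric's text is sent once, then only its id."""

    def __init__(self):
        self._ids = {}  # (name, help, type, labels) -> small integer id

    def encode(self, name, help_text, mtype, labels, value):
        key = (name, help_text, mtype, labels)
        out = b""
        if key not in self._ids:
            # First time on this connection: emit the definition record.
            self._ids[key] = len(self._ids)
            blob = "\0".join(key).encode("utf-8")
            out += DEFINE_HDR.pack(DEFINE, self._ids[key], len(blob)) + blob
        out += SAMPLE_REC.pack(SAMPLE, self._ids[key], value)
        return out


class InterningDecoder:
    """Per-connection decoder that remembers definitions seen so far."""

    def __init__(self):
        self._defs = {}  # metric id -> (name, help, type, labels)

    def decode(self, data):
        samples, pos = [], 0
        while pos < len(data):
            tag = data[pos]
            if tag == DEFINE:
                _, mid, n = DEFINE_HDR.unpack_from(data, pos)
                pos += DEFINE_HDR.size
                self._defs[mid] = tuple(data[pos:pos + n].decode("utf-8").split("\0"))
                pos += n
            elif tag == SAMPLE:
                _, mid, value = SAMPLE_REC.unpack_from(data, pos)
                pos += SAMPLE_REC.size
                samples.append((self._defs[mid], value))
            else:
                raise ValueError(f"unknown record tag {tag}")
        return samples


enc, dec = InterningEncoder(), InterningDecoder()
metric = ("scylla_alien_total_received_messages",
          "Total number of received messages", "counter", 'shard="0"')
first = enc.encode(*metric, 0.0)   # definition record + sample
second = enc.encode(*metric, 1.0)  # sample only
decoded = dec.decode(first + second)
print(len(first), len(second))
```

With this layout the second and later scrapes of the same metric cost a fixed 13 bytes each, at the price of both ends keeping per-connection state.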

Since Prometheus will not know how to read our new protocol, and many Seastar users will still want to keep using Prometheus, we can implement a Prometheus exporter (https://prometheus.io/docs/instrumenting/exporters/) that knows how to read it. If for some reason this isn't feasible (?), it can even be a simple reverse proxy on the Prometheus machine that reads our protocol and re-emits it (over an internal socket) as the old inefficient Prometheus protocol.
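The translation step of such an exporter or proxy could be quite small. The sketch below only covers rendering: it takes already-decoded metric families and writes them in the Prometheus text format. The input shape, the function name `render_prometheus`, and the number formatting (fixed-point gauges, integer counters, chosen to mirror the excerpt above) are assumptions, and label-value escaping is left out.

```python
def render_prometheus(families):
    """Render decoded metric families as Prometheus text exposition lines.

    `families` is a list of (name, help_text, mtype, samples) tuples, where
    `samples` is a list of (labels_dict, value) pairs. This input shape is
    hypothetical; a real proxy would get it from whatever the new protocol is.
    """
    lines = []
    for name, help_text, mtype, samples in families:
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {mtype}")
        for labels, value in samples:
            label_str = ",".join(f'{k}="{v}"' for k, v in labels.items())
            # Assumption: mirror the excerpt (fixed-point gauges, integer counters).
            rendered = f"{value:f}" if mtype == "gauge" else str(int(value))
            lines.append(f"{name}{{{label_str}}} {rendered}")
    return "".join(line + "\n" for line in lines)


families = [
    ("scylla_alien_receive_batch_queue_length",
     "Current receive batch queue length", "gauge", [({"shard": "0"}, 0.0)]),
    ("scylla_alien_total_received_messages",
     "Total number of received messages", "counter", [({"shard": "0"}, 0.0)]),
]
text = render_prometheus(families)
print(text, end="")
```

The expensive text rendering then happens on the Prometheus machine rather than on the Seastar server, which is the point of the proxy idea.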

xemul commented 2 years ago

Histograms are another waste of both space and the CPU cycles needed to render them.

cat metrics  | fgrep scylla_storage_proxy_coordinator_write_latency_bucket | fgrep 'shard="0"' | fgrep 'scheduling_group_name="main"' | wc -c -l
     65    7511

It takes 7.5 kB (65 lines) to report a single histogram for one class on one shard.
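For a rough sense of scale, the arithmetic on those two numbers can be sketched as follows. The binary figure is an assumption: it supposes one 8-byte count per bucket, with bucket bounds and labels sent once (interned) instead of on every scrape.

```python
lines, text_bytes = 65, 7511  # from the wc output above

# Average cost of one bucket line in the text format.
bytes_per_bucket_text = text_bytes / lines

# Hypothetical binary layout: one 8-byte counter per bucket, nothing else.
binary_bytes = lines * 8

print(f"{bytes_per_bucket_text:.1f} bytes per bucket line as text")
print(f"{binary_bytes} bytes as binary, ratio {text_bytes / binary_bytes:.1f}x")
```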