Regarding the compaction lag: it's entirely possible that L0 compaction is delayed, because we only compact data once we have accumulated enough blocks. However, in the PR you mentioned, we added a parameter that controls how long a block may be staged.
We also introduced an indicator – time to compaction. In our dev environment, L0 compaction lag does not exceed 1m, and the p99 is around 15-20 seconds.
I think that relying on the "current" time might be dangerous – we could explore an option where we use the timestamps of the blocks (the time they were created). Also, I think that out-of-order (OOO) ingestion is almost inevitable: jobs might run concurrently, and their order is not guaranteed (we don't need it for compaction). Usually this is not an issue, but if a job fails and we retry (which we do), we will likely violate the order. Fortunately, both Mimir and Prometheus handle OOO samples (with some caveats).
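As a rough illustration of the timestamping idea, here is a minimal Go sketch of deriving the exported sample timestamp from the block's own time range rather than from the wall clock. The `BlockMeta` fields and the choice of the range end are assumptions for illustration, not the actual Pyroscope types.

```go
package main

import (
	"fmt"
	"time"
)

// BlockMeta is a stand-in for the compacted block's metadata; the real
// Pyroscope type differs, but it carries a time range in some form.
type BlockMeta struct {
	MinTimeMs int64 // earliest sample in the block, unix ms
	MaxTimeMs int64 // latest sample in the block, unix ms
}

// sampleTimestamp derives the timestamp of the exported metric sample from
// the block itself instead of time.Now(). This makes the series independent
// of when compaction happened to run, at the cost of producing out-of-order
// samples, which Prometheus and Mimir can ingest within their OOO window.
func sampleTimestamp(meta BlockMeta) time.Time {
	// Use the end of the block's range so repeated compactions of
	// overlapping data still move the series forward.
	return time.UnixMilli(meta.MaxTimeMs)
}

func main() {
	meta := BlockMeta{
		MinTimeMs: time.Now().Add(-90 * time.Second).UnixMilli(),
		MaxTimeMs: time.Now().Add(-30 * time.Second).UnixMilli(),
	}
	fmt.Println("exported sample timestamp:", sampleTimestamp(meta))
}
```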
Prerequisites
Exporting profile metrics at compaction time
This PoC shows how we could export metrics from profiles at compaction time (in fact, we do it right after compaction, not strictly at compaction time).
Compaction is something that eventually happens to every block in our object storage. This approach offers some benefits over exporting at ingestion time, as described by Tempo's maintainers:
In theory, the first level of compaction (L0 blocks to L1 blocks) happens shortly after data ingestion (~10s). But in practice, I've observed that L0 compaction happens every 30-120s. I don't know the reason for this delay (maybe data ingestion is low and compaction happens less often? I only ingest data for 1 tenant with 2 services, roughly every 15s).
Generated metrics
Now that we have a prototype running, we can get a picture of what the generated metrics look like.
Dimensions
Every profile type or dimension is exported as a metric with this format: pyroscope_exported_metrics_<profile_type>, where the : separators of the profile type are replaced by _.
So, for example, if a service writes profile data for 3 different __profile_type__ values, we will export 3 different metrics:

process_cpu:cpu:nanoseconds:cpu:nanoseconds → pyroscope_exported_metrics_process_cpu_cpu_nanoseconds_cpu_nanoseconds
memory:alloc_objects:count:space:bytes → pyroscope_exported_metrics_memory_alloc_objects_count_space_bytes
memory:alloc_space:bytes:space:bytes → pyroscope_exported_metrics_memory_alloc_space_bytes_space_bytes
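For illustration, here is a minimal Go sketch of how such a metric name could be derived from the profile type; the helper name and the exact sanitization rules are assumptions, not the code used in the prototype.

```go
package main

import (
	"fmt"
	"strings"
)

// exportedMetricName maps a Pyroscope profile type such as
// "memory:alloc_objects:count:space:bytes" to a Prometheus-compatible
// metric name by prefixing it and replacing characters that are not
// allowed in metric names.
func exportedMetricName(profileType string) string {
	sanitized := strings.NewReplacer(":", "_", "-", "_", ".", "_").Replace(profileType)
	return "pyroscope_exported_metrics_" + sanitized
}

func main() {
	for _, pt := range []string{
		"process_cpu:cpu:nanoseconds:cpu:nanoseconds",
		"memory:alloc_objects:count:space:bytes",
		"memory:alloc_space:bytes:space:bytes",
	} {
		fmt.Println(pt, "->", exportedMetricName(pt))
	}
}
```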
Labels are preserved, unrolling a new series for each label set. So we can query the CPU of a specific pod of a service, combined with some other pprof label, like this:
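For instance (with hypothetical label names, since the pod and vehicle labels depend on what the application actually attaches), such a query could look like `pyroscope_exported_metrics_process_cpu_cpu_nanoseconds_cpu_nanoseconds{service_name="my-service", pod="my-service-0", vehicle="bike"}`. A minimal Go sketch, using invented type names, of how the preserved label set could travel onto the exported sample:

```go
package main

import "fmt"

// ExportedSample is an illustrative stand-in for one exported sample: the
// derived metric name plus the label set copied over from the profile
// series found in the compacted block.
type ExportedSample struct {
	Metric string
	Labels map[string]string
	Value  float64
}

func main() {
	s := ExportedSample{
		Metric: "pyroscope_exported_metrics_process_cpu_cpu_nanoseconds_cpu_nanoseconds",
		// Labels are copied verbatim from the profile's label set, so each
		// distinct label set becomes its own series downstream.
		Labels: map[string]string{
			"service_name": "my-service",
			"pod":          "my-service-0",
			"vehicle":      "bike",
		},
		Value: 1.5e9, // total CPU nanoseconds observed in the compacted block
	}
	fmt.Printf("%s%v = %g\n", s.Metric, s.Labels, s.Value)
}
```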
Dimension metrics are exported for every tenant and every service_name, but this should be configurable by the user.
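A hypothetical shape for that configuration, sketched in Go; the field names are invented for illustration and do not correspond to an existing Pyroscope config block.

```go
package metricsexport

// ExporterConfig sketches what user-facing configuration for dimension
// export could look like.
type ExporterConfig struct {
	Enabled      bool     `yaml:"enabled"`
	Tenants      []string `yaml:"tenants"`       // empty slice: export for all tenants
	ServiceNames []string `yaml:"service_names"` // empty slice: export for all services
	ProfileTypes []string `yaml:"profile_types"` // empty slice: export all dimensions
}
```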
Functions
This prototype also explores the ability to export metrics for specific functions. We can choose an interesting function to export.
It now exports data for every dimension of the given function, under this format:
In this prototype I've hardcoded the garbage collector and HTTP functions to export, for every service_name. I haven't made any distinction per tenant yet. The functions to export should come from config (a UI is a must here).
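To make the idea concrete, here is a small, hypothetical Go sketch of exporting a hardcoded allow-list of functions from per-function totals that are assumed to have been aggregated while reading the compacted block; the function names, the aggregation input, and the metric naming are all illustrative assumptions, not the prototype's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// exportedFunctions is the hardcoded allow-list used by this sketch; in the
// prototype this role is played by the garbage collector and HTTP handlers,
// and eventually it should come from configuration.
var exportedFunctions = []string{
	"runtime.gcBgMarkWorker",
	"net/http.(*conn).serve",
}

// exportFunctionTotals takes per-function totals (assumed to be aggregated
// during compaction) and emits one sample per allow-listed function. A real
// implementation would push these to a remote-write target instead of
// printing them.
func exportFunctionTotals(totals map[string]int64, serviceName string) {
	for _, fn := range exportedFunctions {
		v, ok := totals[fn]
		if !ok {
			continue
		}
		metric := "pyroscope_exported_function_" +
			strings.NewReplacer(".", "_", "/", "_", "(", "", ")", "", "*", "").Replace(fn)
		fmt.Printf("%s{service_name=%q} = %d\n", metric, serviceName, v)
	}
}

func main() {
	exportFunctionTotals(map[string]int64{
		"runtime.gcBgMarkWorker": 42_000_000,
		"net/http.(*conn).serve": 128_000_000,
	}, "my-service")
}
```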
In the future, we could specify a filter of label sets instead of exporting by service_name. So, for example, "foo": "{}" would export every profile of the foo function, and "foo": "{service_name=\"my-service\", vehicle=\"bike\"}" would export only profiles with that service_name and vehicle.
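A rough Go sketch of how such label-set filters could be parsed and matched; the map-based format follows the example above, but the parsing helper and matching logic are assumptions for illustration (exact equality matchers only, no regex or negation).

```go
package main

import (
	"fmt"
	"strings"
)

// functionFilters maps a function name to a label selector; "{}" means
// "export this function for every label set".
var functionFilters = map[string]string{
	"foo": `{service_name="my-service", vehicle="bike"}`,
	"bar": `{}`,
}

// parseSelector turns `{k="v", ...}` into a map of required label values.
func parseSelector(sel string) map[string]string {
	required := map[string]string{}
	sel = strings.Trim(strings.TrimSpace(sel), "{}")
	if sel == "" {
		return required
	}
	for _, part := range strings.Split(sel, ",") {
		kv := strings.SplitN(strings.TrimSpace(part), "=", 2)
		if len(kv) != 2 {
			continue
		}
		required[kv[0]] = strings.Trim(kv[1], `"`)
	}
	return required
}

// matches reports whether a profile's label set satisfies the selector.
func matches(required, labels map[string]string) bool {
	for k, v := range required {
		if labels[k] != v {
			return false
		}
	}
	return true
}

func main() {
	profileLabels := map[string]string{"service_name": "my-service", "vehicle": "bike", "pod": "p-0"}
	for fn, sel := range functionFilters {
		fmt.Printf("export %q for this profile: %v\n", fn, matches(parseSelector(sel), profileLabels))
	}
}
```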
Detected challenges
This naive solution is full of trade-offs and assumptions, and it's far from final. I've detected some challenges:
DEMO
I have a Pyroscope instance with these changes running on my machine, exporting metrics to my Grafana Cloud instance.
Go grant yourself privileges on the admin page:
You can take a look at the exported metrics here: https://albertosotogcp.grafana.net/explore/metrics/trail?from=now-1h&to=now&timezone=browser&var-ds=grafanacloud-prom&var-otel_resources=&var-filters=&var-deployment_environment=&metricSearch=pyroscope_&metricPrefix=all
You can see a demo dashboard, where I tried to simulate alerts for >20% of CPU spent on garbage collection or >60% of memory in HTTP requests: https://albertosotogcp.grafana.net/goto/eFRA8E7NR?orgId=1