grafana / pyroscope

Continuous Profiling Platform. Debug performance issues down to a single line of code
https://grafana.com/oss/pyroscope/
GNU Affero General Public License v3.0

Symbolic information binary format #2926

Closed kolesnikovae closed 2 months ago

kolesnikovae commented 8 months ago

Pyroscope stores symbolic information such as locations, functions, mappings, and strings in column-major order, in parquet format. We define the schema dynamically and have hand-written construct/deconstruct procedures for each of the models. While this gives us a simple and convenient way to manage and maintain the storage schema, the approach has its own disadvantages:

  1. We always read all of the model fields/columns, and read/write buffers are allocated for each column, which causes excessive IO and resource usage.
  2. Fairly expensive decoding (~5–7% of query CPU time).
  3. Read amplification caused by the fact that a partition can overlap parquet column chunk page boundaries.
  4. Despite the small payload size, fetching partitions is often responsible for tail latencies. The impact is even more pronounced on downsampled/aggregated data.
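To illustrate items 1 and 3, here is a back-of-the-envelope sketch (hypothetical numbers, not Pyroscope's actual page geometry): with a column-major layout, reading one partition means touching every column, and each column read can straddle page boundaries, multiplying the pages fetched.

```go
package main

import "fmt"

// pagesTouched estimates how many parquet pages must be fetched to
// read a partition of `rows` records that starts at `startRow`, when
// the data is split across `cols` columns paged at `pageRows` rows
// per page. Every column must be read, and each column read covers
// every page the row range overlaps.
func pagesTouched(rows, cols, pageRows, startRow int) int {
	firstPage := startRow / pageRows
	lastPage := (startRow + rows - 1) / pageRows
	return cols * (lastPage - firstPage + 1)
}

func main() {
	// 500 rows starting mid-page, 10 columns, 1024 rows per page:
	// the partition straddles a page boundary in every column.
	fmt.Println(pagesTouched(500, 10, 1024, 900)) // 20 pages
	// A row-major layout would instead read one contiguous byte range.
}
```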

In the screenshot below you can see that a parquetTableRange.fetch call lasted for 3 seconds with no good reason – it was probably blocked by the async page reader that is shared with the profile table reader:

(screenshot: trace showing a 3-second parquetTableRange.fetch call)

I propose developing a custom binary format with low-level encoders and decoders for the data models. The data should be organised in row-major order. I expect this will effectively remove symbolic data retrieval from the list of query latency factors.
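As a rough sketch of what a row-major layout could look like (this is an illustration, not the format actually proposed; the `Function` fields and varint encoding are assumptions), all fields of one record are stored contiguously, so a partition decodes in a single sequential pass over one buffer:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Function is a hypothetical symbolic-data model; string fields are
// represented as indices into a shared string table.
type Function struct {
	ID        uint64
	NameIdx   uint64
	FileIdx   uint64
	StartLine uint64
}

// encode appends a length-prefixed, varint-encoded row-major block.
func encode(buf []byte, fns []Function) []byte {
	tmp := make([]byte, binary.MaxVarintLen64)
	put := func(v uint64) {
		n := binary.PutUvarint(tmp, v)
		buf = append(buf, tmp[:n]...)
	}
	put(uint64(len(fns)))
	for _, f := range fns {
		put(f.ID)
		put(f.NameIdx)
		put(f.FileIdx)
		put(f.StartLine)
	}
	return buf
}

// decode reads the block back with a single sequential scan.
func decode(buf []byte) []Function {
	get := func() uint64 {
		v, n := binary.Uvarint(buf)
		buf = buf[n:]
		return v
	}
	fns := make([]Function, get())
	for i := range fns {
		fns[i] = Function{get(), get(), get(), get()}
	}
	return fns
}

func main() {
	in := []Function{{1, 42, 7, 10}, {2, 43, 7, 120}}
	out := decode(encode(nil, in))
	fmt.Println(out[1].StartLine) // 120
}
```

Because each record is self-contained, fetching a partition is one contiguous range read, and only one decode buffer is needed regardless of the number of fields.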

cyriltovena commented 8 months ago

Definitely agree that we should aim at reducing IO for symbols, but I think it's not just parquet: it seems stacktraces.symdb is also causing tail latency.