perf: tune async batch iterator

grafana / pyroscope

Continuous Profiling Platform. Debug performance issues down to a single line of code

GNU Affero General Public License v3.0

It's been observed that the async batch iterator we're using for fetching Parquet rows might be using too much memory. Note the query.CloneParquetValues call under iter.(*AsyncBatchIterator[...]).fillBuffer in the flame graph:

The problem manifests when the query hits downsampled (aggregated) profiles: a row may contain thousands of values. Another factor is the misalignment of the query split interval and the block duration: each sub-range is processed independently, with its own iterator, thus multiplying the memory requirement.
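To make the multiplication concrete, here is a back-of-the-envelope sketch. The function name and all parameters are hypothetical and not taken from the Pyroscope code; it only assumes that each concurrently active iterator holds one buffer of cloned rows.

```go
package main

// estimateBufferedBytes is a hypothetical helper (not part of Pyroscope).
// It gives a rough upper bound on the bytes held in row buffers, assuming
// every active iterator keeps one full buffer of cloned rows.
func estimateBufferedBytes(iterators, rowsPerBuffer, valuesPerRow, bytesPerValue int) int {
	return iterators * rowsPerBuffer * valuesPerRow * bytesPerValue
}
```

With illustrative numbers, a single iterator buffering 1024 rows of 10 eight-byte values holds about 80 KiB, whereas four iterators buffering 1024 rows of 5000 values each hold roughly 164 MB. The figures are made up for the example; only the multiplicative shape matters.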

In practice, a large buffer is not required here: it is only needed to avoid waiting for fetches from individual columns by reading their data ahead of time. Each of the columns, in turn, has its own "read ahead" buffer, which should minimize blocking of the top-level iterator.

One way to solve the problem is to make the iterator work with size in bytes and have a predictable memory footprint. In the PR, I reduce the default buffer size and change the allocation strategy to use the new slices.Grow function.
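A minimal sketch of the byte-budget idea, for illustration only. The `row` type, the `fillBuffer` signature, and the `capHint` and `maxBytes` parameters are assumptions made for this example and do not reflect the actual change in the PR; the only real API used is `slices.Grow` from the Go standard library (Go 1.21+).

```go
package main

import "slices"

// row is a hypothetical stand-in for a cloned Parquet row; only the size
// of its payload matters for this sketch.
type row struct {
	values []byte
}

// fillBuffer refills buf from src until src is exhausted or adding the next
// row would exceed maxBytes. It always accepts at least one row, so a single
// oversized row cannot stall the iterator. It returns the refilled buffer
// and the number of rows consumed from src.
func fillBuffer(buf, src []row, capHint, maxBytes int) ([]row, int) {
	// Reuse the backing array and make sure there is room for capHint rows,
	// instead of allocating a large fixed-size buffer up front.
	buf = slices.Grow(buf[:0], capHint)
	size := 0
	for _, r := range src {
		if len(buf) > 0 && size+len(r.values) > maxBytes {
			break
		}
		buf = append(buf, r)
		size += len(r.values)
	}
	return buf, len(buf)
}
```

Bounding the buffer by payload size rather than by row count keeps the footprint predictable regardless of how many values a downsampled row carries.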

perf: tune async batch iterator #3358