grafana / pyroscope

Continuous Profiling Platform. Debug performance issues down to a single line of code
https://grafana.com/oss/pyroscope/
GNU Affero General Public License v3.0

perf: optimize deduplication #3351

Closed kolesnikovae closed 3 weeks ago

kolesnikovae commented 3 weeks ago

The PR aims to optimize deduplication of profiles from uncompacted blocks.

Queriers stream profiles from replicas (ingesters), ordered by timestamp and then by series labels, and combine the streams with a k-way merge (loser tree). Profiles with matching series labels and timestamps are deduplicated.

In large-scale deployments, this adds significant latency because queriers have to unmarshal and compare series labels, and stream deduplication is a single-threaded operation.

image
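For illustration, the merge-and-deduplicate step described above can be sketched roughly as follows. This is a simplified sketch, not the actual Pyroscope code: the `Profile` type, the string-encoded labels, and `mergeDedup` are hypothetical names, and a binary heap from `container/heap` stands in for the loser tree. It shows why every comparison in the merge has to look at the series labels.

```go
package main

import (
	"container/heap"
	"fmt"
)

// Profile is a hypothetical, simplified record; the real type carries
// a full label set and the profile payload.
type Profile struct {
	Timestamp int64
	Labels    string // serialized series labels (assumption for this sketch)
}

// less orders profiles by timestamp, then by series labels.
func less(a, b Profile) bool {
	if a.Timestamp != b.Timestamp {
		return a.Timestamp < b.Timestamp
	}
	return a.Labels < b.Labels // label comparison on the hot path
}

// cursor tracks the read position within one replica stream.
type cursor struct {
	items []Profile
	pos   int
}

// cursorHeap is a min-heap of cursors keyed by their current profile.
type cursorHeap []*cursor

func (h cursorHeap) Len() int           { return len(h) }
func (h cursorHeap) Less(i, j int) bool { return less(h[i].items[h[i].pos], h[j].items[h[j].pos]) }
func (h cursorHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *cursorHeap) Push(x any)        { *h = append(*h, x.(*cursor)) }
func (h *cursorHeap) Pop() any {
	old := *h
	n := len(old)
	c := old[n-1]
	*h = old[:n-1]
	return c
}

// mergeDedup merges sorted streams and drops profiles that repeat the
// (timestamp, labels) pair of the previously emitted profile.
func mergeDedup(streams [][]Profile) []Profile {
	h := &cursorHeap{}
	for _, s := range streams {
		if len(s) > 0 {
			*h = append(*h, &cursor{items: s})
		}
	}
	heap.Init(h)
	var out []Profile
	for h.Len() > 0 {
		c := (*h)[0]
		p := c.items[c.pos]
		// Duplicates are adjacent because of the total order.
		if len(out) == 0 || out[len(out)-1] != p {
			out = append(out, p)
		}
		c.pos++
		if c.pos < len(c.items) {
			heap.Fix(h, 0) // cursor advanced; restore heap order
		} else {
			heap.Pop(h) // stream exhausted
		}
	}
	return out
}

func main() {
	replicaA := []Profile{{1, "a"}, {1, "b"}, {2, "a"}}
	replicaB := []Profile{{1, "a"}, {2, "a"}, {2, "c"}}
	for _, p := range mergeDedup([][]Profile{replicaA, replicaB}) {
		fmt.Printf("%d %s\n", p.Timestamp, p.Labels)
	}
}
```

With the two sample replicas in `main`, the merged stream contains four distinct profiles. Note that both the heap ordering and the duplicate check depend on the labels being available and comparable in the querier.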

The change exploits the fact that profiles are ordered by time and that only a small number of profiles are expected to share a timestamp. Instead of streaming deduplication based on time and series order, we employ a map (used as a set) that stores the series fingerprints seen for the current timestamp. This allows us to send fingerprints instead of labels and eliminates the expensive operations completely:

image
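A minimal sketch of the set-based approach, under stated assumptions: `Entry`, `Deduplicator`, and `Keep` are hypothetical names, and the FNV-1a hash of a label string is only a stand-in for whatever fingerprint the ingesters actually compute. The input is assumed to be already merged in non-decreasing timestamp order; the order of series within a timestamp does not matter.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Entry is a hypothetical record streamed to the querier: the series is
// identified by a fingerprint rather than by its labels.
type Entry struct {
	Timestamp   int64
	Fingerprint uint64
}

// fingerprint is a stand-in hash for this sketch (FNV-1a over a label
// string); it would be computed once on the sending side.
func fingerprint(labels string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(labels))
	return h.Sum64()
}

// Deduplicator remembers the fingerprints seen at the current timestamp.
type Deduplicator struct {
	current int64
	seen    map[uint64]struct{}
}

func NewDeduplicator() *Deduplicator {
	return &Deduplicator{seen: make(map[uint64]struct{})}
}

// Keep reports whether e is the first entry with its fingerprint at its
// timestamp. Entries must arrive in non-decreasing timestamp order.
func (d *Deduplicator) Keep(e Entry) bool {
	if e.Timestamp != d.current {
		// Time advanced: the set for the previous timestamp is no longer needed.
		d.current = e.Timestamp
		for k := range d.seen {
			delete(d.seen, k)
		}
	}
	if _, dup := d.seen[e.Fingerprint]; dup {
		return false
	}
	d.seen[e.Fingerprint] = struct{}{}
	return true
}

func main() {
	a, b, c := fingerprint("a"), fingerprint("b"), fingerprint("c")
	// Time-ordered entries from several replicas; series order is arbitrary.
	merged := []Entry{{1, a}, {1, b}, {1, a}, {2, a}, {2, a}, {2, c}}
	d := NewDeduplicator()
	kept := 0
	for _, e := range merged {
		if d.Keep(e) {
			kept++
		}
	}
	fmt.Println(kept)
}
```

In this sketch the querier only compares 64-bit integers, and the set stays small because it is reset whenever the timestamp advances. Unlike the adjacent-duplicate check in an ordered merge, the set also catches the non-adjacent duplicate `{1, a}` in the sample input.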

The change is fully backward compatible, but it makes it impossible to change the deduplication order as proposed in https://github.com/grafana/pyroscope/issues/2192.