The PR aims to optimize deduplication of profiles from uncompacted blocks.
Queriers stream profiles from replicas (ingesters) ordered by timestamp, and then by series labels, handling the streams in a k-way merge fashion (loser tree). Profiles with matching series labels and timestamps are deduplicated.
In large-scale deployments, this adds significant latency: queriers must unmarshal and compare series labels, and stream deduplication is a single-threaded operation.
The change exploits the fact that profiles are ordered by time and that only a small number of profiles are expected to share a timestamp. Instead of streaming deduplication based on time and series order, we employ a map (set) that stores the series fingerprints seen for the current timestamp. This allows replicas to send fingerprints instead of labels, so queriers no longer need to unmarshal or compare labels during deduplication.
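A minimal sketch of the idea, not the code in this PR: the `Profile` type, its fields, and the `Deduplicator` name are hypothetical and only illustrate how a per-timestamp fingerprint set can replace label comparison. It assumes the merged stream is ordered by timestamp.

```go
package main

import "fmt"

// Profile is a hypothetical, simplified stand-in for a streamed profile entry.
type Profile struct {
	Timestamp   int64  // profile timestamp
	Fingerprint uint64 // hash of the series labels
}

// Deduplicator remembers the fingerprints seen at the current timestamp.
type Deduplicator struct {
	current int64
	seen    map[uint64]struct{}
}

func NewDeduplicator() *Deduplicator {
	return &Deduplicator{seen: make(map[uint64]struct{})}
}

// Keep reports whether p is the first profile with its fingerprint at its
// timestamp. Profiles must arrive in non-decreasing timestamp order.
func (d *Deduplicator) Keep(p Profile) bool {
	if p.Timestamp != d.current {
		// New timestamp: the set only ever holds one timestamp's fingerprints.
		d.current = p.Timestamp
		for k := range d.seen {
			delete(d.seen, k)
		}
	}
	if _, dup := d.seen[p.Fingerprint]; dup {
		return false
	}
	d.seen[p.Fingerprint] = struct{}{}
	return true
}

// Dedup filters a timestamp-ordered slice, keeping the first occurrence of
// each (timestamp, fingerprint) pair.
func Dedup(in []Profile) []Profile {
	d := NewDeduplicator()
	out := make([]Profile, 0, len(in))
	for _, p := range in {
		if d.Keep(p) {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	merged := []Profile{
		{Timestamp: 1, Fingerprint: 10},
		{Timestamp: 1, Fingerprint: 20},
		{Timestamp: 1, Fingerprint: 10}, // duplicate from another replica
		{Timestamp: 2, Fingerprint: 10},
		{Timestamp: 2, Fingerprint: 10}, // duplicate from another replica
	}
	fmt.Println(len(Dedup(merged))) // prints 3
}
```

Because the set is reset whenever the timestamp advances, its size is bounded by the number of distinct series at a single timestamp. In this sketch, two distinct series whose fingerprints collide at the same timestamp would be treated as duplicates.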
The change is fully backward compatible, but it makes it impossible to change the deduplication order as described in https://github.com/grafana/pyroscope/issues/2192.