thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io
Apache License 2.0
13.05k stars 2.09k forks source link

Add Native Histogram Support #5907

Open rabenhorst opened 1 year ago

rabenhorst commented 1 year ago

Is your proposal related to a problem?

Experimental native histogram support was released with Prometheus v0.40.x. https://github.com/thanos-io/thanos/pull/5896 updated Thanos to Prometheus v0.40.1 but without implementing native histograms.

I propose the implementation of native histogram support (behind a feature flag) as a next step in follow up PR(s).

Describe the solution you'd like

We need to go through all Thanos components and check what needs to be changed for native histogram support. So far we identified the following changes are required (list will be updated on new findings):

For each component we need to make sure to have tests in place after implementation and e2e test for functionality provided by multiple components (e.g. ingestion, querying, etc.).

yeya24 commented 1 year ago

Just double checking, I see we are adding native histogram support to query, receiver and sidecar in https://github.com/thanos-io/thanos/pull/6032.

For components like compactor, store gateway, do we still need implement anything to make sure they work with native histograms? If no code changes required, at least we should create some tests to make sure they work properly.

Updated: NVM I think we need additional support for those components. Most of our code are assuming XOR encoding or Aggr encoding only.

yeya24 commented 1 year ago

IIUC, only query frontend, Compactor and Ruler needs to support Native histograms?

I am wondering if we need to do anything specific in Ruler or it works out of the box by updating the Prometheus dependency. I guess the latter.

rabenhorst commented 1 year ago

IIUC, only query frontend, Compactor and Ruler needs to support Native histograms?

I am wondering if we need to do anything specific in Ruler or it works out of the box by updating the Prometheus dependency. I guess the latter.

I also think it's the latter, but we will double check and add an e2e test for rulers.

rabenhorst commented 1 year ago

Downsampling and compaction is blocked by a bug we hit in Prometheus regarding appending histograms to open chunks during compaction, which should be fixed by https://github.com/prometheus/prometheus/pull/12185. The bug needs to be fixed before we can open a PR for compaction/downsampling of native histograms.

rabenhorst commented 1 year ago

Ruler does not work with native histograms out of the box. promclient returns model representation of native histogram for queries, which we need to translate to histogram.FloatHistograms: https://github.com/thanos-io/thanos/blob/5d5d39a35ba62889aa759d60380fe43deee386e4/pkg/promclient/promclient.go#L483-L486

We will look into using the GRPC query API for the ruler so we directly get histogram.FloatHistograms in the result.

rabenhorst commented 1 year ago

We have PRs now for all components. I just added the last one for rule.