thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io
Apache License 2.0

Downsampling is not working #6572

Open beramod opened 1 year ago

beramod commented 1 year ago

Thanos, Prometheus and Golang version used: 0.26.0

Object Storage Provider: S3

What happened: Downsampling is not working

What you expected to happen: Downsampling is working

How to reproduce it (as minimally and precisely as possible): Run Compactor

Full logs to relevant components: There are no downsampling-related logs at all: nothing saying that downsampling was disabled, started, or failed. We did not set the --downsampling.disable option.

    compactMainFn := func() error {
        if err := compactor.Compact(ctx); err != nil {
            return errors.Wrap(err, "compaction")
        }

        if !conf.disableDownsampling {
            // After all compactions are done, work down the downsampling backlog.
            // We run two passes of this to ensure that the 1h downsampling is generated
            // for 5m downsamplings created in the first run.
            level.Info(logger).Log("msg", "start first pass of downsampling")
            ...
        } else {
            level.Info(logger).Log("msg", "downsampling was explicitly disabled")
        }

According to the code above, the compactor should log either "start first pass of downsampling" or "downsampling was explicitly disabled", but neither message appears. Logs related to compaction, the fetcher, and block cleanup do exist.
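One possible reading of the quoted snippet is that neither message can appear while the compaction step has not yet returned, or when it returns an error. The sketch below reproduces only that control flow. `runOnce`, `errHalted`, and the `compact` callback are hypothetical stand-ins for illustration, not Thanos code.

```go
package main

import (
	"errors"
	"fmt"
)

// errHalted is a hypothetical error used to simulate a failed compaction.
var errHalted = errors.New("compaction halted")

// runOnce mirrors the shape of the quoted compactMainFn: the downsampling
// branch is reached only after the compaction step returns without error.
// It collects the messages that would be logged instead of logging them.
func runOnce(compact func() error, disableDownsampling bool) ([]string, error) {
	var logs []string
	if err := compact(); err != nil {
		// Early return: neither downsampling message is produced.
		return logs, fmt.Errorf("compaction: %w", err)
	}
	if !disableDownsampling {
		logs = append(logs, "start first pass of downsampling")
	} else {
		logs = append(logs, "downsampling was explicitly disabled")
	}
	return logs, nil
}

func main() {
	succeed := func() error { return nil }
	fail := func() error { return errHalted }

	logs, err := runOnce(succeed, false)
	fmt.Println("compaction ok:", logs, err)

	logs, err = runOnce(fail, false)
	fmt.Println("compaction failed:", len(logs), "log lines, err:", err)
}
```

If this reading holds, the absence of both messages would point at the compaction step itself (still running, or failing) rather than at the downsampling configuration.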

If you run Downsampling using the bucket tool, a block is generated normally.

I have already configured the concurrency settings and scaled up. What else should I check?

I ask for your help.

Thank you

Environment:

yeya24 commented 1 year ago

I guess it is the same issue mentioned in https://thanos.io/tip/operating/compactor-backlog.md/. Please follow the doc and see if you have the same compactor backlog issue.

beramod commented 1 year ago

@yeya24 Thank you for your reply. I have followed all the steps in the linked document. I checked two metrics: thanos_compact_todo_compaction_blocks and thanos_compact_todo_downsample_blocks. Judging by the screenshot below, compaction and downsampling appear to be operating normally.

[image: screenshot of the two metrics]

The problem is that the thanos_compact_downsample_total metric is missing and there are no downsampling logs at all.

yeya24 commented 1 year ago

@beramod Please make sure you have blocks with at least a 40-hour range to be able to downsample to 5m blocks, and blocks with at least a 10-day range to be able to downsample to 1h blocks. If there are no logs and the metric does not increment, then I think your blocks do not qualify for downsampling.
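The range requirement can be sketched as a small check. This is an illustration only: the 40-hour and 10-day thresholds come from the comment above, the rule that 1h blocks are built from 5m blocks comes from the code comment quoted earlier in the thread, and the names `resolution`, `nextResolution`, `minRangeFor5m`, and `minRangeFor1h` are hypothetical. Treating the thresholds as inclusive is also an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// resolution is a hypothetical label for a block's downsampling level.
type resolution string

const (
	resRaw resolution = "raw"
	res5m  resolution = "5m"
	res1h  resolution = "1h"
)

// Thresholds as stated in the thread; assumed to be inclusive.
const (
	minRangeFor5m = 40 * time.Hour
	minRangeFor1h = 10 * 24 * time.Hour
)

// nextResolution reports which resolution a block could be downsampled to,
// given its current resolution and the time range it covers. The boolean is
// false when the block does not qualify.
func nextResolution(current resolution, blockRange time.Duration) (resolution, bool) {
	switch {
	case current == resRaw && blockRange >= minRangeFor5m:
		return res5m, true
	case current == res5m && blockRange >= minRangeFor1h:
		return res1h, true
	}
	return current, false
}

func main() {
	// A 2h raw block is too short; a 48h raw block qualifies for 5m.
	for _, r := range []time.Duration{2 * time.Hour, 48 * time.Hour} {
		next, ok := nextResolution(resRaw, r)
		fmt.Println("raw block, range", r, "->", next, ok)
	}
	// A 5m block needs a 10-day range before 1h downsampling applies.
	next, ok := nextResolution(res5m, 14*24*time.Hour)
	fmt.Println("5m block, range 336h ->", next, ok)
}
```

Under these assumptions, a bucket that holds only short raw blocks (for example, 2h blocks that have not yet been compacted into larger ones) would never reach the downsampling step.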

beramod commented 1 year ago

@yeya24 I checked with thanos-bucket, and downsampling does happen: downsampled blocks are being created. The problem is that, although downsampling works, the related logs and metrics are not visible.

yeya24 commented 1 year ago

@beramod I think we have logs when a downsampled block is created. Does it work for you? If not, feel free to add any logs and metrics you think might be useful.

CryptoTr4der commented 4 months ago

Do these metrics really exist? See also https://github.com/thanos-io/thanos/issues/3658

yeya24 commented 4 months ago

I think at least thanos_compact_downsample_total is exposed by default with a value of 0. It increments every time a downsample is processed.

CryptoTr4der commented 4 months ago

The Thanos compactor only exposes Go metrics and thanos_compact_total_* metrics on the /metrics endpoint.
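One way to verify which metric names are actually exposed is to scan the text returned by /metrics. The sketch below assumes the standard Prometheus text exposition format. `metricNames` is a hypothetical helper, and `samplePayload` is a made-up payload for illustration, not real compactor output (the `example` label in particular is invented).

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// samplePayload is a made-up example of the text exposition format.
const samplePayload = `# HELP thanos_compact_downsample_total Example help text.
# TYPE thanos_compact_downsample_total counter
thanos_compact_downsample_total{example="x"} 0
go_goroutines 42
thanos_compact_todo_downsample_blocks 3
`

// metricNames returns the sorted, distinct metric names in payload that
// start with prefix. Comment lines (starting with '#') are skipped.
func metricNames(payload, prefix string) []string {
	seen := map[string]bool{}
	sc := bufio.NewScanner(strings.NewReader(payload))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// The metric name ends at the first '{' (labels) or space (value).
		end := strings.IndexAny(line, "{ ")
		if end < 0 {
			end = len(line)
		}
		if name := line[:end]; strings.HasPrefix(name, prefix) {
			seen[name] = true
		}
	}
	names := make([]string, 0, len(seen))
	for n := range seen {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

func main() {
	for _, n := range metricNames(samplePayload, "thanos_compact_") {
		fmt.Println(n)
	}
}
```

In practice the payload would come from the compactor's /metrics endpoint instead of the constant, and an empty result for a given prefix would confirm that those metrics are not being exposed.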

yeya24 commented 4 months ago

@CryptoTr4der You need to at least have all compactions finished first. Please check https://thanos.io/tip/operating/compactor-backlog.md/#troubleshoot-compactor-backlog

CryptoTr4der commented 4 months ago

@yeya24 Thanks! I will check this today.