thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io
Apache License 2.0

compactor: does not compact 4 consecutive 2-hour blocks #7287

Open vincent-olivert-riera opened 2 months ago

vincent-olivert-riera commented 2 months ago

Thanos, Prometheus and Golang version used:

Thanos: 0.32.4 Golang: go1.20.8

Prometheus: 2.45.0 goVersion: go1.20.5

Object Storage Provider:

Openstack S3 compatible

What happened:

I have a Thanos compactor with the following metrics:

thanos_compact_halted 0
thanos_compact_todo_compactions 0

It is tracking a bucket where almost all blocks have been compacted up to level-4. However, there are some level-1 blocks that are not compacted, and I was expecting them to be compacted into a level-2 block. I have made this animated gif to show it more clearly:

[Animated GIF: compactor]

None of those blocks has been marked as no-compaction, so they should be compacted.

These are the meta.json for each one of them:

01HT1G02DF2W21A1KTHDVPX0BR

```
{
  "ulid": "01HT1G02DF2W21A1KTHDVPX0BR",
  "minTime": 1711584000246,
  "maxTime": 1711591200000,
  "stats": {
    "numSamples": 2492646,
    "numSeries": 5196,
    "numChunks": 20775
  },
  "compaction": {
    "level": 1,
    "sources": [
      "01HT1G02DF2W21A1KTHDVPX0BR"
    ]
  },
  "version": 1,
  "thanos": {
    "labels": {
      "cluster_name": "alpha",
      "cluster_node": "prometheus004",
      "datasource": "alpha-002"
    },
    "downsample": {
      "resolution": 0
    },
    "source": "sidecar",
    "segment_files": [
      "000001"
    ],
    "files": [
      { "rel_path": "chunks/000001", "size_bytes": 3964613 },
      { "rel_path": "index", "size_bytes": 646029 },
      { "rel_path": "meta.json" }
    ],
    "index_stats": {}
  }
}
```

01HT1PVSMCNYF8ZSDW53123NJX

```
{
  "ulid": "01HT1PVSMCNYF8ZSDW53123NJX",
  "minTime": 1711591200246,
  "maxTime": 1711598400000,
  "stats": {
    "numSamples": 2492640,
    "numSeries": 5193,
    "numChunks": 20772
  },
  "compaction": {
    "level": 1,
    "sources": [
      "01HT1PVSMCNYF8ZSDW53123NJX"
    ]
  },
  "version": 1,
  "thanos": {
    "labels": {
      "cluster_name": "alpha",
      "cluster_node": "prometheus004",
      "datasource": "alpha-002"
    },
    "downsample": {
      "resolution": 0
    },
    "source": "sidecar",
    "segment_files": [
      "000001"
    ],
    "files": [
      { "rel_path": "chunks/000001", "size_bytes": 3957077 },
      { "rel_path": "index", "size_bytes": 644900 },
      { "rel_path": "meta.json" }
    ],
    "index_stats": {}
  }
}
```

01HT1XQGXB5CHQB21YT5DNXFC8

```
{
  "ulid": "01HT1XQGXB5CHQB21YT5DNXFC8",
  "minTime": 1711598400246,
  "maxTime": 1711605600000,
  "stats": {
    "numSamples": 2492640,
    "numSeries": 5193,
    "numChunks": 20772
  },
  "compaction": {
    "level": 1,
    "sources": [
      "01HT1XQGXB5CHQB21YT5DNXFC8"
    ]
  },
  "version": 1,
  "thanos": {
    "labels": {
      "cluster_name": "alpha",
      "cluster_node": "prometheus004",
      "datasource": "alpha-002"
    },
    "downsample": {
      "resolution": 0
    },
    "source": "sidecar",
    "segment_files": [
      "000001"
    ],
    "files": [
      { "rel_path": "chunks/000001", "size_bytes": 3969637 },
      { "rel_path": "index", "size_bytes": 645540 },
      { "rel_path": "meta.json" }
    ],
    "index_stats": {}
  }
}
```

01HT24K86QTXJ1HV2NW252DAEV

```
{
  "ulid": "01HT24K86QTXJ1HV2NW252DAEV",
  "minTime": 1711605600246,
  "maxTime": 1711612800000,
  "stats": {
    "numSamples": 2492646,
    "numSeries": 5196,
    "numChunks": 20775
  },
  "compaction": {
    "level": 1,
    "sources": [
      "01HT24K86QTXJ1HV2NW252DAEV"
    ]
  },
  "version": 1,
  "thanos": {
    "labels": {
      "cluster_name": "alpha",
      "cluster_node": "prometheus004",
      "datasource": "alpha-002"
    },
    "downsample": {
      "resolution": 0
    },
    "source": "sidecar",
    "segment_files": [
      "000001"
    ],
    "files": [
      { "rel_path": "chunks/000001", "size_bytes": 3981026 },
      { "rel_path": "index", "size_bytes": 645293 },
      { "rel_path": "meta.json" }
    ],
    "index_stats": {}
  }
}
```
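
To make the layout of these four blocks easier to reason about, here is a minimal Python sketch (not part of the original report) that recomputes their spans from the `minTime`/`maxTime` values above. It assumes the next compaction range after 2h is 8h, which is not stated anywhere in this issue; `window_start` and `summarize` are hypothetical helper names.

```python
# Values copied from the four meta.json files above: (ulid, minTime, maxTime) in ms.
BLOCKS = [
    ("01HT1G02DF2W21A1KTHDVPX0BR", 1711584000246, 1711591200000),
    ("01HT1PVSMCNYF8ZSDW53123NJX", 1711591200246, 1711598400000),
    ("01HT1XQGXB5CHQB21YT5DNXFC8", 1711598400246, 1711605600000),
    ("01HT24K86QTXJ1HV2NW252DAEV", 1711605600246, 1711612800000),
]

HOUR_MS = 60 * 60 * 1000
# Assumption: the range that level-1 (2h) blocks would be compacted into is 8h.
NEXT_RANGE_MS = 8 * HOUR_MS


def window_start(t_ms, range_ms):
    """Start of the range-aligned window that contains t_ms."""
    return t_ms - (t_ms % range_ms)


def summarize(blocks, range_ms):
    """Describe how a set of blocks sits inside one aligned window."""
    blocks = sorted(blocks, key=lambda b: b[1])
    first_min = blocks[0][1]
    last_max = blocks[-1][2]
    covered = last_max - first_min
    return {
        # True when every block starts inside the same aligned window.
        "same_window": len({window_start(b[1], range_ms) for b in blocks}) == 1,
        "window_start": window_start(first_min, range_ms),
        "covered_ms": covered,
        # How far the covered span falls short of the full range.
        "missing_ms": range_ms - covered,
        # Distance between one block's maxTime and the next block's minTime.
        "gaps_ms": [nxt[1] - cur[2] for cur, nxt in zip(blocks, blocks[1:])],
    }


if __name__ == "__main__":
    for key, value in summarize(BLOCKS, NEXT_RANGE_MS).items():
        print(key, value)
```

With these values, all four blocks fall in the same 8h-aligned window starting at 1711584000000, each consecutive pair is separated by 246 ms, and the covered span is 246 ms short of a full 8h. Whether that offset influences the planner's decision is not established in this thread.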

This is the command line that I'm using:

```
/bin/thanos compact \
  --bucket-web-label=cluster_node \
  --data-dir /var/thanos/compact \
  --objstore.config-file=/etc/thanos/objstore.yml \
  --wait \
  --selector.relabel-config-file=/etc/thanos/relabel_config.yml \
  --downsampling.disable \
  --retention.resolution-5m=1d \
  --retention.resolution-1h=1d \
  --log.format=json \
  --log.level=debug
```

Contents of /etc/thanos/objstore.yml

```
type: S3
config:
  bucket: "thanos-alpha"
  endpoint: "redacted"
  access_key: "redacted"
  insecure: false
  signature_version2: false
  secret_key: "redacted"
  list_objects_version: "v1"
  http_config:
    idle_conn_timeout: 60s
```

Contents of /etc/thanos/relabel_config.yml

```
- action: keep
  regex: "alpha-002"
  source_labels:
  - datasource
```
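
As a sanity check on the selector, the following Python sketch emulates what a `keep` relabel rule does with the labels from the meta.json files above. It is an illustration only, not Thanos code: it assumes Prometheus-style relabeling semantics (source label values joined with `;`, regex fully anchored), and `keep` is a hypothetical helper name.

```python
import re


def keep(labels, source_labels, regex, separator=";"):
    """Return True if a 'keep' rule with this regex would retain the label set.

    Assumes Prometheus-style semantics: the values of the source labels are
    joined with the separator and the regex must match the whole string.
    """
    value = separator.join(labels.get(name, "") for name in source_labels)
    return re.fullmatch(regex, value) is not None


# Labels shared by the four level-1 blocks in this report.
level1_labels = {
    "cluster_name": "alpha",
    "cluster_node": "prometheus004",
    "datasource": "alpha-002",
}

if __name__ == "__main__":
    print(keep(level1_labels, ["datasource"], "alpha-002"))
```

Under these assumptions the rule retains the four level-1 blocks, so the selector by itself would not be expected to hide them from the compactor.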

What could be the reason for this behavior?

GiedriusS commented 2 months ago

Is thanos_compact_iterations_total more than 0? 🤔

vincent-olivert-riera commented 2 months ago

> Is thanos_compact_iterations_total more than 0? 🤔

Yes, it is constantly growing.

This is how thanos_compact_todo_compactions compares with thanos_compact_iterations_total:

[Two screenshots: graphs of thanos_compact_todo_compactions and thanos_compact_iterations_total]

douglascamata commented 2 months ago

@vincent-olivert-riera can you show us some information about the level 4 blocks you mentioned? What's their duration?

vincent-olivert-riera commented 2 months ago

> @vincent-olivert-riera can you show us some information about the level 4 blocks you mentioned? What's their duration?

Sure.

[Screenshot: level-4 block]

This is its meta.json:

```
{
  "ulid": "01HT1RN6JP9AZWYGHTG8XRXHSS",
  "minTime": 1710374400246,
  "maxTime": 1711584000000,
  "stats": {
    "numSamples": 418763800,
    "numSeries": 5231,
    "numChunks": 3489753
  },
  "compaction": {
    "level": 4,
    "sources": [
      "01HRXEDZ6XVH2MYV7K0W6CQ27Z", "01HRXN9PEVG5DWN2DA3H1EQZYW", "01HRXW5DPSMG8GMH0CNQZ08S79", "01HRY314YTX75WHERFSB1XFMQ2", "01HRY9WW6SCNHJBT3BWB46V9Y1", "01HRYGRKETBY322F7ABNHFGMNP", "01HRYQMAPTCKA67H1XT72KMA4E", "01HRYYG1YSWVX80H09XBE68MW4", "01HRZ5BS6TX5RANK4KVFE1A25K", "01HRZC7GET4BX2PCHXV0AXF0MX", "01HRZK37PT5KR89H78K70Z3FW2", "01HRZSYYYSEPRXQ3J1G99EMRN5", "01HS00TP6TF8HZ11SSBJ5DJTG2", "01HS07PDEX23P49VE9KN0Z6B6P", "01HS0EJ4PSQQR9HYQNKV7HP2XN", "01HS0NDVZX0XGC0K85FKS87HEE", "01HS0W9K6T60BN2G1HFWNZDYZC", "01HS135AEVCERKJHJAE7YE82PG", "01HS1A11PS6BF1W0WM0ZD1X1Y4", "01HS1GWRYTJE5BQQDVYQ99DFMB", "01HS1QRG6TZAZFAGBQ5DYTFV0H", "01HS1YM7ESF2Q93M625Q9SEJYX", "01HS25FYPTETJ90722E08R9PFS", "01HS2CBNYT6CM6BJ9AK2M8PGN6", "01HS2K7D6T8AW1R2VMRCPQ2272", "01HS2T34ETMF9BH086GDZEEQ2X", "01HS30YVPTN97BAVJ5BPP3854Y", "01HS37TJYTHEAMQ4H9JT792RJD", "01HS3EPA6TERX4QJTX09QD0PJJ", "01HS3NJ1ETC27EQGAB9E36SF2N", "01HS3WDRPT2BNNW2R2E3D6BT8X", "01HS439FYTNNAYWCN7FM561T3Z", "01HS4A576T2G0HS1HFRR55Z83H", "01HS4H0YETBSTJCR1VFA2KT4HM", "01HS4QWNPSQNT3SVHWB65TWHK0", "01HS4YRCYVQFRZZ88FQQKRX6SV", "01HS55M46TW584YFK9NYT2Y8J0", "01HS5CFVEVSRBDV7MK2SY3QJE7", "01HS5KBJPTDC3DHZ3G5DP4Y1XZ", "01HS5T79YTS4438E2ZX4FS4T5F", "01HS61316SK2JXN87693FRJ3D9", "01HS67YRET9XW7TJNJ5A2QSM41", "01HS6ETFPT53QCM7VYJZTH8QB8", "01HS6NP6YT5FMTPNY8D5N7C9BF", "01HS6WHY6T8PCT1GNAC3TVKEY1", "01HS73DNET028TXMPQYVVF179Q", "01HS7A9CPT1HTA26Q4YC1FAGHV", "01HS7H53YT05D4QETBPC042C7C", "01HS7R0V6TKWRR46E82709XQGQ", "01HS7YWJETVH0E1YV7KWVV6BH8", "01HS85R9PT52VPMPFP3B9D30YQ", "01HS8CM0YSBNC8E3X5Z1S1QAKY", "01HS8KFR6TT0C995731BTSSZ5C", "01HS8TBFET873G0CX47NYV5P07", "01HS9176PTW3XFMSYGWQKZZC6E", "01HS982XYT1JV16HZWKEX5N696", "01HS9EYN6TSV8J00BRNE4CD74H", "01HS9NTCEV1WCDGHNS5PJSK0NP",
      "01HS9WP3PS3B9NFFP98JRYTHJ4", "01HSA3HTYTQCCX7DH8EPDBN4Q0", "01HSAADJ6VB2YFJKY4RWY18ZA2", "01HSAH99EVKJQZMBSH7PF497SG", "01HSAR50PT0ZH3ZNE8N1VJWTXQ", "01HSAZ0QYXE5KPBH5NS0WFYHEF", "01HSB5WF6SPAB3TJP64V7NSME1", "01HSBCR6EVHNNNCJBN27H8RWF2", "01HSBKKXPTYZ5D4SH8P4KW74C9", "01HSBTFMYW431XKWR750PXYYAQ", "01HSC1BC7088CV86NBKTXXQ494", "01HSC873ET7YV5PK4EV61GKGD9", "01HSCF2TPSNKYMSTCF07FBTYHQ", "01HSCNYHYT6SVCYBF58KTZJQ9J", "01HSCWT96TDBGKXVZ1VR44X9DV", "01HSD3P0ETXPZ80M8EEZ61RE8H", "01HSDAHQPYG3XCFY91N41FR4A7", "01HSDHDEYT44RNSS14WYNVB9VS", "01HSDR966V0NK7E5CN8ED8RQJK", "01HSDZ4XEVH5C45F9FZK47TN59", "01HSE60MPT0N3CER5QERB3QBH1", "01HSECWBYS84V009FYSB3N6B39", "01HSEKR36TJCYV52XBSWFRDFW6", "01HSETKTETGNGNQBZYS4MSA7EP", "01HSF1FHPTVY7PGBHS0MHR0V4Z", "01HSF8B8YT8PMPF2YZ7WYX6DXA", "01HSFF706TDE9TJ45HVEJE1C5E", "01HSFP2QETJHV0QEZ70QBVE2Z4", "01HSFWYEPTVXBBVW872WYRQ18S", "01HSG3T5YVSTQ8SMZBEDACHG01", "01HSGANX6TNA9HNM3ZH3ZGRGRS", "01HSGHHMET3487PYA2BRJP80YC", "01HSGRDBPTQWZ1ZS64GGH6SZY5", "01HSGZ92YT3SNZSRC0M6GH56JN", "01HSH64T6TG6N29P8E8WACF9C3", "01HSHD0HET2HC2HP9TWRRHFEYH", "01HSHKW8PSF5SPN131PA2CHCYN", "01HSHTQZYTDZJ016DDXQZ9ZXQ6", "01HSJ1KQ6WMAABPCAD4QCZ30BP", "01HSJ8FEESK80Z3N9Z5D19841W", "01HSJFB5PT40EA8WMKFBWCWZ3X", "01HSJP6WYTZE8GE46P726YJVXK", "01HSJX2M6VXD0SF4YYKA920WY8", "01HSK3YBESNDWHZM80MBY4E4S0", "01HSKAT2PYPJXVRZG1NBGX88B0", "01HSKHNSYVZA9ZB9MZAKS7G5YP", "01HSKRHH6THAAP44ZDB80NGEFE", "01HSKZD8ET7W92EFFRP7BDMQR0", "01HSM68ZPXA3P18Y0DPZQJXH8N", "01HSMD4PYSWWKE2V6DPWFQ5VWA", "01HSMM0E6TA2EM8J40F7FR478S", "01HSMTW5ESJ3E2X9K3F8CDQJR5", "01HSN1QWPSWRBCKV88HVAH6CXW", "01HSN8KKYTYBPX1ZQC9BZ4Y4HJ", "01HSNFFB6THMDJ7X4FGYBZK8BD", "01HSNPB2EY6CXHMPWJH46T3S43", "01HSNX6SPV19FPNV99XT4N3BGE", "01HSP42GYS3GZPDEEJXEVTE5H5", "01HSPAY86SE3D504YN5357EEK2", "01HSPHSZEVQ92QFGH66YRM0W9D", "01HSPRNPPXG4PJJGBQEZJ0TK2E", "01HSPZHDYVDTWSRMDQPJEHR7VA", "01HSQ6D56YBN25SBSWJ12H7XCW", "01HSQD8WET5WDSJE28PHV491NW", "01HSQM4KPTTEXXA5P0JZJ8MHKS", "01HSQV0AYTDG339RRNFRR7H7VV",
      "01HSR1W26TJE71X3TGF056TP2S", "01HSR8QSET41K89H6GW418HC6X", "01HSRFKGPTY39Y12QYYX9RDN86", "01HSRPF7YSE9VPF3RQTPHAW7TZ", "01HSRXAZ6VYK26MJPCA1CSYSJS", "01HSS46PES01ZR3HJS0NQ4XSH8", "01HSSB2DPSPMRS3RY72KK7CEM3", "01HSSHY4YVT7JREWXH5NNTN14P", "01HSSRSW6V89AZKRRF317ZV2RS", "01HSSZNKET6RPQ9NH02128GSPH", "01HST6HAPV68TBB7GRPY9WEXGS", "01HSTDD1YSNXRE53KBARRATVNF", "01HSTM8S6V698S7JJ3EGK49AFH", "01HSTV4GET9ZQW2866AX8FEQ8F", "01HSV207PW7390V9E9J9BBZJYC", "01HSV8VYYTTAZAAYD5M5V93NQX", "01HSVFQP6V4HCN4WF95QWGVAN7", "01HSVPKDETASJTW0BAAJB5VB9M", "01HSVXF4PWVT6Q68BN4B0KXA4A", "01HSW4AVYTN03K408NBNQ9B7QZ", "01HSWB6K6TFQJPSTEDR4Z94KNF", "01HSWJ2AET9Q65CZ0ZGPEC69YW", "01HSWRY1PVR3FH3GBJN4ANA8G6", "01HSWZSRYVX0HNXA527GK123SH", "01HSX6NG6WB00R3GRJKE5QSRA4", "01HSXDH7EVC955BNRS0KY1R130", "01HSXMCYPSNVZ4SMW2MQPSY2Z8", "01HSXV8NYTKHZ93211PK4WCK5H", "01HSY24D6TRMPDSHG32KNWKVH8", "01HSY904ETZ47RVJ3JK0KAFB37", "01HSYFVVPTN80CZDQ38HT26RQJ", "01HSYPQJYTP7E8E6SGWBHE0SPP", "01HSYXKA6TPPMRJW7D1W8WYYNZ", "01HSZ4F1ET71NACD56BAM6RNAP", "01HSZBARPT7FFE4X7CA2KKTYS9", "01HSZJ6FYT8JTW9KSMNG6YEGZ9", "01HSZS276TA2KNDMBDXA25RCG6", "01HSZZXYEWEGBJTTBWHHQGYYBA", "01HT06SNPTV1CKM0NHRS32PAQR", "01HT0DNCYS66962KGAWWSGZS9V", "01HT0MH46TP81CRNFCCCHRF19J", "01HT0VCVETBHMAJM9SVQSX0EM6", "01HT128JPTHQ3Q2YVF0RTB5ER5", "01HT1949YT230P1R17F1HSCYER"
    ],
    "parents": [
      { "ulid": "01HS2VVDW0R0EVGNTND8E2BCTM", "minTime": 1710374400246, "maxTime": 1710547200000 },
      { "ulid": "01HS80MPGZPTTEY3JBQ3RKV5F3", "minTime": 1710547200246, "maxTime": 1710720000000 },
      { "ulid": "01HSD5E4H5NJH60BH30PKA3186", "minTime": 1710720000246, "maxTime": 1710892800000 },
      { "ulid": "01HSJA7FHRDBYF0XFDEYXVMY0Z", "minTime": 1710892800246, "maxTime": 1711065600000 },
      { "ulid": "01HSQF11TV1XEPV3ZN7WWX66HA", "minTime": 1711065600246, "maxTime": 1711238400000 },
      { "ulid": "01HSWKTK77ZK2EM1H2EWGWCYNS", "minTime": 1711238400246, "maxTime": 1711411200000 },
      { "ulid": "01HT1RKXJ3KFABZF5C1V8F7JJZ", "minTime": 1711411200246, "maxTime": 1711584000000 }
    ]
  },
  "version": 1,
  "thanos": {
    "labels": {
      "cluster_name": "alpha",
      "cluster_node": "prometheus003-prom-jp2v-dev",
      "datasource": "alpha-002"
    },
    "downsample": {
      "resolution": 0
    },
    "source": "compactor",
    "segment_files": [
      "000001",
      "000002"
    ],
    "files": [
      { "rel_path": "chunks/000001", "size_bytes": 536870124 },
      { "rel_path": "chunks/000002", "size_bytes": 125027769 },
      { "rel_path": "index", "size_bytes": 22741614 },
      { "rel_path": "meta.json" }
    ],
    "index_stats": {
      "series_max_size": 4800,
      "chunk_max_size": 1013
    }
  }
}
```

douglascamata commented 2 months ago

@vincent-olivert-riera if you grep your Compactor's log with block IDs of the blocks that didn't get compacted, do you see anything that stands out? If possible, maybe increase the Compactor's log level to generate more logs (then revert it, otherwise logs might be too spammy). 🤔

vincent-olivert-riera commented 2 months ago

@douglascamata, I haven't increased the Compactor's log level yet, but this is what the Compactor is doing (in a loop):

```
Apr 19, 2024 @ 20:29:37.120 {"caller":"compact.go:1478","level":"info","msg":"compaction iterations done","ts":"2024-04-19T11:29:29.342094884Z"}
Apr 19, 2024 @ 20:29:37.120 {"caller":"compact.go:457","level":"info","msg":"downsampling was explicitly disabled","ts":"2024-04-19T11:29:29.342370667Z"}
Apr 19, 2024 @ 20:27:42.421 {"cached":346,"caller":"fetcher.go:487","component":"block.BaseFetcher","duration":"8.78195813s","duration_ms":8781,"level":"info","msg":"successfully synchronized block metadata","partial":0,"returned":346,"ts":"2024-04-19T11:26:37.988636543Z"}
Apr 19, 2024 @ 20:27:42.421 {"caller":"fetcher.go:317","component":"block.BaseFetcher","concurrency":32,"level":"debug","msg":"fetching meta data","ts":"2024-04-19T11:27:29.206720687Z"}
Apr 19, 2024 @ 20:27:42.421 {"cached":346,"caller":"fetcher.go:487","component":"block.BaseFetcher","duration":"7.076132358s","duration_ms":7076,"level":"info","msg":"successfully synchronized block metadata","partial":0,"returned":346,"ts":"2024-04-19T11:27:36.282764217Z"}
Apr 19, 2024 @ 20:26:36.158 {"caller":"fetcher.go:317","component":"block.BaseFetcher","concurrency":32,"level":"debug","msg":"fetching meta data","ts":"2024-04-19T11:25:29.206734945Z"}
Apr 19, 2024 @ 20:26:36.158 {"cached":346,"caller":"fetcher.go:487","component":"block.BaseFetcher","duration":"7.2963137s","duration_ms":7296,"level":"info","msg":"successfully synchronized block metadata","partial":0,"returned":346,"ts":"2024-04-19T11:25:36.502939791Z"}
Apr 19, 2024 @ 20:26:36.158 {"caller":"fetcher.go:317","component":"block.BaseFetcher","concurrency":32,"level":"debug","msg":"fetching meta data","ts":"2024-04-19T11:26:29.20683454Z"}
Apr 19, 2024 @ 20:25:37.421 {"caller":"compact.go:1414","level":"info","msg":"start sync of metas","ts":"2024-04-19T11:24:22.419242154Z"}
Apr 19, 2024 @ 20:25:37.421 {"caller":"fetcher.go:317","component":"block.BaseFetcher","concurrency":32,"level":"debug","msg":"fetching meta data","ts":"2024-04-19T11:24:22.419842845Z"}
Apr 19, 2024 @ 20:25:37.421 {"cached":346,"caller":"fetcher.go:487","component":"block.BaseFetcher","duration":"5.786435988s","duration_ms":5786,"level":"info","msg":"successfully synchronized block metadata","partial":0,"returned":174,"ts":"2024-04-19T11:24:28.206118667Z"}
Apr 19, 2024 @ 20:25:37.421 {"caller":"compact.go:1413","level":"info","msg":"start of GC","ts":"2024-04-19T11:24:28.20786563Z"}
Apr 19, 2024 @ 20:25:37.421 {"caller":"compact.go:1442","level":"info","msg":"start of compactions","ts":"2024-04-19T11:24:28.208735693Z"}
```
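
The Kibana export above is not in chronological order, which makes one compactor iteration harder to follow. As a rough aid, this Python sketch (not from the thread) re-sorts such lines by the `ts` field embedded in each JSON record. It assumes every line carries a Kibana timestamp prefix followed by one JSON object, as in the excerpt; `parse_line`, `ts_key` and `chronological` are hypothetical helper names, and only four of the lines are reproduced as sample input.

```python
import json


def parse_line(line):
    """Drop the Kibana prefix and decode the JSON record that follows it."""
    return json.loads(line[line.index("{"):])


def ts_key(ts):
    """Sort key for RFC 3339 UTC timestamps with a variable-length fraction.

    The fraction is right-padded to nine digits so that, for example,
    ".20786563" and ".207865630" compare as equal.
    """
    seconds, _, fraction = ts.rstrip("Z").partition(".")
    return seconds, fraction.ljust(9, "0")


def chronological(lines):
    """Return the decoded records ordered by their own 'ts' field."""
    records = [parse_line(line) for line in lines if "{" in line]
    return sorted(records, key=lambda rec: ts_key(rec["ts"]))


# Four lines copied from the excerpt above.
LINES = [
    'Apr 19, 2024 @ 20:29:37.120{"caller":"compact.go:1478","level":"info","msg":"compaction iterations done","ts":"2024-04-19T11:29:29.342094884Z"}',
    'Apr 19, 2024 @ 20:25:37.421{"caller":"compact.go:1414","level":"info","msg":"start sync of metas","ts":"2024-04-19T11:24:22.419242154Z"}',
    'Apr 19, 2024 @ 20:25:37.421{"caller":"compact.go:1419","level":"info","msg":"start of GC","ts":"2024-04-19T11:24:28.20786563Z"}',
    'Apr 19, 2024 @ 20:25:37.421{"caller":"compact.go:1442","level":"info","msg":"start of compactions","ts":"2024-04-19T11:24:28.208735693Z"}',
]

if __name__ == "__main__":
    for rec in chronological(LINES):
        print(rec["ts"], rec["msg"])
```

Sorted this way, the sample reads as one iteration: sync of metas, GC, start of compactions, and about five minutes later "compaction iterations done", with no per-block compaction messages in between.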

I have searched for all the block IDs, but Kibana does not return anything at all. ~I will try to increase the log level and see what happens.~ The log level is debug.