thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io
Apache License 2.0

`thanos tools bucket rewrite` fails with a segmentation violation in Thanos v0.36.0 and later #7844

Open chris-barbour-as opened 2 weeks ago

chris-barbour-as commented 2 weeks ago

Thanos, Prometheus and Golang version used:

Thanos@0.36.1 Go@1.21

Also tested with: thanos@main (v0.35.2-0.20241017120053-731e4607d34a according to go.mod) go@1.23.2

Confirmed that the issue does not affect: thanos@v0.35.1

Object Storage Provider:

FILESYSTEM

What happened:

Starting with v0.36.0, `thanos tools bucket rewrite` fails with a segmentation violation while attempting to upload the new block back to the bucket.

What you expected to happen:

`thanos tools bucket rewrite` should upload the rewritten block without failing with a segmentation violation.

How to reproduce it (as minimally and precisely as possible):

The attached Go code reproduces the error.

My specific command invocation is as follows:

```shell
thanos tools bucket rewrite \
  --objstore.config-file=objstore-local.yaml \
  --rewrite.to-relabel-config-file=relabel-config.yaml \
  --no-dry-run \
  --delete-blocks \
  --id "01HCGZPEM6EEHJ75218N914HNM" \
  --tmp.dir="/home/appuser/data/thanos-rewrite"
```

objstore-local.yaml:

```yaml
type: FILESYSTEM
config:
  directory: "/home/appuser/data"
prefix: ""
```

relabel-config.yaml:

```yaml
- action: replace
  target_label: "foo"
  replacement: "bar"
```

Alternatively, the attached code reproduces the issue by running:

```shell
go get
go run main.go
```

Full logs of relevant components:

Logs

```
Start time: Sat Oct 19 01:43:15 UTC 2024
ts=2024-10-19T01:43:15.39052425Z caller=factory.go:53 level=info msg="loading bucket configuration"
ts=2024-10-19T01:43:15.406953575Z caller=fetcher.go:623 level=info component=block.BaseFetcher msg="successfully synchronized block metadata" duration=16.053695ms duration_ms=16 cached=1077 returned=1077 partial=0
ts=2024-10-19T01:43:15.414714245Z caller=main.go:174 level=info msg=exiting
ts=2024-10-19T01:43:15.44098453Z caller=factory.go:53 level=info msg="loading bucket configuration"
ts=2024-10-19T01:43:17.14659156Z caller=tools_bucket.go:1226 level=info msg="downloading block" source=01HCGZPEM6EEHJ75218N914HNM
ts=2024-10-19T01:46:02.077523638Z caller=tools_bucket.go:1263 level=info msg="changelog will be available" file=/home/appuser/data/thanos-rewrite/01JAH77XYXDS8NBHFBJFKCS3V1/change.log
ts=2024-10-19T01:46:02.134607708Z caller=tools_bucket.go:1278 level=info msg="starting rewrite for block" source=01HCGZPEM6EEHJ75218N914HNM new=01JAH77XYXDS8NBHFBJFKCS3V1 toDelete= toRelabel="- action: replace\n target_label: \"foo\"\n replacement: \"bar\"\n- action: replace\n"
ts=2024-10-19T01:49:15.763983386Z caller=compactor.go:42 level=info msg="processed 10.00% of 3419617 series"
ts=2024-10-19T01:49:47.464508668Z caller=compactor.go:42 level=info msg="processed 20.00% of 3419617 series"
ts=2024-10-19T01:50:48.513697195Z caller=compactor.go:42 level=info msg="processed 30.00% of 3419617 series"
ts=2024-10-19T01:51:20.873113385Z caller=compactor.go:42 level=info msg="processed 40.00% of 3419617 series"
ts=2024-10-19T01:51:25.882289646Z caller=compactor.go:42 level=info msg="processed 50.00% of 3419617 series"
ts=2024-10-19T01:51:37.924141467Z caller=compactor.go:42 level=info msg="processed 60.00% of 3419617 series"
ts=2024-10-19T01:51:43.005992941Z caller=compactor.go:42 level=info msg="processed 70.00% of 3419617 series"
ts=2024-10-19T01:55:21.447415738Z caller=compactor.go:42 level=info msg="processed 80.00% of 3419617 series"
ts=2024-10-19T01:56:18.949121034Z caller=compactor.go:42 level=info msg="processed 90.00% of 3419617 series"
ts=2024-10-19T01:57:32.258928562Z caller=compactor.go:42 level=info msg="processed 100.00% of 3419617 series"
ts=2024-10-19T01:57:32.26977951Z caller=tools_bucket.go:1288 level=info msg="wrote new block after modifications; flushing" source=01HCGZPEM6EEHJ75218N914HNM new=01JAH77XYXDS8NBHFBJFKCS3V1
ts=2024-10-19T01:58:03.092458808Z caller=tools_bucket.go:1297 level=info msg="uploading new block" source=01HCGZPEM6EEHJ75218N914HNM new=01JAH77XYXDS8NBHFBJFKCS3V1
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x92573b]

goroutine 102 [running]:
github.com/grafana/regexp.(*Regexp).UnmarshalText(0x0, {0xc129d88463?, 0xc0f46c6f08?, 0x2907820?})
	/go/pkg/mod/github.com/grafana/regexp@v0.0.0-20240518133315-a468a5bfb3bc/regexp.go:1302 +0x3b
encoding/json.(*decodeState).literalStore(0xc0005b4b68, {0xc129d88462, 0xc, 0x199e}, {0x2907820?, 0xc0f46c6f08?, 0x27ddc00?}, 0x0)
	/usr/local/go/src/encoding/json/decode.go:877 +0x5f3
encoding/json.(*decodeState).value(0xc0005b4b68, {0x2907820?, 0xc0f46c6f08?, 0x5?})
	/usr/local/go/src/encoding/json/decode.go:388 +0x115
encoding/json.(*decodeState).object(0xc0005b4b68, {0x24d0120?, 0xc0001de298?, 0x23212c0?})
	/usr/local/go/src/encoding/json/decode.go:755 +0xd08
encoding/json.(*decodeState).value(0xc0005b4b68, {0x24d0120?, 0xc0001de298?, 0x1?})
	/usr/local/go/src/encoding/json/decode.go:374 +0x3e
encoding/json.(*decodeState).array(0xc0005b4b68, {0x23212c0?, 0xc0f3ef2710?, 0x313c?})
	/usr/local/go/src/encoding/json/decode.go:555 +0x50f
encoding/json.(*decodeState).value(0xc0005b4b68, {0x23212c0?, 0xc0f3ef2710?, 0x10?})
	/usr/local/go/src/encoding/json/decode.go:364 +0x74
encoding/json.(*decodeState).object(0xc0005b4b68, {0x2629080?, 0xc0f3ef26e0?, 0x2325400?})
	/usr/local/go/src/encoding/json/decode.go:755 +0xd08
encoding/json.(*decodeState).value(0xc0005b4b68, {0x2629080?, 0xc0f3ef26e0?, 0x1?})
	/usr/local/go/src/encoding/json/decode.go:374 +0x3e
encoding/json.(*decodeState).array(0xc0005b4b68, {0x2325400?, 0xc0005b4b00?, 0x38c3?})
	/usr/local/go/src/encoding/json/decode.go:555 +0x50f
encoding/json.(*decodeState).value(0xc0005b4b68, {0x2325400?, 0xc0005b4b00?, 0x8?})
	/usr/local/go/src/encoding/json/decode.go:364 +0x74
encoding/json.(*decodeState).object(0xc0005b4b68, {0x282b140?, 0xc0005b4aa8?, 0x39c0?})
	/usr/local/go/src/encoding/json/decode.go:755 +0xd08
encoding/json.(*decodeState).value(0xc0005b4b68, {0x282b140?, 0xc0005b4aa8?, 0x6?})
	/usr/local/go/src/encoding/json/decode.go:374 +0x3e
encoding/json.(*decodeState).object(0xc0005b4b68, {0x252fa40?, 0xc0005b4a00?, 0xc0005f8ee8?})
	/usr/local/go/src/encoding/json/decode.go:755 +0xd08
encoding/json.(*decodeState).value(0xc0005b4b68, {0x252fa40?, 0xc0005b4a00?, 0xc0005f8f38?})
	/usr/local/go/src/encoding/json/decode.go:374 +0x3e
encoding/json.(*decodeState).unmarshal(0xc0005b4b68, {0x252fa40?, 0xc0005b4a00?})
	/usr/local/go/src/encoding/json/decode.go:181 +0x133
encoding/json.(*Decoder).Decode(0xc0005b4b40, {0x252fa40, 0xc0005b4a00})
	/usr/local/go/src/encoding/json/stream.go:73 +0x179
github.com/thanos-io/thanos/pkg/block/metadata.Read({0x38c90e8?, 0xc0001de288})
	/app/pkg/block/metadata/meta.go:260 +0x10c
github.com/thanos-io/thanos/pkg/block/metadata.ReadFromDir({0xc11dc14680, 0x3c})
	/app/pkg/block/metadata/meta.go:252 +0x8e
github.com/thanos-io/thanos/pkg/block.upload({0x38db3b0, 0xc0000ce550}, {0x38b9f40, 0xc0001c4440}, {0x38f2518?, 0xc000622820?}, {0xc11dc14680, 0x3c}, {0x0, 0x0}, ...)
	/app/pkg/block/block.go:126 +0x125
github.com/thanos-io/thanos/pkg/block.Upload(...)
	/app/pkg/block/block.go:98
main.registerBucketRewrite.func1.1()
	/app/cmd/thanos/tools_bucket.go:1303 +0x1cf6
github.com/oklog/run.(*Group).Run.func1({0xc000a5eb60?, 0xc000622ad0?})
	/go/pkg/mod/github.com/oklog/run@v1.1.0/group.go:38 +0x29
created by github.com/oklog/run.(*Group).Run in goroutine 1
	/go/pkg/mod/github.com/oklog/run@v1.1.0/group.go:37 +0x67
```

Anything else we need to know:

The issue is specifically triggered by the rewrite history that `thanos tools bucket rewrite` adds to the updated block's `meta.json` file (the `thanos.rewrites[].relabels_applied` entries).

The following JSON can be used to reproduce the problem:

```json
{
  "thanos": {
    "rewrites": [
      {
        "relabels_applied": [
          {
            "SourceLabels": null,
            "Separator": ";",
            "Regex": "foo",
            "Modulus": 0,
            "TargetLabel": "bar",
            "Replacement": "baz",
            "Action": "replace"
          }
        ]
      }
    ]
  }
}
```

Removing the `"Regex": "foo",` line makes the problem go away. See the attached code.

thanos-rewrites-segv.tar.gz
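
For readers without the attachment, the same failure can be sketched as a self-contained Go program. This is a minimal sketch, not the attached code: `Regexp`, `RelabelConfig` and `Meta` are hypothetical, trimmed-down mirrors of the JSON above; the wrapper's shape (a struct embedding `*regexp.Regexp`) is an assumption about the type behind the `Regex` field; and the standard library `regexp` package stands in for `github.com/grafana/regexp`, assuming the fork's `UnmarshalText` behaves like the standard library's (available since Go 1.21).

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// Regexp is a hypothetical stand-in for the type behind the Regex field:
// a struct embedding *regexp.Regexp. Since Go 1.21, *regexp.Regexp has an
// UnmarshalText method, which the wrapper inherits through the embedded pointer.
type Regexp struct {
	*regexp.Regexp
}

// RelabelConfig and Meta are trimmed-down, hypothetical mirrors of the
// meta.json fields shown above (not the real Thanos types).
type RelabelConfig struct {
	SourceLabels []string
	Separator    string
	Regex        Regexp
	Modulus      uint64
	TargetLabel  string
	Replacement  string
	Action       string
}

type Meta struct {
	Thanos struct {
		Rewrites []struct {
			RelabelsApplied []RelabelConfig `json:"relabels_applied"`
		} `json:"rewrites"`
	} `json:"thanos"`
}

const withRegex = `{"thanos": {"rewrites": [{"relabels_applied": [{
  "SourceLabels": null, "Separator": ";", "Regex": "foo", "Modulus": 0,
  "TargetLabel": "bar", "Replacement": "baz", "Action": "replace"}]}]}}`

const withoutRegex = `{"thanos": {"rewrites": [{"relabels_applied": [{
  "SourceLabels": null, "Separator": ";", "Modulus": 0,
  "TargetLabel": "bar", "Replacement": "baz", "Action": "replace"}]}]}}`

// decode unmarshals doc into Meta and converts a panic into an error so both
// documents can be tried in a single run.
func decode(doc string) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("panic: %v", r)
		}
	}()
	var m Meta
	return json.Unmarshal([]byte(doc), &m)
}

func main() {
	fmt.Println("without Regex:", decode(withoutRegex))
	fmt.Println("with Regex:   ", decode(withRegex))
}
```

The document without `Regex` decodes cleanly (`<nil>`), while the one with it hits `runtime error: invalid memory address or nil pointer dereference`: `encoding/json` treats the wrapper as an `encoding.TextUnmarshaler` via the promoted method and calls it while the embedded pointer is still nil.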

dosubot[bot] commented 2 weeks ago

Hey, @chris-barbour-as! I'm here to help you with any bugs, questions, or contributions you have. Let's work together to solve this issue.

The segmentation violation you're encountering with the `thanos tools bucket rewrite` command in Thanos v0.36.0 and later appears to be related to how the `Regex` field in `meta.json` is unmarshaled. The error occurs only when the `Regex` field is present, and removing it resolves the issue. The stack trace shows `(*Regexp).UnmarshalText` from the `github.com/grafana/regexp` package being called with a nil receiver (`0x0`) while `encoding/json` decodes `meta.json` in `metadata.Read`, which points to a nil pointer dereference during decoding.

To address this issue, you might consider the following steps:

  1. Check for Updates: Ensure that you are using the latest version of Thanos, as the issue might have been fixed in a newer release.

  2. Review Code Changes: Look into the changes made in the Thanos codebase around the handling of the Regex field or the github.com/grafana/regexp package to identify any potential bugs introduced in v0.36.0.

  3. Modify meta.json: As a temporary workaround, you can try removing the Regex field from meta.json if it's not critical for your use case.

  4. Debugging: If you have the capability, you could debug the code to pinpoint the exact location and cause of the nil pointer dereference.

If the issue persists, it might be beneficial to report it to the Thanos maintainers with the details you've provided, including the specific JSON structure that triggers the error.


chris-barbour-as commented 2 weeks ago

This is possibly related to: https://github.com/thanos-io/thanos/pull/7338