thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io
Apache License 2.0

storegateway,compactor,bucketweb: SSE-C encryption causes Bad Request errors on reading data from bucket #7784

Open jabbrwcky opened 1 month ago

jabbrwcky commented 1 month ago

Thanos, Prometheus and Golang version used:

thanos:0.36.1

Object Storage Provider: S3

What happened:

I configured Thanos to store metrics data in OpenStack Ceph via the S3 protocol using SSE-C encryption. The sidecar successfully writes data to the bucket, but all components that read from the bucket fail to stat the objects in it and get a 400 Bad Request error.

What you expected to happen:

Data written to the bucket using SSE-C encryption should have been read back without error.

How to reproduce it (as minimally and precisely as possible):

Configure the object storage as S3 with encryption set to SSE-C:

type: S3
config:
  bucket: metrics
  endpoint: objectstorage.fes.cloud.syseleven.net
  region: fes
  access_key: <key>
  secret_key: <secret>
  sse_config:
    type: SSE-C
    encryption_key: /sse-c-key

The sidecar should be able to store data. All reading components fail.

Anything else we need to know:

I identified thanos-io/objstore as the culprit: the minio StatObject() call also requires an encryption key in the request options (at least for SSE-C), but objstore only passes the key in the Upload, Get, and GetRange calls, not in Exists and Attributes.

As a consequence, these calls fail with a 400 Bad Request error.

This may not be the case for SSE-KMS and default SSE methods, because the server handles these transparently.
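
The failure mode described above can be reproduced without Thanos or Ceph. The following is a minimal Go sketch using only the standard library. The toy server is an assumption that stands in for an endpoint enforcing SSE-C; it is not how Ceph or objstore is implemented. The names newToyBucket and statObject are hypothetical, and the key is a placeholder.

```go
package main

import (
	"crypto/md5"
	"encoding/base64"
	"fmt"
	"net/http"
	"net/http/httptest"
)

const (
	hdrAlgo   = "X-Amz-Server-Side-Encryption-Customer-Algorithm"
	hdrKey    = "X-Amz-Server-Side-Encryption-Customer-Key"
	hdrKeyMD5 = "X-Amz-Server-Side-Encryption-Customer-Key-Md5"
)

// newToyBucket starts a stand-in for an S3 endpoint that holds SSE-C
// encrypted objects. It is NOT a real S3 implementation: it only rejects
// requests that do not carry the expected customer key.
func newToyBucket(key []byte) *httptest.Server {
	want := base64.StdEncoding.EncodeToString(key)
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get(hdrAlgo) != "AES256" || r.Header.Get(hdrKey) != want {
			w.WriteHeader(http.StatusBadRequest)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
}

// statObject issues a HEAD request, as a stat call would. A nil key models
// the code path that does not attach the SSE-C options.
func statObject(url string, key []byte) (int, error) {
	req, err := http.NewRequest(http.MethodHead, url, nil)
	if err != nil {
		return 0, err
	}
	if key != nil {
		sum := md5.Sum(key)
		req.Header.Set(hdrAlgo, "AES256")
		req.Header.Set(hdrKey, base64.StdEncoding.EncodeToString(key))
		req.Header.Set(hdrKeyMD5, base64.StdEncoding.EncodeToString(sum[:]))
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	key := []byte("0123456789abcdef0123456789abcdef") // 32-byte placeholder key
	srv := newToyBucket(key)
	defer srv.Close()

	without, _ := statObject(srv.URL+"/metrics/meta.json", nil)
	with, _ := statObject(srv.URL+"/metrics/meta.json", key)
	fmt.Println(without, with) // 400 without the key, 200 with it
}
```

In this sketch the only difference between the failing and the succeeding request is whether the key is attached, which mirrors the difference between Exists/Attributes and Get/GetRange described above.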

I opened an issue and a PR over at thanos-io/objstore, so this issue is mostly a marker for the release that fixes it by picking up the fixed version of objstore: https://github.com/thanos-io/objstore/issues/141

dosubot[bot] commented 1 month ago

Hey @jabbrwcky! I'm here to help you squash those bugs, answer your burning questions, and guide you on your journey to becoming a top-notch contributor. Let's tackle this together! 🤖

I found a similar unsolved discussion that might be relevant to your issue: Thanos store and compactor fail to load and sync blocks from Linode Object Storage uploaded via sidecar using sse-c encryption [1].

To resolve the 400 Bad Request error when Thanos components read from an OpenStack Ceph bucket using SSE-C encryption with the S3 protocol, ensure that the sse_config section in your Thanos configuration is correctly set up for SSE-C. Specifically, you need to provide the path to the encryption key using the encryption_key field.

Here is an example configuration snippet for SSE-C:

type: S3
config:
  bucket: "<your-bucket-name>"
  endpoint: "<your-endpoint>"
  access_key: "<your-access-key>"
  secret_key: "<your-secret-key>"
  sse_config:
    type: "SSE-C"
    encryption_key: "<path-to-your-encryption-key>"

Make sure that the encryption_key points to the correct path where your encryption key is stored. This should resolve the 400 Bad Request error related to SSE-C encryption [2].

To continue talking to Dosu, mention @dosu.

jabbrwcky commented 1 month ago

@dosu The symptoms found by dosubot are similar, but the encryption key is configured correctly in my case (I can access the files using a test script using minio SDK directly).

I have linked the issue and PR in thanos-io/objstore that should fix this issue above.

dosubot[bot] commented 1 month ago

To resolve the 400 Bad Request error when Thanos components read from an OpenStack Ceph bucket using SSE-C encryption, ensure that the encryption key is passed correctly in the Exists() and Attributes() methods. This can be done by configuring the sse_config section in your Thanos configuration file to include the encryption_key for SSE-C.

Here is an example configuration for an S3 bucket with SSE-C encryption:

type: S3
config:
  bucket: "<your-bucket-name>"
  endpoint: "<your-endpoint>"
  access_key: "<your-access-key>"
  secret_key: "<your-secret-key>"
  sse_config:
    type: "SSE-C"
    encryption_key: "<path-to-your-encryption-key>"

Make sure that the encryption_key is correctly specified and accessible by Thanos components. This should ensure that the encryption key is passed during the Exists() and Attributes() method calls, resolving the 400 Bad Request error [1].

toporek3112 commented 1 month ago

Name and Version

bitnami/thanos 12.11.2

What architecture are you using?

amd64

What steps will reproduce the bug?

  1. Create a secret for the sse-c key

  2. Deploy the thanos Helm chart with a configured objstoreConfig, e.g.:

    objstoreConfig:
      type: "s3"
      config:
        bucket: "<bucket>"
        endpoint: "<endpoint>"
        access_key: "<access_key>"
        secret_key: "<secret_key>"
        sse_config:
          type: "SSE-C"
          encryption_key: "/etc/secrets/s3-encryption.key"
        insecure: false
        trace:
          enable: true

    (don't forget to mount the secret with the sse-c key)

  3. Deploy bitnami prometheus with thanos-sidecar, taking the object store config from the thanos-objstore-secret secret created by the bitnami/thanos Helm chart (again, don't forget to mount both the secret with the sse-c key and the thanos-objstore-secret secret)

  4. With the configuration above, you should see that the Thanos sidecar successfully writes blocks to s3, but executing thanos tools bucket inspect --objstore.config-file=/conf/objstore.yml throws an error because it does not use the sse-c key on read. The other Thanos pods, such as storegateway and compactor, also get the same error.

Are you using any custom parameters or values?

Custom values.yaml with extra mounts and a custom objectstorage config mentioned above

What is the expected behavior?

The expected behaviour is that the thanos sidecar (in the kube-prometheus pod) writes blocks to s3 and that they can be read back successfully.

What do you see instead?

I see the thanos sidecar successfully uploading a block:

{
  "caller": "stdlib.go:105",
  "level": "debug",
  "s3TraceMsg": "---------START-HTTP---------",
  "ts": "2024-09-26T14:53:05.340435162Z"
}
{
  "caller": "stdlib.go:105",
  "level": "debug",
  "s3TraceMsg": "PUT /thanos/01J8QAV4YBQFVVXTYZQAQ48M7P/chunks/000001 HTTP/1.1\r
        Host: <host>\r
        User-Agent: MinIO (linux; amd64) minio-go/v7.0.45 thanos-sidecar/0.31.0 (go1.19.11)\r
        Content-Length: 5110574\r
        Authorization: AWS4-HMAC-SHA256 Credential=<path>/s3/aws4_request, SignedHeaders=content-type;host;x-amz-content-sha256;x-amz-date;x-amz-server-side-encryption-customer-algorithm;x-amz-server-side-encryption-customer-key;x-amz-server-side-encryption-customer-key-md5, Signature=**REDACTED**\r
        Content-Type: application/octet-stream\r
        X-Amz-Content-Sha256: UNSIGNED-PAYLOAD\r
        X-Amz-Date: 20240926T145304Z\r
        X-Amz-Server-Side-Encryption-Customer-Algorithm: AES256\r
        X-Amz-Server-Side-Encryption-Customer-Key: <sse-c-key>\r
        X-Amz-Server-Side-Encryption-Customer-Key-Md5: <md5-key>\r
        Accept-Encoding: gzip\r
        \r",
  "ts": "2024-09-26T14:53:05.347550595Z"
}
{
  "caller": "stdlib.go:105",
  "level": "debug",
  "s3TraceMsg": "HTTP/1.1 200 OK\r
        Content-Length: 0\r
        Date: Thu, 26 Sep 2024 14:53:04 GMT\r
        Etag: \"sfasfdasdf\"\r
        Host: <host>\r
        Server: Aleph/0.6.0\r
        X-Amz-Bucket-Region: <region>\r
        X-Amz-Id-2: <id>\r
        X-Amz-Request-Id: <id>\r
        X-Amz-Server-Side-Encryption-Customer-Algorithm: AES256\r
        X-Amz-Server-Side-Encryption-Customer-Key-Md5: <md5-key>\r
        X-Amzn-Request-Id: <id>\r",
  "ts": "2024-09-26T14:53:05.347630293Z"
}
{
  "caller": "stdlib.go:105",
  "level": "debug",
  "s3TraceMsg": "---------END-HTTP---------",
  "ts": "2024-09-26T14:53:05.347647957Z"
}
{
  "bucket": "tracing: thanos",
  "caller": "objstore.go:288",
  "dst": "01J8QAV4YBQFVVXTYZQAQ48M7P/chunks/000001",
  "from": "/prometheus/thanos/upload/01J8QAV4YBQFVVXTYZQAQ48M7P/chunks/000001",
  "level": "debug",
  "msg": "uploaded file",
  "ts": "2024-09-26T14:53:05.347692479Z"
}

But the bucketweb pod throws the following error:

{
"caller": "stdlib.go:105",
"level": "debug",
"s3TraceMsg": "---------START-HTTP---------",
"ts": "2024-09-26T10:08:42.609910283Z"
}
{
"caller": "stdlib.go:105",
"level": "debug",
"s3TraceMsg": "HEAD /thanos/01J8QAV4YBQFVVXTYZQAQ48M7P/meta.json HTTP/1.1\r
                Host: <host>\r
                User-Agent: MinIO (linux; amd64) minio-go/v7.0.45 thanos-bucket/0.31.0 (go1.19.11)\r
                Authorization: AWS4-HMAC-SHA256 Credential=<prefix>/20240926/<region>/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=**REDACTED**\r
                X-Amz-Content-Sha256: <sha256>\r
                X-Amz-Date: 20240926T100842Z\r
                \r",
"ts": "2024-09-26T10:08:42.60999955Z"
}
{
"caller": "stdlib.go:105",
"level": "debug",
"s3TraceMsg": "HTTP/1.1 400 Bad Request\r
                Content-Length: 111\r
                Content-Type: application/xml\r
                Date: Thu, 26 Sep 2024 10:08:42 GMT\r
                Host: <host>\r
                Server: Aleph/0.6.0\r
                X-Amz-Bucket-Region: <region>\r
                X-Amz-Id-2: <id>\r
                X-Amz-Request-Id: <id>\r
                X-Amzn-Request-Id: <id>\r",
"ts": "2024-09-26T10:08:42.61003876Z"
}
{
"caller": "stdlib.go:105",
"level": "debug",
"s3TraceMsg": "---------END-HTTP---------",
"ts": "2024-09-26T10:08:42.61006195Z"
}

From what I can see in the bucketweb logs, these headers are missing, so thanos is not using the sse-c key:

        X-Amz-Server-Side-Encryption-Customer-Algorithm: AES256\r
        X-Amz-Server-Side-Encryption-Customer-Key: <sse-c-key>\r
        X-Amz-Server-Side-Encryption-Customer-Key-Md5: <md5-key>\r

Also, when executing thanos tools bucket inspect --objstore.config-file=/conf/objstore.yml in the thanos sidecar or any other thanos container, I see:

level=info ts=2024-09-26T15:00:36.007894998Z caller=factory.go:52 msg="loading bucket configuration"
level=error ts=2024-09-26T15:00:36.18380763Z caller=main.go:135 err="35 errors: meta.json file exists: 01J8QAV4YBQFVVXTYZQAQ48M7P/meta.json: stat s3 object: 400 Bad Request; 
...

I have also tried reading the meta.json file from the command line with the minio client, and it works when I use the --enc-c flag, so the fault seems to be on the thanos side, which does not use the sse-c key when reading from s3.

Additional information

As a workaround I tried configuring thanos objectstore config to set the headers mentioned above like this:

objstoreConfig:
  type: "s3"
  config:
    bucket: "<bucket>"
    endpoint: "<endpoint>"
    access_key: "<access_key>"
    secret_key: "<secret_key>"
    put_user_metadata:
      X-Amz-Server-Side-Encryption-Customer-Algorithm: "AES256"
      X-Amz-Server-Side-Encryption-Customer-Key: "<sse-c-key>"
      X-Amz-Server-Side-Encryption-Customer-Key-Md5: "<md5-key>"
    sse_config:
      type: "SSE-C"
      encryption_key: "/etc/secrets/s3-encryption.key"
    insecure: false
    trace:
      enable: true

I tried it with only the X-Amz-Server-Side-Encryption-Customer-Key header, with two of these headers, and with all three of them, but I get this error from the thanos sidecar:

{
  "caller": "sidecar.go:347",
  "err": "upload 01J8Q9QS886NV5DSPZSA9006MH: upload chunks: upload file /prometheus/thanos/upload/01J8Q9QS886NV5DSPZSA9006MH/chunks/000001 as 01J8Q9QS886NV5DSPZSA9006MH/chunks/000001: upload s3 object: X-Amz-Server-Side-Encryption-Customer-Algorithm unsupported user defined metadata name",
  "level": "warn",
  "ts": "2024-09-26T14:00:05.547983364Z",
  "uploaded": 0
}

I appreciate any help or advice.

jabbrwcky commented 1 month ago

Ok, at some point the bot comments just turn into gaslighting 🤨

jabbrwcky commented 1 month ago

@toporek3112 the underlying issue is here: https://github.com/thanos-io/objstore/issues/141 The objstore module does not send the SSE-C key along in all places where it would be required

jabbrwcky commented 1 month ago

Hey @bwplotka, sorry to interrupt. Could anyone have a quick look at this issue? The PR is already there with tests in the objstore issue, so it should be rather straightforward to fix SSE-C of thanos and other users of the objstore package.