grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

Loki errors when specifying s3 endpoint in its config |WebIdentityErr: failed to retrieve credentials\ncaused by: SerializationError #12095

Open aravindhkudiyarasan opened 6 months ago

aravindhkudiyarasan commented 6 months ago

Describe the bug

We are still encountering the error "WebIdentityErr: failed to retrieve credentials\ncaused by: SerializationError:" in Loki when we explicitly specify the S3 endpoint in the Loki configuration. We set the endpoint because we need to use the FIPS-compliant S3 endpoint when deploying Loki in GovCloud environments. Unfortunately, setting the S3 endpoint in the Loki config together with serviceAccount:true appears to cause the error message below.

Existing Bugs:-

1) https://github.com/grafana/loki/issues/9131
2) https://github.com/grafana/helm-charts/issues/1550
3) https://github.com/grafana/loki/issues/7403
4) https://community.grafana.com/t/loki-write-pods-does-not-get-into-ready-state/111798

Kindly fix this urgently.

Loki Config:

WORKING:-

storageConfig:  
  aws:
    bucketnames: test-loki-cluster-backend,test-loki-cluster-backend-1
    region: us-west-2
    s3forcepathstyle: true
    insecure: false

NOT_WORKING:-


storageConfig: 
  aws:
    bucketnames: test-loki-cluster-backend,test-loki-cluster-backend-1
    region: us-west-2
    endpoint: s3.us-west-2.amazonaws.com
    s3forcepathstyle: true
    insecure: false

To Reproduce

Steps to reproduce the behavior:

  1. Deploy distributed Loki using https://github.com/grafana/helm-charts/blob/loki-distributed-0.78.3/charts/loki-distributed/Chart.yaml

Expected behavior

Loki should connect to the S3 buckets using the specified endpoint.

Environment:

Error:- {"caller":"table_manager.go:143","err":"WebIdentityErr: failed to retrieve credentials\ncaused by: SerializationError: failed to unmarshal error message\n\tstatus code: 405, request id: \ncaused by: UnmarshalError: failed to unmarshal error message\n\t00000000 3c 3f 78 6d 6c 20 76 65 72 73 69 6f 6e 3d 22 31 |.<C|\n00000030 6f 64 65 3e 4d 65 74 68 6f 64 4e 6f 74 41 6c 6c |ode>MethodNotAll|\n00000040 6f 77 65 64 3c 2f 43 6f 64 65 3e 3c 4d 65 73 73 |owed<Mess|\n00000050 61 67 65 3e 54 68 65 20 73 70 65 63 69 66 69 65 |age>The specifie|\n00000060 64 20 6d 65 74 68 6f 64 20 69 73 20 6e 6f 74 20 |d method is not |\n00000070 61 6c 6c 6f 77 65 64 20 61 67 61 69 6e 73 74 20 |allowed against |\n00000080 74 68 69 73 20 72 65 73 6f 75 72 63 65 2e 3c 2f |this resource.</|\n00000090 4d 65 73 73 61 67 65 3e 3c 4d 65 74 68 6f 64 3e |Message>|\n000000a0 50 4f 53 54 3c 2f 4d 65 74 68 6f 64 3e 3c 52 65 |POST<Re|\n000000b0 73 6f 75 72 63 65 54 79 70 65 3e 53 45 52 56 49 |sourceType>SERVI|\n000000c0 43 45 3c 2f 52 65 73 6f 75 72 63 65 54 79 70 65 |CE</ResourceType|\n000000d0 3e 3c 52 65 71 75 65 73 74 49 64 3e 44 51 48 51 |>DQHQ|\n000000e0 52 31 46 48 47 53 48 59 36 45 43 5a 3c 2f 52 65 |R1FHGSHY6ECZ</Re|\n000000f0 71 75 65 73 74 49 64 3e 3c 48 6f 73 74 49 64 3e |questId>|\n00000100 62 53 79 72 63 55 72 44 53 31 48 71 48 76 47 4d |bSyrcUrDS1HqHvGM|\n00000110 39 32 69 62 48 73 58 76 39 4a 4b 57 48 41 66 4d |92ibHsXv9JKWHAfM|\n00000120 41 31 33 67 7a 79 69 4e 76 69 61 38 6e 69 70 76 |A13gzyiNvia8nipv|\n00000130 50 4d 59 6c 74 5a 35 71 6c 69 2b 48 45 43 72 6b |PMYltZ5qli+HECrk|\n00000140 41 78 42 64 4e 69 2f 45 48 57 77 3d 3c 2f 48 6f |AxBdNi/EHWw=</Ho|\n00000150 73 74 49 64 3e 3c 2f 45 72 72 6f 72 3e |stId>|\n\ncaused by: unknown error response tag, {{ Error} []}","index-store":"boltdb-shipper-2023-11-01","level":"error","msg":"failed to upload table","table":"loki_index_backup_19782","ts":"2024-02-29T09:56:11.318328359Z"}
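The ASCII column of the hex dump above was damaged when it was pasted (the XML tags were stripped), but the hex bytes of the rows shown are intact. The following is a minimal Python sketch that rebuilds the response body from those bytes; `decode_hexdump` and `ROWS` are illustrative names, not part of Loki or the AWS SDK. The decoded fragment is an S3-style XML error with Code `MethodNotAllowed`, Method `POST`, and ResourceType `SERVICE`. That would be consistent with the credential request (a POST) being sent to the configured S3 endpoint rather than to STS, but this reading is an inference and is not confirmed in this issue.

```python
import re

# Rows 0x30-0xc0 copied from the error above; the ASCII column is ignored.
ROWS = """
00000030 6f 64 65 3e 4d 65 74 68 6f 64 4e 6f 74 41 6c 6c |ode>MethodNotAll|
00000040 6f 77 65 64 3c 2f 43 6f 64 65 3e 3c 4d 65 73 73 |owed<Mess|
00000050 61 67 65 3e 54 68 65 20 73 70 65 63 69 66 69 65 |age>The specifie|
00000060 64 20 6d 65 74 68 6f 64 20 69 73 20 6e 6f 74 20 |d method is not |
00000070 61 6c 6c 6f 77 65 64 20 61 67 61 69 6e 73 74 20 |allowed against |
00000080 74 68 69 73 20 72 65 73 6f 75 72 63 65 2e 3c 2f |this resource.</|
00000090 4d 65 73 73 61 67 65 3e 3c 4d 65 74 68 6f 64 3e |Message>|
000000a0 50 4f 53 54 3c 2f 4d 65 74 68 6f 64 3e 3c 52 65 |POST<Re|
000000b0 73 6f 75 72 63 65 54 79 70 65 3e 53 45 52 56 49 |sourceType>SERVI|
000000c0 43 45 3c 2f 52 65 73 6f 75 72 63 65 54 79 70 65 |CE</ResourceType|
"""


def decode_hexdump(dump: str) -> bytes:
    """Rebuild raw bytes from 'offset hh hh ... |ascii|' rows."""
    out = bytearray()
    for row in dump.strip().splitlines():
        hex_part = row.split("|", 1)[0]   # drop the (possibly damaged) ASCII column
        tokens = hex_part.split()[1:]     # drop the leading offset
        out.extend(int(t, 16) for t in tokens
                   if re.fullmatch(r"[0-9a-fA-F]{2}", t))
    return bytes(out)


fragment = decode_hexdump(ROWS).decode("ascii")
# The opening <Code> tag is cut off in this excerpt, so match on the closing tag.
code = re.search(r">(\w+)</Code>", fragment).group(1)
method = re.search(r"<Method>(.*?)</Method>", fragment).group(1)

print(fragment)
print("code:", code, "| method:", method)
```

The sketch only covers the rows reproduced here; the rows at offsets 0x10 and 0x20 are missing from the pasted error, so the start of the XML document cannot be recovered from it.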

Robsta86 commented 6 months ago

Not sure if it helps with your specific problem regarding the FIPS-compliant endpoint, but this config works for us:

  storage:
    bucketNames:
      chunks: ${bucket}
      ruler: ${bucket}
      admin: ${bucket}
    type: s3
    s3:
      region: ${aws_region}

  schema_config:
    configs:
    - from: "2024-01-12"
      index:
        period: 24h
        prefix: loki_index_
      store: boltdb-shipper
      object_store: s3
      schema: v12

  storage_config:
    aws:
      s3: s3://${aws_region}/${bucket}
      sse_encryption: true
      sse:
        type: SSE-KMS
        kms_key_id: ${kms_key}
    tsdb_shipper:
      active_index_directory: /var/loki/tsdb-index
      shared_store: s3

aravindhkudiyarasan commented 6 months ago

This config works in our case as well, but which endpoint does it use to connect to S3 by default?

We are using the distributed Loki helm chart to deploy this setup.

kaiyuanlim commented 1 month ago

Hi,

In my case, I want to use a VPC endpoint so that the IPs are more static, which makes it easier to maintain the IP whitelisting and network policies (netpols) in our cluster.