AWS region is set to `dummy`

exb-atix commented 10 months ago

Describe the bug

The AWS region is not taken into account, at least for loki-backend pods, when they try to access AWS STS. As a result, error messages are logged continuously (see the log in the output section). Apart from these error messages, everything works as expected. After looking into the code (as a non-native Go speaker), the culprit seems to lie around lines 224-232 and 245-247 of s3_storage_client.go, where the region should be set on the s3Config object.
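The hostname in the log further down matches the regional STS endpoint pattern `sts.<region>.amazonaws.com` with a placeholder in the region slot. A minimal Python sketch of that behaviour, assuming the client falls back to the literal string `dummy` when no region reaches it (the function names are hypothetical and this is not Loki's Go code):

```python
from typing import Optional

# Assumption: placeholder used when no region is configured, as the log suggests.
PLACEHOLDER_REGION = "dummy"


def effective_region(configured: Optional[str]) -> str:
    """Return the configured region, or the placeholder when it is empty or missing."""
    return configured if configured else PLACEHOLDER_REGION


def sts_endpoint(region: str) -> str:
    """Build a regional STS endpoint URL of the form https://sts.<region>.amazonaws.com/."""
    return f"https://sts.{region}.amazonaws.com/"


if __name__ == "__main__":
    # No region configured: the placeholder ends up in the hostname.
    print(sts_endpoint(effective_region(None)))
    # Region configured: a resolvable regional endpoint is produced.
    print(sts_endpoint(effective_region("eu-central-1")))
```

If this model is right, any component whose storage settings lack a region would produce the unresolvable hostname, even when other components are configured correctly.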

To Reproduce

Steps to reproduce the behavior:

Deploy Loki (2.9.3) via Helm chart (5.41.0)
Look into the log of a loki-backend pod

Expected behavior

The STS endpoint is built with the configured region instead of `dummy`, so the request does not fail.

Environment:

Infrastructure: Kubernetes on AWS
Deployment tool: helm via helmfile

Screenshots, Promtail config, or terminal output

Loki log:

level=info ts=2023-12-12T10:44:11.495384847Z caller=loki.go:505 msg="Loki started"
level=error ts=2023-12-12T10:44:11.505356978Z caller=ruler.go:571 msg="unable to list rules" err="WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post \"https://sts.dummy.amazonaws.com/\": dial tcp: lookup sts.dummy.amazonaws.com on 172.20.0.10:53: no such host"
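To spot this condition in a larger log stream, the region label can be pulled out of the STS hostname in the error line. The following Python sketch is an illustration only; `sts_region_from_log` is a hypothetical helper, and it assumes the URL appears in the line in the same form as above:

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Matches the first http(s) URL in a line, stopping at whitespace, quotes, or backslashes.
URL_PATTERN = re.compile(r'https?://[^\s"\\]+')

# The error line from this report, kept as a raw string so the escapes stay literal.
SAMPLE_LINE = r'level=error ts=2023-12-12T10:44:11.505356978Z caller=ruler.go:571 msg="unable to list rules" err="WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post \"https://sts.dummy.amazonaws.com/\": dial tcp: lookup sts.dummy.amazonaws.com on 172.20.0.10:53: no such host"'


def sts_region_from_log(line: str) -> Optional[str]:
    """Return the region label of an sts.<region>.amazonaws.com host in the line, else None."""
    match = URL_PATTERN.search(line)
    if match is None:
        return None
    host = urlsplit(match.group(0)).hostname or ""
    parts = host.split(".")
    # Only accept the four-label regional form: sts.<region>.amazonaws.com
    if len(parts) == 4 and parts[0] == "sts" and parts[2:] == ["amazonaws", "com"]:
        return parts[1]
    return None


if __name__ == "__main__":
    print(sts_region_from_log(SAMPLE_LINE))
```

A returned value of `dummy` indicates that the component emitting the line (here `ruler.go`) did not receive a region.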

Loki helm values:

loki:
  auth_enabled: false
  commonConfig:
    path_prefix: /var/loki
    replication_factor: 3
  compactor:
    apply_retention_interval: 1h
    compaction_interval: 5m
    retention_delete_worker_count: 500
    retention_enabled: true
    shared_store: s3
  schemaConfig:
    configs:
      - from: 2018-04-15
        store: boltdb-shipper
        object_store: s3
        schema: v11
        index:
          prefix: loki_index_
          period: 24h
  server:
    http_listen_port: 3100
  storage_config:
    boltdb_shipper:
      active_index_directory: /var/loki/index
      cache_location: /var/loki/index_cache
      shared_store: s3
    aws:
      bucketnames: {{ .Values.loki.bucket_name }}
      region: {{ .Values.aws.region }}
      s3forcepathstyle: false
serviceAccount:
  create: true
  name: loki
  annotations:
    eks.amazonaws.com/role-arn: {{ .Values.loki.s3_access_role }}

Screenshot of the applied env vars:

[image: envvars]

tehlers320 commented 10 months ago

I'm seeing this as well, but on 2.9.1.

tehlers320 commented 10 months ago

Sorry, I'm not using the Loki Helm chart, but I figured it out.

The config must have the region set here when using IRSA:

    common:
      compactor_address: 'loki'
      path_prefix: /var/loki
      replication_factor: 2
      storage:
        s3:
          bucketnames: {{ .Values.s3_bucket }}
          region: {{ .Values.region }}

Mine's working now. Env vars did not matter, for whatever reason.

exb-atix commented 10 months ago

Thank you for your response. I tried your suggestion and put the storage block into the commonConfig block in my config, but unfortunately the issue is still the same.

AntonioDiTuri commented 9 months ago

I have the same issue

abhivaidya07 commented 9 months ago

I'm facing the same issue. Any update on this?

0xdnL commented 8 months ago

I had the same error message when deploying Loki via Helm as singleBinary. After adding the list element "ruler: BUCKET_NAME", it disappeared.

# values.yaml
loki:
  ..
  storage:
    bucketNames:
      chunks: BUCKET_NAME
      ruler: BUCKET_NAME
    type: s3
    s3:
      s3: s3://BUCKET_NAME
      region: "eu-central-1"
      accessKeyId: "${GRAFANA_LOKI_S3_ACCESKEYID}"
      secretAccessKey: "${GRAFANA_LOKI_S3_SECRETACCESSKEY}"
      s3ForcePathStyle: false
      insecure: false

https://grafana.com/docs/loki/latest/setup/install/helm/install-monolithic/

Which makes sense, since the helm chart's _helpers.tpl is looking for $.Values.loki.storage.bucketNames.ruler

https://github.com/grafana/loki/blob/main/production/helm/loki/templates/_helpers.tpl#L342
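A quick way to catch a missing key before deploying is to check the values for the paths mentioned in this thread. The Python sketch below works on an already-parsed values dictionary; the helper names are hypothetical, and the list of required paths is an assumption taken from the snippet above, not an authoritative list of what the chart needs:

```python
from typing import Any, Dict, List, Sequence

# Assumption: paths taken from the values.yaml snippet in this thread.
REQUIRED_PATHS = [
    "loki.storage.bucketNames.chunks",
    "loki.storage.bucketNames.ruler",
    "loki.storage.s3.region",
]


def lookup(values: Dict[str, Any], dotted_path: str) -> Any:
    """Walk a nested dict along a dotted path; return None if any segment is missing."""
    node: Any = values
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def missing_paths(values: Dict[str, Any], paths: Sequence[str] = REQUIRED_PATHS) -> List[str]:
    """Return the dotted paths whose value is missing or an empty string."""
    return [p for p in paths if lookup(values, p) in (None, "")]


if __name__ == "__main__":
    values = {
        "loki": {
            "storage": {
                "bucketNames": {"chunks": "BUCKET_NAME"},
                "s3": {"region": "eu-central-1"},
            }
        }
    }
    # Reports the ruler bucket as missing for this example input.
    print(missing_paths(values))
```

Passing this check does not guarantee the error goes away; later comments in this thread report that it persisted with these keys set.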

exb-atix commented 8 months ago

Hello 0xdnL, thank you for this hint. I have now been able to test your suggestion, but unfortunately the error persists. This is the change I tried (among several variations):

commonConfig:
    path_prefix: /var/loki
    replication_factor: 3
    storage:
      bucketNames:
        ruler: {{ .Values.loki.bucket_name }}
        chunks: {{ .Values.loki.bucket_name }}
      type: s3
      s3:
        s3: {{ .Values.loki.bucket_name }}
        region: {{ .Values.aws.region }}
        s3forcepathstyle: false

gitarns commented 7 months ago

Can someone from Grafana add a definitive working example values.yaml to the examples directory for distributed Loki with an S3 backend using IRSA?

exb-atix commented 6 months ago

Small update: we upgraded to Loki 3.0.0 via Helm chart version 6.3.3, and the error still persists.

Loucool111 commented 2 months ago

Can confirm this is still the case as of Chart v6.6.4:

init compactor: failed to init delete store: failed to get s3 object: WebIdentityErr: failed to retrieve credentials
caused by: RequestError: send request failed
caused by: Post "https://sts.dummy.amazonaws.com/": 3 errors occurred:
    * dial tcp: lookup sts.dummy.amazonaws.com on 172.20.0.10:53: no such host
    * dial tcp: lookup sts.dummy.amazonaws.com on 172.20.0.10:53: no such host
    * dial tcp: lookup sts.dummy.amazonaws.com on 172.20.0.10:53: no such host

samschmitt22 commented 1 month ago

This is still an issue in chart v6.12.0.

grafana / loki

AWS region is set to `dummy` #11453