grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

Loki - Retention and reading of the s3 AWS bucket doesn't work #7335

Open qdupuy opened 1 year ago

qdupuy commented 1 year ago

Describe the bug
I have gaps in my logs despite a 24h index configuration, and reading from the bucket doesn't work.

To Reproduce
Steps to reproduce the behavior:

Expected behavior

I would like retention to work so that I can retrieve my logs over the period of time I have defined.

Environment:

My configuration:

Loki:

        chunk_store_config:
          max_look_back_period: 672h

        table_manager:
          retention_deletes_enabled: true
          retention_period: 672h
          poll_interval: 2m

SchemaConfig:

    schemaConfig:
        configs:
            - from: "2020-07-01"
              store: boltdb-shipper
              object_store: aws
              schema: v11
              index:
                prefix: loki_
                period: 24h
              chunks:
                prefix: loki_chunk
                period: 24h

StorageConfig:

    storageConfig:
        boltdb_shipper:
            shared_store: s3
            active_index_directory: /var/loki/index
            cache_location: /var/loki/cache
            cache_ttl: 168h
        filesystem:
            directory: /var/loki/chunks
        aws:
            s3: s3://name
            endpoint: https://s3.eu-west-3.amazonaws.com
            region: eu-west-3
            bucketnames: name
            access_key_id: ${LOKI_S3_ACCESS_KEY_ID}
            secret_access_key: ${LOKI_S3_SECRET_ACCESS_KEY}
            s3forcepathstyle: true
            insecure: false
            sse_encryption: false
chaudum commented 1 year ago

Hi @qdupuy Your schema and storage configs look ok.

However, when you deploy the loki-distributed Helm chart and want to configure retention with boltdb-shipper store, you need to enable the compactor service. You then need to enable retention in the compactor config as well and set a retention period in the limits config.

Example:

compactor:
  retention_enabled: true
limits_config:
  retention_period: 30d
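
With boltdb-shipper, the compactor also needs a working directory and the shared object store; a fuller sketch might look like the following (the path, delay, and interval values here are illustrative, not taken from this thread):

    compactor:
      working_directory: /var/loki/compactor   # local scratch space for compaction
      shared_store: s3                         # same object store as boltdb_shipper
      retention_enabled: true
      retention_delete_delay: 2h               # grace period before chunks are deleted
      compaction_interval: 10m
    limits_config:
      retention_period: 672h                   # 28 days, matching the desired retention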
chaudum commented 1 year ago

> I have gaps in my logs despite a 24h index configuration, and reading from the bucket doesn't work.

Could you elaborate a bit more on that and provide error logs, etc., so we can narrow down the problem?

qdupuy commented 1 year ago

> Hi @qdupuy Your schema and storage configs look ok.
>
> However, when you deploy the loki-distributed Helm chart and want to configure retention with boltdb-shipper store, you need to enable the compactor service. You then need to enable retention in the compactor config as well and set a retention period in the limits config.
>
> Example:
>
> compactor:
>   retention_enabled: true
> limits_config:
>   retention_period: 30d

Compactor:

        compactor:
          shared_store: s3
          retention_enabled: true
          compaction_interval: 10m

I can see the problem in Grafana:

[screenshot: grafana_loki_datasource]

In the logs, I see nothing.

Which service should I look at to get the right information?

qdupuy commented 1 year ago

On the other hand, with DynamoDB in place I no longer have any gaps and my retention works, which is weird @chaudum.

chaudum commented 1 year ago

OK, now I understand what you mean by "retention not working". You can only query data that is still on the ingesters and has not yet been pushed to the object store, right?

Do you see any error messages in the logs, both on the ingesters (e.g. "failed to flush chunk") and on the queriers/query frontend?
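
If nothing shows up at the default verbosity, one option is to raise Loki's own log level while debugging; a minimal sketch using the standard server block:

    server:
      log_level: debug   # surfaces flush/query errors; revert to info afterwards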

qdupuy commented 1 year ago

No, the data is sent to S3, but when reading I get gaps, or only part of the indexes within the 24h window.

Since setting up DynamoDB in addition to S3, it has been fine.

I would just like to understand why I had so many problems, because with Azure Blob or a MinIO S3 I have no problems.

chaudum commented 1 year ago

> No, the data is sent to S3, but when reading I get gaps, or only part of the indexes within the 24h window.

What is your evidence that all indexes and chunks are sent to S3 correctly?

> Since setting up DynamoDB in addition to S3, it has been fine.

DynamoDB has absolutely nothing to do with boltdb-shipper, which is what you have in your schema_config.

> I would just like to understand why I had so many problems, because with Azure Blob or a MinIO S3 I have no problems.

Without further information I am unable to help. You would need to provide more data, such as logs from the ingesters and queriers.

And also metric data, e.g. from:

loki_ingester_chunks_flushed_total
loki_ingester_chunks_stored_total
loki_ingester_chunk_stored_bytes_total
loki_boltdb_shipper_apply_retention_last_successful_run_timestamp_seconds
loki_boltdb_shipper_tables_upload_operation_total
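
As one way to keep an eye on the last of these, a hypothetical Prometheus alerting rule (the group name, threshold, and labels are illustrative):

    # Fire when the compactor's retention has not completed successfully
    # within the last 24 hours (86400 seconds).
    groups:
      - name: loki-retention
        rules:
          - alert: LokiRetentionNotRunning
            expr: time() - loki_boltdb_shipper_apply_retention_last_successful_run_timestamp_seconds > 86400
            for: 15m
            labels:
              severity: warning
            annotations:
              summary: Loki retention has not run successfully in 24h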
chaudum commented 1 year ago

I suggest seeking further help in the community forum or in the #loki channel in Grafana's public Slack.

qdupuy commented 1 year ago

> What is your evidence that all indexes and chunks are sent to S3 correctly?

Simply by browsing my bucket, I see all the data.

> DynamoDB has absolutely nothing to do with boltdb-shipper, which is what you have in your schema_config.

https://grafana.com/docs/loki/latest/configuration/#storage_config

As soon as I added a DynamoDB to the storage_config, with the shared store set to the aws object store, everything works the way I want.

As soon as I remove the DynamoDB, I end up with gaps in the data and no retention beyond 24h.

> Without further information I am unable to help. You would need to provide more data, such as logs from the ingesters and queriers.

I use the same configuration everywhere, except that I change the object storage section to match the environment.

Example:

https://grafana.com/docs/loki/latest/configuration/#azure_storage_config for azure

https://grafana.com/docs/loki/latest/configuration/#s3_storage_config for s3

qdupuy commented 1 year ago

And I already made a thread; people found the problem weird and suggested that I create an issue.

qdupuy commented 1 year ago

Hello, 🖖

I made a change to limits_config to increase some values; around the time of the deployment and 10 minutes afterwards, I have gaps in the event logs.

I have the impression that there is a problem reading from DynamoDB and S3.

chaudum commented 1 year ago

> And I already made a thread; people found the problem weird and suggested that I create an issue.

@qdupuy I am sorry to hear that you are being sent around in circles. That was definitely not the intention.


> As soon as I added a DynamoDB to the storage_config, with the shared store set to the aws object store, everything works the way I want.

> As soon as I remove the DynamoDB, I end up with gaps in the data and no retention beyond 24h.

> Without further information I am unable to help. You would need to provide more data, such as logs from the ingesters and queriers.

> I use the same configuration everywhere, except that I change the object storage section to match the environment.

Can you please post your full schema_config and storage_config for both setups, with and without DynamoDB?

qdupuy commented 1 year ago

Hello,

SchemaConfig:

    schemaConfig:
        configs:
            - from: "2020-07-01"
              store: boltdb-shipper
              object_store: aws
              schema: v11
              index:
                prefix: loki_
                period: 24h
              chunks:
                prefix: loki_chunk
                period: 24h

StorageConfig with AWS S3:

    storageConfig:
        boltdb_shipper:
            shared_store: s3
            active_index_directory: /var/loki/index
            cache_location: /var/loki/cache
            cache_ttl: 168h
        filesystem:
            directory: /var/loki/chunks
        aws:
            s3: s3://name
            endpoint: https://s3.eu-west-3.amazonaws.com
            region: eu-west-3
            bucketnames: name
            access_key_id: ${LOKI_S3_ACCESS_KEY_ID}
            secret_access_key: ${LOKI_S3_SECRET_ACCESS_KEY}
            s3forcepathstyle: true
            insecure: false
            sse_encryption: false

StorageConfig with AWS DynamoDB:

  storageConfig:
    aws:
      access_key_id: ${LOKI_S3_ACCESS_KEY_ID}
      bucketnames: name
      dynamodb:
        dynamodb_url: ${LOKI_DYNAMODB_URL}
      endpoint: https://s3.eu-west-3.amazonaws.com
      insecure: false
      region: eu-west-3
      s3: s3://name
      s3forcepathstyle: true
      secret_access_key: ${LOKI_S3_SECRET_ACCESS_KEY}
      sse_encryption: false
    boltdb_shipper:
      active_index_directory: /var/loki/index
      cache_location: /var/loki/cache
      cache_ttl: 168h
      shared_store: s3
    filesystem:
      directory: /var/loki/chunks
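
Note that both variants keep store: boltdb-shipper in the schemaConfig, so the index is still built by boltdb-shipper even when the DynamoDB client is configured. If the index were actually meant to live in DynamoDB, the schema would normally name a DynamoDB store instead; a minimal sketch, reusing the names above:

    schemaConfig:
        configs:
            - from: "2020-07-01"
              store: aws-dynamo   # index rows go to DynamoDB instead of boltdb-shipper
              object_store: s3    # chunk data stays in S3
              schema: v11
              index:
                prefix: loki_
                period: 24h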
angelotessaro commented 1 year ago

I have this problem, and it happens when I use environment variables for the AWS credentials. I don't know why; the errors are not clear that the problem is with the credentials. If I hard-code the credentials instead, it works just fine.

This is my config:

    storage_config:
      boltdb_shipper:
        active_index_directory: /loki/boltdb-shipper-active
        cache_location: /loki/boltdb-shipper-cache
        cache_ttl: 24h
        shared_store: aws
      aws:
        s3: s3://${PRODLANE_LOKI_AWS_S3_ACCESS_KEY}:${PRODLANE_LOKI_AWS_S3_SECRET_KEY}@eu-central-1
        bucketnames: ${PRODLANE_LOKI_AWS_S3_BUCKET_NAME}
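
For what it's worth, one likely explanation: Loki only expands ${VAR} placeholders in its configuration file when it is started with -config.expand-env=true; without that flag, the literal ${...} string is used as the credential. A sketch of the container arguments (the config path is illustrative):

    # Sketch: enable environment-variable expansion in Loki's config file.
    # Without -config.expand-env=true, ${...} placeholders are taken literally.
    args:
      - -config.file=/etc/loki/config.yaml
      - -config.expand-env=true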