grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0
23.28k stars 3.37k forks source link

Compactor retention with tsdb_shipper does not work #11811

Open pyo-counting opened 7 months ago

pyo-counting commented 7 months ago

Describe the bug: compactor retention does not work with the tsdb shipper and AWS S3 object storage.

To Reproduce Steps to reproduce the behavior:

  1. Install Helm chart with custom value file(ssd-values.yaml)

    helm install loki grafana/loki --version 5.41.8 --namespace loki-ns -f ssd-values.yaml
    # ssd-values.yaml
    loki:
      # -- The number of old ReplicaSets to retain to allow rollback
      revisionHistoryLimit: 2
      # -- Config file contents for Loki
      # @default -- See values.yaml
      config: |
        {{- if .Values.enterprise.enabled}}
        {{- tpl .Values.enterprise.config . }}
        {{- else }}
        auth_enabled: {{ .Values.loki.auth_enabled }}
        {{- end }}
    
        {{- with .Values.loki.server }}
        server:
          {{- toYaml . | nindent 2}}
        {{- end}}
    
        memberlist:
        {{- if .Values.loki.memberlistConfig }}
          {{- toYaml .Values.loki.memberlistConfig | nindent 2 }}
        {{- else }}
        {{- if .Values.loki.extraMemberlistConfig}}
        {{- toYaml .Values.loki.extraMemberlistConfig | nindent 2}}
        {{- end }}
          join_members:
            - {{ include "loki.memberlist" . }}
            {{- with .Values.migrate.fromDistributed }}
            {{- if .enabled }}
            - {{ .memberlistService }}
            {{- end }}
            {{- end }}
        {{- end }}
    
        {{- with .Values.loki.ingester }}
        ingester:
          {{- tpl (. | toYaml) $ | nindent 4 }}
        {{- end }}
    
        {{- if .Values.loki.commonConfig}}
        common:
        {{- toYaml .Values.loki.commonConfig | nindent 2}}
          storage:
          {{- include "loki.commonStorageConfig" . | nindent 4}}
        {{- end}}
    
        {{- with .Values.loki.limits_config }}
        limits_config:
          {{- tpl (. | toYaml) $ | nindent 4 }}
        {{- end }}
    
        runtime_config:
          file: /etc/loki/runtime-config/runtime-config.yaml
    
        {{- with .Values.loki.memcached.chunk_cache }}
        {{- if and .enabled (or .host .addresses) }}
        chunk_store_config:
          chunk_cache_config:
            memcached:
              batch_size: {{ .batch_size }}
              parallelism: {{ .parallelism }}
            memcached_client:
              {{- if .host }}
              host: {{ .host }}
              {{- end }}
              {{- if .addresses }}
              addresses: {{ .addresses }}
              {{- end }}
              service: {{ .service }}
        {{- end }}
        {{- end }}
    
        {{- if .Values.loki.schemaConfig }}
        schema_config:
        {{- toYaml .Values.loki.schemaConfig | nindent 2}}
        {{- else }}
        schema_config:
          configs:
            - from: 2022-01-11
              store: boltdb-shipper
              object_store: {{ .Values.loki.storage.type }}
              schema: v12
              index:
                prefix: loki_index_
                period: 24h
        {{- end }}
    
        {{ include "loki.rulerConfig" . }}
    
        {{- if or .Values.tableManager.retention_deletes_enabled .Values.tableManager.retention_period }}
        table_manager:
          retention_deletes_enabled: {{ .Values.tableManager.retention_deletes_enabled }}
          retention_period: {{ .Values.tableManager.retention_period }}
        {{- end }}
    
        {{- with .Values.loki.memcached.results_cache }}
        query_range:
          align_queries_with_step: true
          {{- if and .enabled (or .host .addresses) }}
          cache_results: {{ .enabled }}
          results_cache:
            cache:
              default_validity: {{ .default_validity }}
              memcached_client:
                {{- if .host }}
                host: {{ .host }}
                {{- end }}
                {{- if .addresses }}
                addresses: {{ .addresses }}
                {{- end }}
                service: {{ .service }}
                timeout: {{ .timeout }}
          {{- end }}
        {{- end }}
    
        {{- with .Values.loki.storage_config }}
        storage_config:
          {{- tpl (. | toYaml) $ | nindent 4 }}
        {{- end }}
    
        {{- with .Values.loki.query_scheduler }}
        query_scheduler:
          {{- tpl (. | toYaml) $ | nindent 4 }}
        {{- end }}
    
        {{- with .Values.loki.compactor }}
        compactor:
          {{- tpl (. | toYaml) $ | nindent 4 }}
        {{- end }}
    
        {{- with .Values.loki.analytics }}
        analytics:
          {{- tpl (. | toYaml) $ | nindent 4 }}
        {{- end }}
    
        {{- with .Values.loki.querier }}
        querier:
          {{- tpl (. | toYaml) $ | nindent 4 }}
        {{- end }}
    
        {{- with .Values.loki.index_gateway }}
        index_gateway:
          {{- tpl (. | toYaml) $ | nindent 4 }}
        {{- end }}
    
        {{- with .Values.loki.frontend }}
        frontend:
          {{- tpl (. | toYaml) $ | nindent 4 }}
        {{- end }}
    
        {{- with .Values.loki.frontend_worker }}
        frontend_worker:
          {{- tpl (. | toYaml) $ | nindent 4 }}
        {{- end }}
    
        {{- with .Values.loki.distributor }}
        distributor:
          {{- tpl (. | toYaml) $ | nindent 4 }}
        {{- end }}
    
        tracing:
          enabled: {{ .Values.loki.tracing.enabled }}
      # Should authentication be enabled
      auth_enabled: true
      # -- Check https://grafana.com/docs/loki/latest/configuration/#server for more info on the server configuration.
      server:
        log_format: "logfmt"
        log_level: "info"
        log_source_ips_enabled: true
        log_request_headers: true
        log_request_at_info_level_enabled: true
      # -- Limits config
      limits_config:
        max_line_size: 10KB
        per_stream_rate_limit: 5MB
        per_stream_rate_limit_burst: 20MB
        split_queries_by_interval: 15m
        retention_period: 7d
        retention_stream:
          - selector: '{environment="dev"}'
            priority: 1
            period: 1d
          - selector: '{environment="stg"}'
            priority: 1
            period: 2d
        shard_streams:
          enabled: false
        allow_structured_metadata: true
      # -- Provides a reloadable runtime configuration file for some specific configuration
      runtimeConfig: {}
      # -- Check https://grafana.com/docs/loki/latest/configuration/#common_config for more info on how to provide a common     configuration
      commonConfig:
        path_prefix: /var/loki
        replication_factor: 3
        ring:
          kvstore:
            store: "memberlist"
        compactor_address: '{{ include "loki.compactorAddress" . }}'
      # -- Storage config. Providing this will automatically populate all necessary storage configs in the templated config.
      storage:
        bucketNames:
          chunks: kps-shr-tools-s3-loki
          ruler: kps-shr-tools-s3-loki
        type: s3
        s3:
          region: ap-northeast-2
      # -- Configure memcached as an external cache for chunk and results cache. Disabled by default
      # must enable and specify a host for each cache you would like to use.
      memcached:
        chunk_cache:
          enabled: false
        results_cache:
          enabled: false
      # -- Check https://grafana.com/docs/loki/latest/configuration/#schema_config for more info on how to configure schemas
      schemaConfig:
        configs:
          - from: "2024-01-01"
            store: tsdb
            object_store: s3
            schema: v13
            index:
              prefix: tsdb_index_
              period: 24h
      # -- Check https://grafana.com/docs/loki/latest/configuration/#ruler for more info on configuring ruler
      rulerConfig: {}
      # -- Structured loki configuration, takes precedence over `loki.config`, `loki.schemaConfig`, `loki.storageConfig`
      structuredConfig:
        common:
          storage:
            s3:
              storage_class: "STANDARD"
            hedging:
              at: 250ms
              up_to: 3
              max_per_second: 20
        query_range:
          results_cache:
            cache:
              enable_fifocache: false
              embedded_cache:
                enabled: true
                max_size_mb: 150
                ttl: 30m
            compression: "snappy"
          cache_results: true
          cache_index_stats_results: false
      # -- Additional query scheduler config
      query_scheduler:
        max_outstanding_requests_per_tenant: 32768
        querier_forget_delay: 60s
      # -- Additional storage config
      storage_config:
        aws:
          bucketnames: kps-shr-tools-s3-loki
          region: ap-northeast-2
          insecure: false
          storage_class: "STANDARD"
        tsdb_shipper:
          active_index_directory: /var/loki/ingester/tsdb_shipper
          shared_store: "s3"
          shared_store_key_prefix: "tsdb_shipper/"
          cache_location: /var/loki/index_gateway/tsdb_shipper
          index_gateway_client:
            log_gateway_requests: true
      # --  Optional compactor configuration
      compactor:
        working_directory: "/var/loki/compactor"
        shared_store: "s3"
        shared_store_key_prefix: "compoactor/"  # note the mistyped prefix — it differs from tsdb_shipper's "tsdb_shipper/", which turns out to be the root cause of this issue
        retention_enabled: true
        compactor_ring:
          kvstore:
            store: "memberlist"
      # --  Optional analytics configuration
      analytics:
        reporting_enabled: false
      # --  Optional querier configuration
      querier:
        tail_max_duration: 30m
        max_concurrent: 16
        multi_tenant_queries_enabled: true
      # --  Optional ingester configuration
      ingester:
        lifecycler:
          ring:
            kvstore:
              store: "memberlist"
          final_sleep: 15s
        wal:
          dir: "/var/loki/ingester/wal"
          flush_on_shutdown: true
          replay_memory_ceiling: 1GB
      # --  Optional index gateway configuration
      index_gateway:
        mode: ring
        ring:
          kvstore:
            store: "memberlist"
      frontend:
        scheduler_address: '{{ include "loki.querySchedulerAddress" . }}'
        log_queries_longer_than: 5s
        query_stats_enabled: true
        scheduler_dns_lookup_period: 3s
        compress_responses: true
      frontend_worker:
        match_max_concurrent: true
        scheduler_address: '{{ include "loki.querySchedulerAddress" . }}'
      # -- Optional distributor configuration
      distributor:
        ring:
          kvstore:
            store: "memberlist"
        rate_store:
          debug: true
        write_failures_logging:
          add_insights_label: true
      # -- Enable tracing
      tracing:
        enabled: true
    enterprise:
      # Enable enterprise features, license must be provided
      enabled: false
    
    # -- Options that may be necessary when performing a migration from another helm chart
    migrate:
      # -- When migrating from a distributed chart like loki-distributed or enterprise-logs
      fromDistributed:
        # -- Set to true if migrating from a distributed helm chart
        enabled: false
    
    serviceAccount:
      # -- Specifies whether a ServiceAccount should be created
      create: true
      # -- The name of the ServiceAccount to use.
      # If not set and create is true, a name is generated using the fullname template
      name: loki-sa
      # -- Annotations for the service account
      annotations:
        eks.amazonaws.com/role-arn: (...skip...)
      # -- Set this toggle to false to opt out of automounting API credentials for the service account
      automountServiceAccountToken: true
    
    # RBAC configuration
    rbac:
      # -- If pspEnabled true, a PodSecurityPolicy is created for K8s that use psp.
      pspEnabled: false
      # -- For OpenShift set pspEnabled to 'false' and sccEnabled to 'true' to use the SecurityContextConstraints.
      sccEnabled: false
    
    # -- Section for configuring optional Helm test
    test:
      enabled: false
    
    # Monitoring section determines which monitoring features to enable
    monitoring:
      # Dashboards for monitoring Loki
      dashboards:
        # -- If enabled, create configmap with dashboards for monitoring Loki
        enabled: false
      # Recording rules for monitoring Loki, required for some dashboards
      rules:
        # -- If enabled, create PrometheusRule resource with Loki recording rules
        enabled: false
        # -- Include alerting rules
        alerting: false
      # ServiceMonitor configuration
      serviceMonitor:
        # -- If enabled, ServiceMonitor resources for Prometheus Operator are created
        enabled: false
      # Self monitoring determines whether Loki should scrape its own logs.
      # This feature currently relies on the Grafana Agent Operator being installed,
      # which is installed by default using the grafana-agent-operator sub-chart.
      # It will create custom resources for GrafanaAgent, LogsInstance, and PodLogs to configure
      # scrape configs to scrape its own logs with the labels expected by the included dashboards.
      selfMonitoring:
        enabled: false
      # The Loki canary pushes logs to and queries from this loki installation to test
      # that it's working correctly
      lokiCanary:
        enabled: false
    
    # Configuration for the write pod(s)
    write:
      # -- Number of replicas for the write
      replicas: 3
      autoscaling:
        # -- Enable autoscaling for the write.
        enabled: false
      # -- Comma-separated list of Loki modules to load for the write
      targetModule: "write"
      # -- Resource requests and limits for the write
      resources:
        limits:
          cpu: 1.5
          memory: 2Gi
        requests:
          cpu: 500m
          memory: 500Mi
      # -- Grace period to allow the write to shutdown before it is killed. Especially for the ingester,
      # this must be increased. It must be long enough so writes can be gracefully shutdown flushing/transferring
      # all data and to successfully leave the member ring on shutdown.
      terminationGracePeriodSeconds: 300
      # -- The default is to deploy all pods in parallel.
      podManagementPolicy: "Parallel"
      persistence:
        # -- Enable volume claims in pod spec
        volumeClaimsEnabled: true
        # -- Enable StatefulSetAutoDeletePVC feature
        enableStatefulSetAutoDeletePVC: false
        # -- Storage class to be used.
        # If defined, storageClassName: <storageClass>.
        # If set to "-", storageClassName: "", which disables dynamic provisioning.
        # If empty or set to null, no storageClassName spec is
        # set, choosing the default provisioner (gp2 on AWS, standard on GKE, AWS, and OpenStack).
        storageClass: loki-sc
    
    # Configuration for the table-manager
    tableManager:
      # -- Specifies whether the table-manager should be enabled
      enabled: false
    
    # Configuration for the read pod(s)
    read:
      # -- Number of replicas for the read
      replicas: 2
      autoscaling:
        # -- Enable autoscaling for the read, this is only used if `queryIndex.enabled: true`
        enabled: false
      # -- Comma-separated list of Loki modules to load for the read
      targetModule: "read"
      # -- Whether or not to use the 2 target type simple scalable mode (read, write) or the
      # 3 target type (read, write, backend). Legacy refers to the 2 target type, so true will
      # run two targets, false will run 3 targets.
      legacyReadTarget: false
      # -- Resource requests and limits for the read
      resources:
        limits:
          cpu: 1.5
          memory: 2Gi
        requests:
          cpu: 500m
          memory: 500Mi
      # -- Grace period to allow the read to shutdown before it is killed
      terminationGracePeriodSeconds: 30
    
    # Configuration for the backend pod(s)
    backend:
      # -- Number of replicas for the backend
      replicas: 2
      autoscaling:
        # -- Enable autoscaling for the backend.
        enabled: false
      # -- Comma-separated list of Loki modules to load for the read
      targetModule: "backend"
      # -- Resource requests and limits for the backend
      resources:
        limits:
          cpu: 1
          memory: 1Gi
        requests:
          cpu: 500m
          memory: 500Mi
      # -- Grace period to allow the backend to shutdown before it is killed. Especially for the ingester,
      # this must be increased. It must be long enough so backends can be gracefully shutdown flushing/transferring
      # all data and to successfully leave the member ring on shutdown.
      terminationGracePeriodSeconds: 300
      podManagementPolicy: "Parallel"
      persistence:
        # -- Enable volume claims in pod spec
        volumeClaimsEnabled: true
        # -- Enable StatefulSetAutoDeletePVC feature
        enableStatefulSetAutoDeletePVC: true
        # -- Storage class to be used.
        # If defined, storageClassName: <storageClass>.
        # If set to "-", storageClassName: "", which disables dynamic provisioning.
        # If empty or set to null, no storageClassName spec is
        # set, choosing the default provisioner (gp2 on AWS, standard on GKE, AWS, and OpenStack).
        storageClass: loki-sc
    # Configuration for the single binary node(s)
    singleBinary:
      # -- Number of replicas for the single binary
      replicas: 0
    
    # Use either this ingress or the gateway, but not both at once.
    # If you enable this, make sure to disable the gateway.
    # You'll need to supply authn configuration for your ingress controller.
    ingress:
      enabled: true
      ingressClassName: "alb"
      annotations:
        (...skip...)
      paths:
        (...skip...)
      hosts:
        (...skip...)
    
    # Configuration for the memberlist service
    memberlist:
      service:
        publishNotReadyAddresses: false
    
    # Configuration for the gateway
    gateway:
      # -- Specifies whether the gateway should be enabled
      enabled: false
    
    networkPolicy:
      # -- Specifies whether Network Policies should be created
      enabled: false
    
    # -------------------------------------
    # Configuration for `minio` child chart
    # -------------------------------------
    minio:
      enabled: false
    
    sidecar:
      rules:
        # -- Whether or not to create a sidecar to ingest rule from specific ConfigMaps and/or Secrets.
        enabled: false
  2. push logs with Promtail to Loki(stream: {environment="dev" ...}, tenant id: kurlypay)

Expected behavior A clear and concise description of what you expected to happen.

Environment:

Screenshots, Promtail config, or terminal output If applicable, add any output to help explain your problem.

pyo-counting commented 7 months ago

I tested the values.yaml file below and confirmed that compaction and retention are working.

loki:
  auth_enabled: false
  limits_config:
    retention_period: 1d
  commonConfig:
    replication_factor: 2
  storage:
    bucketNames:
      chunks: kps-shr-tools-s3-loki-test
      ruler: kps-shr-tools-s3-loki-test
    s3:
      region: ap-northeast-2
  storage_config:
    boltdb_shipper:
        active_index_directory: /var/loki/data/index
        cache_location: /var/loki/data/boltdb-cache
        shared_store: s3
  compactor:
    working_directory: /var/loki/data/retention
    shared_store: s3
    retention_delete_delay: 30m
    compaction_interval: 10m
    retention_enabled: true
    retention_delete_worker_count: 150
serviceAccount:
  name: loki-sa
  imagePullSecrets: []
  annotations:
    eks.amazonaws.com/role-arn: (...skip...)
  rules:
    enabled: false
    alerting: false
  serviceMonitor:
    enabled: false
  lokiCanary:
    enabled: false
write:
  replicas: 2
  persistence:
    storageClass: loki-sc
read:
  replicas: 2
  persistence:
    storageClass: loki-sc
backend:
  replicas: 2
  persistence:
    storageClass: loki-sc
gateway:
  enabled: false
extraObjects:
  - apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: loki-sc
    provisioner: efs.csi.aws.com
    parameters:
      provisioningMode: efs-ap
      fileSystemId: (...skip...)
      directoryPerms: "700"
      uid: '{{ .Values.loki.podSecurityContext.runAsUser }}'
      gid: '{{ .Values.loki.podSecurityContext.runAsGroup }}'

What did I miss? Let me know, please

pyo-counting commented 7 months ago

Finally, I found the cause. It's a problem caused by the different values of `-tsdb.shipper.shared-store.key-prefix` and `-compactor.shared-store.key-prefix` (in my config above: `tsdb_shipper/` vs the mistyped `compoactor/`).

I simply thought the compactor was using the `-compactor.shared-store.key-prefix` flag only for deletion purposes, not for compaction and retention. But it wasn't.

I hope this will be added to the official Loki documentation. There is a separate key-prefix option for the compactor and for the writer, so there might be people who have the same misconception as me.

icanhazbeer commented 6 months ago

Hi, Can you elaborate on this? Did you set each value separately?

pyo-counting commented 6 months ago

@icanhazbeer That's right. I set the two runtime flags to different values.

And the results of the test are as follows.

As a result, we can see that the two flags must always have the same value in order for the compactor to perform compaction and retention properly.