grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

[helm/loki] Table manager relies on storage_config section #10612

Open sergeyshaykhullin opened 1 year ago

sergeyshaykhullin commented 1 year ago

Describe the bug

Table manager fails in s3 mode because of an unrecognized storage client.

To Reproduce

Steps to reproduce the behavior:

  1. Started Loki (SHA or version)
  2. Started Promtail (SHA or version) to tail '...'
  3. Query: {} term

Expected behavior

The Loki config has a common.storage section. It seems table-manager should use this configuration instead.
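For context, the Helm values shown in this report would be expected to end up in the rendered Loki config roughly as below. This is only a sketch, assuming the chart maps `loki.storage` onto `common.storage.s3`; the placeholders are the ones used in the report, and the exact rendered output depends on the chart version.

```yaml
# Sketch of the rendered Loki config (assumed mapping, not copied from chart output)
common:
  storage:
    s3:
      endpoint: {{ s3.endpoint }}
      region: default
      bucketnames: {{ s3.bucket }}
      access_key_id: {{ s3.access_key }}
      secret_access_key: {{ s3.secret_key }}
```

The expectation described here is that table-manager would resolve its storage client from this section when no store-specific setting is present.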

Environment:

Screenshots, Promtail config, or terminal output

...
loki:
  ...
  compactor:
    shared_store: s3
  storage:
    bucketNames:
      chunks: {{ s3.bucket }}
      ruler: {{ s3.bucket }}
      admin: {{ s3.bucket }}
    type: s3
    s3:
      endpoint: {{ s3.endpoint }}
      region: default
      secretAccessKey: {{ s3.secret_key }}
      accessKeyId: {{ s3.access_key }}
...
Unrecognized storage client , choose one of: aws, s3, gcs, azure, alibabacloud, swift, bos, cos, filesystem
error initialising module: table-manager
github.com/grafana/dskit/modules.(*Manager).initModule
    /src/loki/vendor/github.com/grafana/dskit/modules/modules.go:138
github.com/grafana/dskit/modules.(*Manager).InitModuleServices
    /src/loki/vendor/github.com/grafana/dskit/modules/modules.go:108
github.com/grafana/loki/pkg/loki.(*Loki).Run
    /src/loki/pkg/loki/loki.go:461
main.main
    /src/loki/cmd/loki/main.go:110
runtime.main
    /usr/local/go/src/runtime/proc.go:250
runtime.goexit
    /usr/local/go/src/runtime/asm_amd64.s:1598
level=warn ts=2023-09-16T09:47:36.867896922Z caller=loki.go:288 msg="global timeout not configured, using default engine timeout (\"5m0s\"). This behavior will change in the next major to always use the default global timeout (\"5m\")."
level=info ts=2023-09-16T09:47:36.871061139Z caller=main.go:108 msg="Starting Loki" version="(version=2.9.1, branch=HEAD, revision=d9d5ed4a1)"
level=info ts=2023-09-16T09:47:36.871664347Z caller=server.go:322 http=[::]:3100 grpc=[::]:9095 msg="server listening on addresses"
level=warn ts=2023-09-16T09:47:36.873514348Z caller=modules.go:510 msg="table manager is deprecated. Consider migrating to tsdb index which relies on a compactor instead."
level=error ts=2023-09-16T09:47:36.87362744Z caller=log.go:230 msg="error running loki" err="Unrecognized storage client , choose one of: aws, s3, gcs, azure, alibabacloud, swift, bos, cos, filesystem\nerror initialising module: table-manager\ngithub.com/grafana/dskit/modules.(*Manager).initModule\n\t/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:138\ngithub.com/grafana/dskit/modules.(*Manager).InitModuleServices\n\t/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:108\ngithub.com/grafana/loki/pkg/loki.(*Loki).Run\n\t/src/loki/pkg/loki/loki.go:461\nmain.main\n\t/src/loki/cmd/loki/main.go:110\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598"
sergeyshaykhullin commented 1 year ago

It seems the table manager is deprecated.

Is this the proper way to migrate to tsdb with retention?

...
compactor:
    shared_store: s3
    retention_enabled: true
    retention_delete_delay: {{ retention_period }}
...
schema_config:
    configs:
    - from: "2020-01-01"
      index:
        period: 24h
        prefix: loki_index_
      object_store: s3
      schema: v12
      store: tsdb-shipper
...
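For comparison, here is a minimal sketch of how compactor-based retention with a TSDB index is commonly laid out. It assumes Loki 2.9.x key names. The `744h` retention value and the `2023-10-01` cut-over date are illustrative placeholders, not values from this thread. Note that `retention_delete_delay` is the grace period before chunks already marked for deletion are removed, while the retention period itself is set under `limits_config`.

```yaml
# Sketch only; key names assume Loki 2.9.x, values marked "example" are hypothetical.
compactor:
  shared_store: s3
  retention_enabled: true
  # Delay between marking chunks for deletion and actually deleting them.
  retention_delete_delay: 2h

limits_config:
  # How long logs are kept before the compactor marks them for deletion (example).
  retention_period: 744h

schema_config:
  configs:
    # New period entry for the TSDB index; existing entries for older data stay as they are.
    - from: "2023-10-01"   # example cut-over date, should lie in the future when applied
      store: tsdb          # schema store name; the storage_config section is tsdb_shipper
      object_store: s3
      schema: v12
      index:
        prefix: loki_index_
        period: 24h

storage_config:
  tsdb_shipper:
    shared_store: s3
```

Adding a new period entry, rather than changing the store of an entry whose `from` date is in the past, keeps previously written indexes readable.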
Kampe commented 1 year ago

The docs covering this during the deprecation period are really bad across the Helm charts. This needs to be cleared up.

Baskkra commented 1 year ago

Just curious, is this also related to Loki write storage access? I got this error:

level=error ts=2023-10-03T07:27:47.099224717Z caller=flush.go:143 org_id=fake msg="failed to flush" err="failed to flush chunks: store put chunk: googleapi: Error 403: The billing account for the owning project is disabled in state closed, accountDisabled, num_chunks: 1

Please let me know if this is not related, and I will create a separate issue.

AmitBenAmi commented 10 months ago

I am not sure which Helm chart version you are running, but on the latest version, 0.77.0, which runs Loki 2.9.2, I saw a bug in how the Table Manager storage client is created. In https://github.com/grafana/loki/blob/a17308db6f24874a789bacb4b441b26831d638b5/pkg/storage/factory.go#L549, you can see that if a store in your schema_config specifies tsdb, it tries to create a new object client (storage client) using the shared_store from boltdb_shipper (i.e. boltdb_shipper.shared_store).

You do not use boltdb-shipper (and neither did I), so that value is not set under storage_config. That is why the error message shows an empty storage client name rather than the store you specified in your schema_config.

To fix that, I added the following section under my storage_config, which resolved the table-manager error:

# for fixing error in table manager
boltdb_shipper:
  shared_store: s3
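Shown with its parent key, the workaround would sit in the Loki config roughly as follows. This is a sketch assuming Loki 2.9.x key names; where this lands in Helm values depends on the chart and is not shown here.

```yaml
# Sketch: placement of the workaround in the Loki config (Loki 2.9.x key names assumed)
storage_config:
  # Set only so table-manager can resolve a storage client;
  # it does not switch the index store to boltdb-shipper.
  boltdb_shipper:
    shared_store: s3
```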

I checked main, and the code there already looks different and appears to be fixed: https://github.com/grafana/loki/blob/5e3496739c55eec7e6db013b2dae52efdfc98f30/pkg/storage/factory.go#L575

Some more context: I decided to upgrade our Loki version from an older one, and I still need table-manager to support older logs until they are all purged when their retention ends. I'm not entirely sure this is the correct process, but the documentation does not seem to cover this migration. The most intuitive approach seemed to be keeping table-manager and the older configs until I can say for sure that no remaining logs have their index in a DynamoDB table and everything comes from a single store.