grafana / helm-charts


[loki-distributed] Role of persistence of each component #528

Open WojciechKarpiel opened 3 years ago

WojciechKarpiel commented 3 years ago

Hi! My goal is to prepare a production-ready Loki deployment in a Kubernetes environment (HA, retention on log storage, etc.). I'm using loki-distributed because, AFAIK, it's the only chart that supports HA (correct me if I'm wrong).

There are persistence settings for 4 components: ingester, querier, compactor, and ruler. What is the role of each of these storages? If I want log retention (so that disk usage doesn't grow endlessly), should I configure the compactor to remove old data from all of them? How do I do that?

If I use cloud-provider-specific storage (GCS, S3), do I have to create 4 of these as well? How do I configure retention in that case?

Regards and thanks for creating Loki! I'm happy that there's a lightweight alternative to ELK stack :)

shinebayar-g commented 3 years ago

Agreed. I'm struggling with the same question.

WojciechKarpiel commented 3 years ago

Hi @shinebayar-g. The README states:

NOTE: In its default configuration, the chart uses boltdb-shipper and filesystem as storage. The reason for this is that the chart can be validated and installed in a CI pipeline. However, this setup is not fully functional. Querying will not be possible (or limited to the ingesters' in-memory caches) because that would otherwise require shared storage between ingesters and queriers which the chart does not support and would require a volume that supports ReadWriteMany access mode anyways. The recommendation is to use object storage, such as S3, GCS, MinIO, etc., or one of the other options documented at https://grafana.com/docs/loki/latest/storage/.

I guess the persistence settings are a leftover from an attempt to get it working, or a copy-paste error.

Anyway, the chart works with GCP buckets and S3 buckets, so it's functional if you're OK with a cloud-provider-dependent solution. (I haven't tried MinIO; it might let you stay cloud-agnostic.)
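For reference, pointing the chart at object storage looks roughly like this. This is a sketch against the loki-distributed values file using its `loki.structuredConfig` passthrough; the bucket name and region are placeholders, and the exact keys may differ between chart and Loki versions:

```yaml
# values.yaml for loki-distributed (sketch; verify keys against your chart version)
loki:
  structuredConfig:
    schema_config:
      configs:
        - from: "2022-01-01"
          store: boltdb-shipper
          object_store: s3
          schema: v12
          index:
            prefix: loki_index_
            period: 24h
    storage_config:
      boltdb_shipper:
        shared_store: s3
      aws:
        # placeholder bucket/region; credentials usually come from IRSA or env vars
        s3: s3://us-east-1/my-loki-chunks
```

With object storage as the shared store, the per-component persistence volumes only hold local working data (WAL, index cache, compactor marks), not the chunks themselves.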

stevehipwell commented 3 years ago

The answers are in the Single Store Loki docs, other than for the compactor.

mehta-ankit commented 2 years ago

Any idea how persistence helps with compactor ?

stevehipwell commented 2 years ago

@mehta-ankit my understanding is that compactor persistence means that after the pod is restarted it doesn't need to recalculate the whole state of the system from scratch.

The Thanos Compact component is very similar, so its persistence docs should add a bit more context.
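If that's the case, keeping the compactor's working directory on a PVC is cheap insurance. In the loki-distributed chart that would look something like this (a sketch; I haven't verified the exact value names against the latest chart version, and the size is just an example):

```yaml
# values.yaml fragment (sketch; check compactor.persistence in your chart version)
compactor:
  enabled: true
  persistence:
    enabled: true
    size: 10Gi
    # storageClass: standard  # optionally pin a storage class
```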

mehta-ankit commented 2 years ago

Thanks @stevehipwell

birdx0810 commented 6 months ago

> @mehta-ankit my understanding is that compactor persistence means that after the pod is restarted it doesn't need to recalculate the whole state of the system from scratch.
>
> The Thanos Compact component is very similar, so its persistence docs should add a bit more context.

I believe that the Loki Compactor is designed slightly differently from Thanos and Mimir.

https://grafana.com/docs/loki/latest/operations/storage/retention/#compactor

According to the docs above, the marked chunks are saved in a file on disk and are only deleted after retention_delete_delay, which works as a grace period allowing components to refresh their stores, preventing query issues. Hence, it kind of needs to store a "state".

I might be wrong though; it would be great if a Loki expert could chime in and reveal the truth.
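For completeness, the retention knobs from that doc page translate into Loki config roughly like this. A sketch only: the delay, worker count, and retention period are example values, not recommendations:

```yaml
# Loki config fragment (sketch; values are illustrative)
compactor:
  working_directory: /var/loki/compactor   # where the delete marks live; this is what persistence protects
  shared_store: s3
  retention_enabled: true
  retention_delete_delay: 2h               # marked chunks wait this long before actual deletion
  retention_delete_worker_count: 150
limits_config:
  retention_period: 744h                   # 31 days; global retention window
```

The `working_directory` here is exactly what the chart's compactor persistence volume backs, which is why losing it mid-cycle could leave marks behind.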

emedvesApk commented 2 months ago

> I believe that the Loki Compactor is designed slightly differently from Thanos and Mimir.
>
> https://grafana.com/docs/loki/latest/operations/storage/retention/#compactor
>
> According to the docs above, the marked chunks are saved in a file on disk and are only deleted after retention_delete_delay, which works as a grace period allowing components to refresh their stores, preventing query issues. Hence, it kind of needs to store a "state".
>
> I might be wrong though; it would be great if a Loki expert could chime in and reveal the truth.

I am analyzing disaster recovery scenarios using the loki-distributed chart, and I am concerned about the compactor's persistence.

From my understanding of these docs: https://grafana.com/docs/loki/latest/operations/storage/retention/#compactor losing the volume only results in losing the "marks" on the chunks that left the retention window during the last compaction. I hope I'm wrong about this, but it would mean that those chunks will never be deleted. Meanwhile, the index has already been updated by the previous compaction, so the data in those chunks will not be queryable.

I hope someone can correct me, because this scenario is somewhat annoying and would require configuring retention on the backing storage (S3 in my case).

emedvesApk commented 2 months ago

I opened a new issue to investigate this: #3228