wzjjack opened 2 years ago
I got the same issue. Logs older than 7 days are deleted and are no longer visible in Grafana; however, the chunk files are not deleted from the filesystem.
loki, version 2.5.0 (branch: HEAD, revision: 2d9d0ee23)
build user: root@4779f4b48f3a
build date: 2022-04-07T21:50:00Z
go version: go1.17.6
platform: linux/amd64
server:
http_listen_port: 3100
grpc_listen_port: 9096
common:
path_prefix: /var/lib/loki
storage:
filesystem:
chunks_directory: /var/lib/loki/chunks
rules_directory: /var/lib/loki/rules
replication_factor: 1
ring:
instance_addr: 127.0.0.1
kvstore:
store: inmemory
schema_config:
configs:
- from: 2020-10-24
store: boltdb-shipper
object_store: filesystem
schema: v11
index:
prefix: index_
period: 24h
chunk_store_config:
max_look_back_period: 168h
table_manager:
retention_deletes_enabled: true
retention_period: 168h
Use the compactor, not the table_manager, if you aren't using AWS S3.
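In practice, for a filesystem deployment, that means dropping the table_manager block and configuring retention via the compactor and limits_config instead. A minimal sketch (values are examples; a complete working config is posted further down in this thread):

```yaml
# Sketch: retention via the compactor instead of table_manager (filesystem store).
compactor:
  working_directory: /var/lib/loki/retention   # example path
  shared_store: filesystem
  retention_enabled: true        # without this the compactor only compacts the index
limits_config:
  retention_period: 168h         # keep 7 days of logs
```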
Thanks, that did the trick :)
Hey @DeBuXer, could you post what you added to your config file in order to get deletion on S3 working? I am running into the same issue and have not found a solution.
@Mastedont, I don't use S3, I store my chunks directly on disk. My current configuration:
server:
http_listen_port: 3100
grpc_listen_port: 9096
common:
path_prefix: /var/lib/loki
storage:
filesystem:
chunks_directory: /var/lib/loki/chunks
rules_directory: /var/lib/loki/rules
replication_factor: 1
ring:
instance_addr: 127.0.0.1
kvstore:
store: inmemory
schema_config:
configs:
- from: 2020-10-24
store: boltdb-shipper
object_store: filesystem
schema: v11
index:
prefix: index_
period: 24h
chunk_store_config:
max_look_back_period: 168h
compactor:
working_directory: /var/lib/loki/retention
shared_store: filesystem
compaction_interval: 10m
retention_enabled: true
retention_delete_delay: 2h
retention_delete_worker_count: 150
limits_config:
retention_period: 168h
ruler:
alertmanager_url: http://127.0.0.1:9093
Thank you, @DeBuXer
One last question, @DeBuXer: how can I tell whether log retention is working? What log output indicates that it is working?
@Mastedont, Not 100% sure, but I guess:
Jun 14 15:15:44 loki loki[277929]: level=info ts=2022-06-14T13:15:44.57537489Z caller=index_set.go:280 table-name=index_19150 msg="removing source db files from storage" count=1
Jun 14 15:15:44 loki loki[277929]: level=info ts=2022-06-14T13:15:44.576099223Z caller=compactor.go:495 msg="finished compacting table" table-name=index_19150
Is that log output from the ingester? I can only see output like this, despite having the compactor enabled:
level=info ts=2022-06-14T14:05:17.574831148Z caller=table.go:358 msg="uploading table loki_pre_19157"
level=info ts=2022-06-14T14:05:17.574847901Z caller=table.go:385 msg="finished uploading table loki_pre_19157"
level=info ts=2022-06-14T14:05:17.57485537Z caller=table.go:443 msg="cleaning up unwanted dbs from table loki_pre_19157"
It's from /var/log/syslog, but it should contain the same information. When the compactor is enabled, you should see something like:
level=info ts=2022-06-14T14:24:56.072803949Z caller=compactor.go:324 msg="this instance has been chosen to run the compactor, starting compactor"
@DeBuXer, thanks a lot for your support here. I don't see the chunk files getting rotated, and I also see pretty old index directories. I want my logs to be rotated every 7 days. I am not sure what I am doing wrong here. Could you please help me with it?
auth_enabled: false
chunk_store_config:
max_look_back_period: 168h
compactor:
shared_store: filesystem
working_directory: /data/loki/boltdb-shipper-compactor
ingester:
chunk_block_size: 262144
chunk_idle_period: 3m
chunk_retain_period: 1m
wal:
dir: /data/loki/wal
flush_on_shutdown: true
lifecycler:
ring:
kvstore:
store: inmemory
replication_factor: 1
max_transfer_retries: 0
limits_config:
enforce_metric_name: false
reject_old_samples: true
reject_old_samples_max_age: 168h
ingestion_rate_mb: 32
ingestion_burst_size_mb: 36
unordered_writes: true
retention_period: 168h
schema_config:
configs:
- from: 2020-10-24
index:
period: 24h
prefix: index_
object_store: filesystem
schema: v11
store: boltdb-shipper
server:
http_listen_port: 3100
storage_config:
boltdb_shipper:
active_index_directory: /data/loki/boltdb-shipper-active
cache_location: /data/loki/boltdb-shipper-cache
cache_ttl: 24h
shared_store: filesystem
filesystem:
directory: /data/loki/chunks
table_manager:
retention_deletes_enabled: true
retention_period: 168h
@rickydjohn, I think you need to set retention_enabled: true in the compactor config. See also https://grafana.com/docs/loki/latest/operations/storage/retention/#retention-configuration
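A sketch of what that change might look like in the config above (only the compactor block shown, other values unchanged):

```yaml
# Sketch: add retention_enabled to the existing compactor block.
compactor:
  shared_store: filesystem
  working_directory: /data/loki/boltdb-shipper-compactor
  retention_enabled: true   # required for the compactor to apply retention_period
```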
Hi! This issue has been automatically marked as stale because it has not had any activity in the past 30 days.
We use a stalebot among other tools to help manage the state of issues in this project. A stalebot can be very useful in closing issues in a number of cases; the most common is closing issues or PRs where the original reporter has not responded.
Stalebots are also emotionless and cruel and can close issues which are still very relevant.
If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.
We regularly sort for closed issues which have a stale label sorted by thumbs up.
We may also:
- Mark issues as revivable if we think it's a valid issue but isn't something we are likely to prioritize in the future (the issue will still remain closed).
- Add a keepalive label to silence the stalebot if the issue is very common/popular/important.
We are doing our best to respond, organize, and prioritize all issues but it can be a challenging task, our sincere apologies if you find yourself at the mercy of the stalebot.
Hi @Mastedont, did you manage to get the chunks deleted from S3? I'm having the same problem: I cannot see any compactor logs, and the S3 store contains files older than my configured retention (>7 days). It seems only the index is cleared, because Grafana won't show older log entries.
I have the same problem
Hi, I have this relevant config:
compactor:
compaction_interval: 10m
retention_delete_delay: 2h
retention_delete_worker_count: 150
retention_enabled: true
shared_store: s3
working_directory: /var/loki/retention
limits_config:
enforce_metric_name: false
max_cache_freshness_per_query: 10m
reject_old_samples: true
reject_old_samples_max_age: 168h
retention_period: 720h
split_queries_by_interval: 30m
but the log files are not deleted from S3; only the index is compacted.
I'm also seeing this on Loki 2.4.0, using MinIO as storage. Even with retention_delete_delay: 5m, no chunks are being deleted.
@Mastedont, I don't use S3, I store my chunks directly on disk. My current configuration:
Hi, will this configuration clean up expired files in the chunks directory?
Any update?
Judging from the discussion in https://github.com/grafana/loki/issues/7068, I don't think the compactor will delete the chunks in the S3 object store; you need a bucket lifecycle policy for that. It would be nice to have a clear answer on this, though.
For everyone wondering what's going on with retention: I've tested the feature a lot over the past days, so here is what works.
Minimal Configuration Needed
First of all, you absolutely need this configuration set up:
limits_config:
retention_period: 10d # Keep 10 days
compactor:
delete_request_cancel_period: 10m # don't wait 24h before processing the delete_request
retention_enabled: true # actually do the delete
retention_delete_delay: 2h # wait 2 hours before actually deleting stuff
You can tweak these settings to delete faster or slower.
Check If It's Working
Once you have this config up and running, check that the logs actually report that retention is being applied: msg="applying retention with compaction". The "caller" for this log is compactor.go.
Next, check that the retention manager is actually doing its job in the logs: msg="mark file created" and msg="no marks file found", both from the caller marker.go.
The mark file created message means that Loki found some chunks to delete and created a file to keep track of them. The no marks file found message means that while performing the chunk-delete routine, there was no file that matched its filters, the filters mainly being the delay.
Whenever you see the mark file created logs, you can go into the working directory of the compactor and check for the mark files. The path should be something like /var/loki/compactor/retention/markers. These files are kept there for 2 hours, or whatever is set in retention_delete_delay. After retention_delete_delay has passed, Loki deletes the chunks.
Not having any of the logs mentioned above means that the retention process has not started.
Important Notes
Loki will only delete chunks that are indexed. The indexes are actually purged before the chunks are deleted. This means that if you lose files from the compactor's working directory, whatever chunks were marked there will never be deleted, so it is still worth having a lifecycle policy to cover for this, OR having persistent storage for this particular folder.
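One way to keep the marker directory durable in a Kubernetes deployment is to back it with a PersistentVolumeClaim. A hypothetical sketch (claim name, size, and mount path are illustrative, not taken from this thread):

```yaml
# Hypothetical sketch: persist the compactor's working directory so marker files
# survive pod restarts. Names and sizes are examples only.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: loki-compactor-retention
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
---
# Then mount the claim at the compactor's working_directory in the pod spec, e.g.:
#   volumes:
#     - name: retention
#       persistentVolumeClaim:
#         claimName: loki-compactor-retention
#   containers:
#     - name: compactor
#       volumeMounts:
#         - name: retention
#           mountPath: /var/loki/retention
```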
@nvanheuverzwijn if I were the CTO of Grafana Labs, I would give you a job offer immediately
@nvanheuverzwijn Thank you a lot! Your explanation makes it clear to me. The Loki documentation had confused me into thinking that the Table Manager also deletes chunks when using the filesystem chunk store.
More info about 'Check If It's Working': with compaction_interval: 10m, assuming the Loki instance started at 2023-07-11T12:30:25.060395045Z, the caller=compactor.go log lines appear at ts=2023-07-11T12:40:25.047110295Z:
level=info ts=2023-07-11T12:30:25.060441045Z caller=compactor.go:440 msg="waiting 10m0s for ring to stay stable and previous compactions to finish before starting compactor"
level=info ts=2023-07-11T12:40:25.045542628Z caller=compactor.go:445 msg="compactor startup delay completed"
level=info ts=2023-07-11T12:40:25.045568295Z caller=compactor.go:497 msg="compactor started"
level=info ts=2023-07-11T12:40:25.04562367Z caller=compactor.go:454 msg="applying retention with compaction"
level=info ts=2023-07-11T12:40:25.047110295Z caller=compactor.go:609 msg="compacting table" table-name=loki_index_19549
level=info ts=2023-07-11T12:40:25.047208753Z caller=table_compactor.go:325 table-name=loki_index_19549 msg="using compactor-1689078092.gz as seed file"
level=info ts=2023-07-11T12:40:25.048495753Z caller=util.go:85 table-name=loki_index_19549 file-name=compactor-1689078092.gz msg="downloaded file" total_time=1.280041ms
level=info ts=2023-07-11T12:40:25.06665592Z caller=compactor.go:614 msg="finished compacting table" table-name=loki_index_19549
level=info ts=2023-07-11T12:40:25.066668503Z caller=compactor.go:609 msg="compacting table" table-name=loki_index_19548
level=info ts=2023-07-11T12:40:25.067591628Z caller=util.go:85 table-name=loki_index_19548 file-name=compactor-1689041382.gz msg="downloaded file" total_time=863.125µs
level=info ts=2023-07-11T12:40:25.078401878Z caller=compactor.go:614 msg="finished compacting table" table-name=loki_index_19548
@yangmeilly, could you please send your full loki.yaml config? It's not working for me.
In my scenario, I'm using boltdb-shipper for the index and filesystem for chunks. The relevant blocks of my full Loki config are below; the retention-related settings deserve particular attention.
compactor block:
compaction_interval: 10m
delete_request_cancel_period: 2h
retention_delete_delay: 2h
retention_delete_worker_count: 150
retention_enabled: true
shared_store: filesystem
working_directory: /var/loki/retention
limits_config block:
enforce_metric_name: false
max_cache_freshness_per_query: 10m
reject_old_samples: true
reject_old_samples_max_age: 168h
split_queries_by_interval: 15m
retention_period: 72h
max_query_lookback: 72h
table_manager block (these settings are a no-op with the filesystem store):
retention_deletes_enabled: false
retention_period: 0
@nvanheuverzwijn Thanks for the info. Regarding your statement that Loki will delete the chunks: are you talking about a filesystem backend only, or also an S3/Azure backend? I can't find a definitive answer stating that Loki is able to delete chunks from external storage.
It will also delete on S3/Azure. I did this with Google Cloud Storage, but it should be the same for the other backends.
@nvanheuverzwijn, the compactor did not delete the chunks. Why? Compactor log:
level=info ts=2023-08-03T06:50:12.634846248Z caller=compactor.go:497 msg="compactor started"
level=info ts=2023-08-03T06:50:12.634865722Z caller=compactor.go:454 msg="applying retention with compaction"
level=info ts=2023-08-03T06:50:12.634865349Z caller=marker.go:177 msg="mark processor started" workers=150 delay=2h0m0s
level=info ts=2023-08-03T06:50:12.634955656Z caller=expiration.go:78 msg="overall smallest retention period 1690440612.634, default smallest retention period 1690440612.634"
ts=2023-08-03T06:50:12.635021334Z caller=spanlogger.go:85 level=info msg="building index list cache"
level=info ts=2023-08-03T06:50:12.635046761Z caller=marker.go:202 msg="no marks file found"
config:
storage_config:
aws:
access_key_id: xxxxxx
bucketnames: loki
endpoint: https://s3.xxxx.com
s3forcepathstyle: true
secret_access_key: xxxxx
boltdb_shipper:
active_index_directory: /var/loki/index
cache_location: /var/loki/cache
cache_ttl: 24h
index_gateway_client:
server_address: dns:///loki-distributed-index-gateway:9095
shared_store: s3
compactor:
retention_enabled: true
shared_store: s3
working_directory: /var/loki/compactor
retention_delete_delay: 2h
delete_request_cancel_period: 10m
limits_config:
enforce_metric_name: false
ingestion_burst_size_mb: 1024
ingestion_rate_mb: 1024
max_cache_freshness_per_query: 10m
max_global_streams_per_user: 0
reject_old_samples: true
reject_old_samples_max_age: 168h
retention_period: 1h
split_queries_by_interval: 15m
Update: it was caused by my incorrect configuration. The storage configuration needs to be placed under the common block:
common:
compactor_address: http://loki-distributed-compactor:3100
storage:
s3:
access_key_id: xxxxxx
bucketnames: loki
endpoint: https://s3.xxxx.com
s3forcepathstyle: true
secret_access_key: xxxxxx
@nvanheuverzwijn so beautiful
@stringang, can you share your whole loki.yaml? I am also testing; log files are not deleted from S3, only the index is compacted. My whole configuration:
auth_enabled: false
chunk_store_config:
max_look_back_period: 0s
compactor:
shared_store: s3
working_directory: /data/loki/boltdb-shipper-compactor
retention_enabled: true
compaction_interval: 10m
retention_delete_delay: 2h
distributor:
ring:
kvstore:
store: memberlist
ingester:
chunk_block_size: 262144
chunk_idle_period: 3m
chunk_retain_period: 1m
lifecycler:
ring:
kvstore:
store: memberlist
replication_factor: 1
max_transfer_retries: 0
query_store_max_look_back_period: 0
wal:
enabled: true
dir: /data/wal
querier:
max_concurrent: 20
limits_config:
ingestion_rate_mb: 8
ingestion_burst_size_mb: 16
per_stream_rate_limit: 5MB
per_stream_rate_limit_burst: 15MB
enforce_metric_name: false
reject_old_samples: true
reject_old_samples_max_age: 24h
retention_period: 24h
memberlist:
join_members:
- loki-headless
schema_config:
configs:
- from: "2022-11-20"
store: boltdb-shipper
object_store: s3
schema: v11
index:
period: 24h
prefix: index_
chunks:
period: 24h
prefix: chunks_
server:
http_listen_port: 3100
log_level: debug
storage_config:
object_prefix: test-nyg
boltdb_shipper:
active_index_directory: /data/loki/boltdb-shipper-active
cache_location: /data/loki/boltdb-shipper-cache
cache_ttl: 24h
shared_store: s3
common:
storage:
s3:
s3: s3://test-admin:test-admin@s3Address:10000/$bucketName
s3forcepathstyle: true
Am I missing something?
@ningyougang
auth_enabled: false
chunk_store_config:
max_look_back_period: 0s
common:
compactor_address: http://loki-distributed-compactor:3100
storage:
s3:
access_key_id: xxxxxxxxxxx
bucketnames: loki
endpoint: https://s3.xxxxxx.com
s3forcepathstyle: true
secret_access_key: xxxxxxxxxxx
compactor:
delete_request_cancel_period: 10m
retention_delete_delay: 1h
retention_enabled: true
shared_store: s3
working_directory: /var/loki/compactor
distributor:
ring:
kvstore:
store: memberlist
frontend:
compress_responses: true
log_queries_longer_than: 5s
tail_proxy_url: http://loki-distributed-querier:3100
frontend_worker:
frontend_address: loki-distributed-query-frontend-headless:9095
ingester:
chunk_block_size: 262144
chunk_encoding: snappy
chunk_idle_period: 1h
chunk_retain_period: 1m
chunk_target_size: 8388608
lifecycler:
join_after: 10s
observe_period: 5s
ring:
heartbeat_timeout: 10m
kvstore:
store: memberlist
replication_factor: 3
max_transfer_retries: 0
wal:
dir: /var/loki/wal
ingester_client:
grpc_client_config:
grpc_compression: gzip
limits_config:
enforce_metric_name: false
ingestion_burst_size_mb: 1024
ingestion_rate_mb: 1024
max_cache_freshness_per_query: 10m
max_global_streams_per_user: 0
reject_old_samples: true
reject_old_samples_max_age: 168h
retention_period: 1d
split_queries_by_interval: 15m
memberlist:
join_members:
- loki-distributed-memberlist
query_range:
align_queries_with_step: true
cache_results: true
max_retries: 5
results_cache:
cache:
embedded_cache:
enabled: true
ttl: 24h
ruler:
alertmanager_url: http://am.xxxxxxxxxxx.com/
enable_alertmanager_v2: true
enable_api: true
enable_sharding: true
ring:
kvstore:
store: memberlist
rule_path: /tmp/loki/scratch
storage:
local:
directory: /etc/loki/rules
type: local
runtime_config:
file: /var/loki-distributed-runtime/runtime.yaml
schema_config:
configs:
- from: "2023-08-12"
index:
period: 24h
prefix: loki_index_
object_store: s3
schema: v11
store: boltdb-shipper
server:
grpc_server_max_recv_msg_size: 8388608
http_listen_port: 3100
log_level: debug
storage_config:
boltdb_shipper:
active_index_directory: /var/loki/index
cache_location: /var/loki/cache
cache_ttl: 24h
index_gateway_client:
server_address: dns:///loki-distributed-index-gateway:9095
shared_store: s3
table_manager:
retention_deletes_enabled: false
retention_period: 0s
@nvanheuverzwijn I do see mark file created, but then a few minutes later the logs show:
level=info ts=2023-09-11T05:23:41.019592474Z caller=compactor.go:364 msg="applying retention with compaction"
level=info ts=2023-09-11T05:23:41.020937376Z caller=expiration.go:60 msg="overall smallest retention period 1691817821.02, default smallest retention period 1691817821.02"
level=info ts=2023-09-11T05:23:41.065287615Z caller=marker.go:78 msg="mark file created" file=/data/loki/boltdb-shipper-compactor/retention/markers/1694409821056046054
A few seconds later:
level=info ts=2023-09-11T05:24:41.019702032Z caller=marker.go:203 msg="no marks file found"
level=info ts=2023-09-11T05:25:41.020103387Z caller=marker.go:203 msg="no marks file found"
level=info ts=2023-09-11T05:26:41.020436386Z caller=marker.go:203 msg="no marks file found"
level=info ts=2023-09-11T05:27:41.020020721Z caller=marker.go:203 msg="no marks file found"
level=info ts=2023-09-11T05:28:41.01973026Z caller=marker.go:203 msg="no marks file found"
level=info ts=2023-09-11T05:30:41.020528317Z caller=marker.go:203 msg="no marks file found"
So it looks like even after 5 minutes (retention_delete_delay) it still cannot find the mark file. I verified that the mark file exists in that location. Below is my Loki config:
auth_enabled: false
chunk_store_config:
max_look_back_period: 0s
compactor:
delete_request_cancel_period: 10m
retention_delete_delay: 5m
retention_delete_worker_count: 150
retention_enabled: true
shared_store: filesystem
working_directory: /data/loki/boltdb-shipper-compactor
ingester:
chunk_block_size: 262144
chunk_idle_period: 3m
chunk_retain_period: 1m
lifecycler:
ring:
replication_factor: 1
max_transfer_retries: 0
wal:
dir: /data/loki/wal
limits_config:
enforce_metric_name: false
max_entries_limit_per_query: 5000
max_query_lookback: 720h
reject_old_samples: true
reject_old_samples_max_age: 168h
retention_period: 720h
memberlist:
join_members:
- 'loki-memberlist'
schema_config:
configs:
- from: "2020-10-24"
index:
period: 24h
prefix: index_
object_store: filesystem
schema: v11
store: boltdb-shipper
server:
grpc_listen_port: 9095
http_listen_port: 3100
http_server_read_timeout: 120s
This is not working for me even with version 2.8.4 of Grafana Loki.
Loki version: v2.8.4, AKS version: 1.23.x, Storage backend: Azure Blob Storage.
From reading the documentation for the compactor, I get the impression that the compactor is capable of deleting old chunks from blob storage. However, I see that old Loki chunks are not getting deleted from blob storage even though the necessary configuration is in place. Could someone be kind enough to tell me what could be wrong in my configuration? When I inspect the logs for the compactor, I see that the marker file is being created. I also see a lot of API calls to GET /loki/api/v1/delete, but I don't see any POST calls to /loki/api/v1/delete, which gives me the impression that no deletion is happening. I confirmed that chunks from several months ago are still lying in my blob storage.
auth_enabled: false
server:
http_listen_port: {{ .Values.loki.containerPorts.http }}
log_level: debug
common:
compactor_address: http://{{ include "grafana-loki.compactor.fullname" . }}:{{ .Values.compactor.service.ports.http }}
storage:
azure:
account_name: abc
account_key: abc
container_name: abc
use_managed_identity: false
request_timeout: 0
distributor:
ring:
kvstore:
store: memberlist
memberlist:
join_members:
- {{ include "grafana-loki.gossip-ring.fullname" . }}
ingester:
lifecycler:
ring:
kvstore:
store: memberlist
replication_factor: 1
chunk_idle_period: 2h # Any chunk not receiving new logs in this time will be flushed
chunk_block_size: 262144
chunk_encoding: snappy
chunk_retain_period: 1m
max_chunk_age: 2h # All chunks will be flushed when they hit this age, default is 1h
max_transfer_retries: 0
autoforget_unhealthy: true
wal:
dir: {{ .Values.loki.dataDir }}/wal
limits_config:
retention_period: 48h
enforce_metric_name: false
reject_old_samples: true
reject_old_samples_max_age: 168h
max_cache_freshness_per_query: 10m
split_queries_by_interval: 15m
per_stream_rate_limit: 10MB
per_stream_rate_limit_burst: 20MB
ingestion_rate_mb: 100
ingestion_burst_size_mb: 30
schema_config:
configs:
- from: 2020-10-24
store: boltdb-shipper
object_store: azure
schema: v11
index:
prefix: index_
period: 24h
chunks:
period: 24h
storage_config:
boltdb_shipper:
shared_store: azure
active_index_directory: {{ .Values.loki.dataDir }}/loki/index
cache_location: {{ .Values.loki.dataDir }}/loki/cache
cache_ttl: 168h
{{- if .Values.indexGateway.enabled }}
index_gateway_client:
server_address: {{ (printf "dns:///%s:9095" (include "grafana-loki.index-gateway.fullname" .)) }}
{{- end }}
filesystem:
directory: {{ .Values.loki.dataDir }}/chunks
index_queries_cache_config:
{{- if .Values.memcachedindexqueries.enabled }}
memcached:
batch_size: 100
parallelism: 100
memcached_client:
consistent_hash: true
addresses: dns+{{ include "grafana-loki.memcached-index-queries.host" . }}
service: http
{{- end }}
chunk_store_config:
max_look_back_period: 2d
{{- if .Values.memcachedchunks.enabled }}
chunk_cache_config:
memcached:
batch_size: 100
parallelism: 100
memcached_client:
consistent_hash: true
addresses: dns+{{ include "grafana-loki.memcached-chunks.host" . }}
{{- end }}
{{- if .Values.memcachedindexwrites.enabled }}
write_dedupe_cache_config:
memcached:
batch_size: 100
parallelism: 100
memcached_client:
consistent_hash: true
addresses: dns+{{ include "grafana-loki.memcached-index-writes.host" . }}
{{- end }}
table_manager:
retention_deletes_enabled: true
retention_period: 2d
query_range:
align_queries_with_step: true
max_retries: 5
cache_results: true
results_cache:
cache:
{{- if .Values.memcachedfrontend.enabled }}
memcached_client:
consistent_hash: true
addresses: dns+{{ include "grafana-loki.memcached-frontend.host" . }}
max_idle_conns: 16
timeout: 500ms
update_interval: 1m
{{- else }}
enable_fifocache: true
fifocache:
max_size_items: 1024
validity: 24h
{{- end }}
{{- if not .Values.queryScheduler.enabled }}
frontend_worker:
frontend_address: {{ include "grafana-loki.query-frontend.fullname" . }}:{{ .Values.queryFrontend.service.ports.grpc }}
{{- end }}
frontend:
log_queries_longer_than: 5s
compress_responses: true
tail_proxy_url: http://{{ include "grafana-loki.querier.fullname" . }}:{{ .Values.querier.service.ports.http }}
compactor:
working_directory: {{ .Values.loki.dataDir }}/retention
shared_store: azure
compaction_interval: 10m
retention_enabled: true
retention_delete_delay: 2h
retention_delete_worker_count: 150
ruler:
storage:
type: local
local:
directory: {{ .Values.loki.dataDir }}/conf/rules
ring:
kvstore:
store: memberlist
rule_path: /tmp/loki/scratch
alertmanager_url: http://abc.bdc.com/alertmanager
external_url: https://abc.bdc.com/alertmanager
Hi,
I had a similar issue with the recent 2.9.1 version of Loki. It appears that there has recently been some work on the deletion_mode property (https://grafana.com/docs/loki/latest/operations/storage/logs-deletion/). This property is now configurable per tenant within the runtime config; the default value is not documented, but I had to enforce filter-and-delete mode and it started deleting chunks from my object storage.
The helm values config for that setting is:
loki:
runtimeConfig:
overrides:
fake:
deletion_mode: filter-and-delete
I hope this solves your issues.
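For anyone not deploying via Helm, the equivalent override can go in the runtime config file referenced by runtime_config.file. A sketch, assuming the default "fake" tenant that Loki uses when auth_enabled: false (the file path is an example):

```yaml
# runtime-overrides.yaml (sketch) -- referenced from the main config via:
#   runtime_config:
#     file: /etc/loki/runtime-overrides.yaml
overrides:
  "fake":
    deletion_mode: filter-and-delete
```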
As far as I understood from https://github.com/grafana/loki/issues/7068#issuecomment-1347131945, chunks are never going to get deleted from S3 by the compactor. It is not possible to configure retention so that the compactor actually deletes the chunks from S3; this needs to be done via a lifecycle policy or some other mechanism. What the compactor will do is manage the chunk/index relationship so that you don't get errors about deleted chunks/indexes when running queries. I guess it will actually delete references to chunks in the indexes, but not the chunk files themselves. Am I right? It's a bit counter-intuitive, and it would help if this were clarified in the documentation.
@nvanheuverzwijn I learned a lot. How can I know which chunks are indexed?
Hi, I used the same config file; Loki is up and running but retention is not working. Permissions are also fine.
level=info ts=2024-05-30T07:12:05.009458014Z caller=metrics.go:159 component=frontend org_id=fake traceID=6cc7b11248ffd6f2 latency=fast query="sum by (level) (count_over_time({job=\"containerlogs\"} | drop error[1s]))" query_hash=1978880347 query_type=metric range_type=range length=5m1s start_delta=5m2.009446978s end_delta=1.009448479s step=1s duration=930.872366ms status=200 limit=100 returned_lines=0 throughput=12kB total_bytes=11kB total_bytes_structured_metadata=0B lines_per_second=82 total_lines=77 post_filter_lines=77 total_entries=1 store_chunks_download_time=0s queue_time=100.027997ms splits=0 shards=16 cache_chunk_req=0 cache_chunk_hit=0 cache_chunk_bytes_stored=0 cache_chunk_bytes_fetched=0 cache_chunk_download_time=0s cache_index_req=0 cache_index_hit=0 cache_index_download_time=0s cache_stats_results_req=0 cache_stats_results_hit=0 cache_stats_results_download_time=0s cache_result_req=0 cache_result_hit=0 cache_result_download_time=0s source=logvolhist
level=info ts=2024-05-30T07:12:05.180281106Z caller=roundtrip.go:241 org_id=fake traceID=7306103002fd206d msg="executing query" type=range query="{job=\"containerlogs\"}" length=5m0s step=5m0s query_hash=3605303691
level=info ts=2024-05-30T07:12:05.180937762Z caller=engine.go:234 component=querier org_id=fake traceID=7306103002fd206d msg="executing query" type=range query="{job=\"containerlogs\"}" length=5m0s step=5m0s query_hash=3605303691
The above article by @nvanheuverzwijn is so on point. Thank you! There are a lot of docs presenting lifecycle policies as the "only" way to purge, and those are outdated. This setup seems to be working fine.
In the compactor or the backend pod, the <working_dir>/retention/<storage>/markers folder will contain these marker files.
/var/loki/compactor/retention/aws/markers $ ls -la
total 312
drwxr-sr-x 2 loki loki 4096 Aug 8 19:50 .
drwxr-sr-x 3 loki loki 4096 Aug 7 19:59 ..
-rw-r--r-- 1 loki loki 32768 Aug 8 18:09 1723140578748104859
-rw-r--r-- 1 loki loki 32768 Aug 8 18:19 1723141178696907357
...
-rw-r--r-- 1 loki loki 32768 Aug 8 19:49 1723146578638477113
-rw-r--r-- 1 loki loki 32768 Aug 8 19:59 1723147178354528360
If the compactor cycle is running every 10 minutes (default), you will see a new file created and the oldest file processed and deleted every 10 minutes. And in S3 or your storage providers these objects no longer exist!
To find out which objects are about to be deleted, use strings or od -c on the marker file. Doing this on the oldest marker file will show the next-in-line objects to be purged:
/var/loki/compactor/retention/aws/markers $ strings 1723140578748104859
chunks
fake/65ff478cf0700326:190eb0ab615:190eb125142:ee83e028
fake/36bfb27dc259bf11:190eaa1db48:190eb0fef22:7101872a
fake/eecc8a1a1bba16ea:190eb0a8a31:190eb122448:156be59e
fake/dd094d6bff631bd5:190eaa581d0:190eb13688f:62c6862c
fake/f78ea3e78078c811:190eaa51fb0:190eb12fcfd:41bb1a13
fake/225757be61ef1d7a:190eaa50dad:190eb12f961:48b0c78c
fake/f4a7b9aef3c86224:190eb0c4cb2:190eb13a35e:a2919ec1
Use the aws s3api command below to list the oldest object. You should not see any objects older than your retention_period if all went per plan. The oldest key here is fake/36bfb27dc259bf1..., which appeared in the output of the previous strings 1723140578748104859 command (the 2nd of the fake/... objects). Ten minutes later you will see it deleted, which tallies correctly with the 14d retention period.
$ aws s3api list-objects-v2 --bucket dframe-loki- --prefix fake --query 'sort_by(Contents, &LastModified)[0]' --output json
{
"Key": "fake/36bfb27dc259bf11:190eaa1db48:190eb0fef22:7101872a",
"LastModified": "2024-07-25T18:03:45.000Z",
"ETag": "\"606ec603a5cb748a96a3ea3e4fb11b87\"",
"Size": 7000,
"StorageClass": "STANDARD"
}
This ticket can be closed, not sure why it is still open...
Hi, I have the config below in my Helm override file for a retention period of 24h, but I still see old index files in S3. I am using the loki-stack-2.10.2 chart. Any idea what I am missing here?
loki:
serviceAccount:
name: loki-service-account
create: false
config:
schema_config:
configs:
- from: 2020-10-24
store: boltdb-shipper
object_store: s3
schema: v11
index:
prefix: loki_index_
period: 24h
storage_config:
aws:
s3: s3://aws_region/s3-bucket
s3forcepathstyle: true
bucketnames: s3-bucket
region: aws_region
insecure: false
sse_encryption: false
boltdb_shipper:
shared_store: s3
cache_ttl: 24h
limits_config:
enforce_metric_name: false
reject_old_samples: true
reject_old_samples_max_age: 24h
max_entries_limit_per_query: 5000
retention_period: 24h
max_query_lookback: 24h
compactor:
working_directory: /data/loki/boltdb-shipper-compactor
shared_store: filesystem
retention_enabled: true
delete_request_cancel_period: 24h
retention_delete_delay: 2h
retention_delete_worker_count: 150
compaction_interval: 24h
table_manager:
retention_deletes_enabled: true
retention_period: 24h
Here's my loki-distributed chart configuration, backed by S3 storage. You need to set delete_request_store in the compactor config, too.
[!NOTE]
compactor.delete_request_store
should be set to configure the store for delete requests. This is required when retention is enabled. See loki's retention doc.
# charts/loki-distributed/values.yaml
loki:
config: |
compactor:
shared_store: s3
working_directory: /var/loki/compactor
compaction_interval: 10m
retention_enabled: true
retention_delete_delay: 2h
retention_delete_worker_count: 150
delete_request_store: s3
limits_config:
enforce_metric_name: false
reject_old_samples: true
reject_old_samples_max_age: 168h
max_cache_freshness_per_query: 10m
split_queries_by_interval: 15m
retention_period: 7d
ingestion_rate_mb: 20
ingestion_burst_size_mb: 30
Describe the bug: I've configured 168h retention for my logs, but I can see chunks 5 years old filling my disk.
To Reproduce: this is my config
Expected behavior: Chunks older than 168h should be deleted.
Environment:
Screenshots, Promtail config, or terminal output: We can see 49 days of logs although I've configured 168h.