grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

Include date information in the chunk storage path to enable more manual operations on stored chunks #5611

Status: Open. splitice opened this issue 2 years ago

splitice commented 2 years ago

Is your feature request related to a problem? Please describe.

Currently, if something ever goes wrong with Loki, it's incredibly difficult to clear out old, unreferenced chunks.

The "fake" folder, which contains the chunks, holds millions of entries and is nearly impossible to work with using the S3 APIs.

Describe the solution you'd like

I'd like the option to specify a folder (instead of "fake"), and for that path to accept a date-compatible format string.

A date-compatible format string would allow simple cleanup once a given retention period elapses.
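To make the cleanup argument concrete, here is a minimal Python sketch of the kind of layout being requested. The template `{tenant}/%Y/%m/%d/` and the helper names are hypothetical (Loki offers no such option), and it assumes each chunk is filed under the day it was written. Under that assumption, expired data maps to whole day prefixes that could be deleted without listing the rest of the bucket.

```python
from datetime import date, timedelta

# Hypothetical layout requested in this issue; Loki does not support it.
# strftime fills in the date first, then str.format fills in the tenant.
PATH_TEMPLATE = "{tenant}/%Y/%m/%d/"


def chunk_prefix(tenant: str, day: date) -> str:
    """Render the hypothetical storage prefix for chunks written on `day`."""
    return day.strftime(PATH_TEMPLATE).format(tenant=tenant)


def expired_prefixes(tenant: str, oldest: date, today: date, retention_days: int) -> list[str]:
    """Return every day prefix from `oldest` up to, but excluding, the cutoff day.

    Each returned prefix holds only chunks written before the retention cutoff,
    so it could be removed as a unit instead of filtering individual objects.
    """
    cutoff = today - timedelta(days=retention_days)
    prefixes = []
    day = oldest
    while day < cutoff:
        prefixes.append(chunk_prefix(tenant, day))
        day += timedelta(days=1)
    return prefixes


# With 7-day retention on 2022-07-10, the cutoff is 2022-07-03.
print(expired_prefixes("fake", date(2022, 7, 1), date(2022, 7, 10), 7))
# prints ['fake/2022/07/01/', 'fake/2022/07/02/']
```

Rendering the date before the tenant means a tenant ID containing `%` is never interpreted as a strftime directive.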

Describe alternatives you've considered

Listing all 10M+ objects in the folder and deleting them by last-modified date.
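For comparison, that alternative amounts to something like the following Python sketch. It assumes the input pages have the shape of S3 ListObjectsV2 responses (each page may carry a "Contents" list of entries with "Key" and a timezone-aware "LastModified"); the function name is hypothetical. The key point is that every page of every object has to be fetched and checked before anything can be deleted.

```python
from datetime import datetime
from typing import Iterable, Iterator

# S3's DeleteObjects request accepts at most 1,000 keys.
MAX_DELETE_BATCH = 1000


def stale_key_batches(pages: Iterable[dict], cutoff: datetime) -> Iterator[list[str]]:
    """Yield batches of keys whose LastModified is older than `cutoff`.

    `pages` is assumed to look like ListObjectsV2 responses; pages with no
    objects may omit "Contents" entirely, so that key is read with a default.
    """
    batch: list[str] = []
    for page in pages:
        for obj in page.get("Contents", []):
            if obj["LastModified"] < cutoff:
                batch.append(obj["Key"])
                if len(batch) == MAX_DELETE_BATCH:
                    yield batch
                    batch = []
    # Flush the final, partially filled batch.
    if batch:
        yield batch
```

With boto3, the pages could come from `client.get_paginator("list_objects_v2").paginate(Bucket=..., Prefix="fake/")`, and each batch fits in one `client.delete_objects` call.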

stale[bot] commented 2 years ago

Hi! This issue has been automatically marked as stale because it has not had any activity in the past 30 days.

We use a stalebot among other tools to help manage the state of issues in this project. A stalebot can be very useful in closing issues in a number of cases; the most common is closing issues or PRs where the original reporter has not responded.

Stalebots are also emotionless and cruel and can close issues which are still very relevant.

If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.

We regularly review closed issues that have a stale label, sorted by thumbs-up count.

We may also:

We are doing our best to respond to, organize, and prioritize all issues, but it can be a challenging task. Our sincere apologies if you find yourself at the mercy of the stalebot.

splitice commented 2 years ago

stalebot persona non grata

slim-bean commented 2 years ago

"fake" is the tenant name used when auth is not enabled; when auth is enabled, the folder name is whatever the tenant ID is.

It's an unfortunate naming choice, I'm afraid, as it confuses and annoys many folks. It's hard to change without breaking existing installs, but we will likely change it in the next major release and follow the path Mimir took: change the default, then add a config option that people can use to set it back to "fake" to work with existing data.

The v12 chunk schema does add more layers to the storage layout: the stream hash is now a folder.

This helps a lot with S3's per-prefix rate limits, but it doesn't help a human trying to do anything directly with the stored chunks.

I think including some date information in the path is a pretty reasonable request, and one we've discussed before; it would enable at least some level of manual operation on the chunk data.

slim-bean commented 2 years ago

(I hijacked your title a little to steer discussion toward adding another schema entry that includes date information.)

renatosis commented 2 years ago

I'm trying to restore chunks from S3 Glacier Deep Archive and I'm finding it very hard. I have to search for chunks by modification date using the AWS CLI (s3api), and it usually takes about 13 minutes with a database holding 10 days of logs. Is there another way to do it?

```
time aws s3api list-objects-v2 --bucket <bucket> --query 'Contents[?contains(LastModified, "2022-07-26")].Key' --prefix "fake/" --profile <aws_profile>

...

real    13m24,594s
user    2m8,620s
sys     0m3,483s
```