helm / chartmuseum

helm chart repository server
https://chartmuseum.com
Apache License 2.0
3.6k stars, 401 forks

Auto-purge, get rid of old chart versions #316

Open jdolitsky opened 4 years ago

jdolitsky commented 4 years ago

Add feature flags that enable auto-removal of old chart versions in storage based on various age / last used / version parameters

cep21 commented 3 years ago

I'm interested in this ticket. It's something I could do out of band, but the missing piece is I don't think there's anything in chartmuseum that tracks last used for charts. Does this data currently exist? If we were to add it, where would it be stored?

scbizu commented 3 years ago

@cep21 Glad you're interested. Maybe this is what you're looking for? If you can help us with this feature, it would be a big help :)

cep21 commented 3 years ago

@scbizu I think "last read" makes more sense than "last modified", right? You would want to remove charts people are no longer downloading, for example.

scbizu commented 3 years ago

@cep21 You're right, but the storage itself does not support a last-read timestamp yet. Adding a new mechanism to the storage structure would be a huge PR: we would have to handle every request from helm pull and update the last-read timestamp each time.

I think this is why this issue is still tagged help wanted XD

cep21 commented 3 years ago

One idea is to store this information inside redis, for example. Another is to use the backend storage itself to store this information and "sync" some kind of ledger every 60 seconds (for example).
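A minimal sketch of the second idea, assuming an in-memory ledger that is flushed to the storage backend on a timer. The `Ledger` type, its methods, and the `persist` callback are all hypothetical names for illustration; chartmuseum has no such API today. Timestamps are kept as Unix seconds to keep the example small.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Ledger records when each chart version was last downloaded.
// Hypothetical type: not part of chartmuseum.
type Ledger struct {
	mu       sync.Mutex
	lastRead map[string]int64 // key: "name-version", value: Unix seconds
	dirty    bool             // true if there are changes not yet persisted
}

func NewLedger() *Ledger {
	return &Ledger{lastRead: make(map[string]int64)}
}

// Touch would be called by the download handler on every chart fetch.
// It only moves a timestamp forward, never backward.
func (l *Ledger) Touch(key string, unix int64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if cur, ok := l.lastRead[key]; !ok || unix > cur {
		l.lastRead[key] = unix
		l.dirty = true
	}
}

// Flush passes a copy of the ledger to persist, but only if something
// changed since the last successful flush.
func (l *Ledger) Flush(persist func(map[string]int64) error) error {
	l.mu.Lock()
	if !l.dirty {
		l.mu.Unlock()
		return nil
	}
	snapshot := make(map[string]int64, len(l.lastRead))
	for k, v := range l.lastRead {
		snapshot[k] = v
	}
	l.dirty = false
	l.mu.Unlock()

	if err := persist(snapshot); err != nil {
		// Mark dirty again so the next tick retries.
		l.mu.Lock()
		l.dirty = true
		l.mu.Unlock()
		return err
	}
	return nil
}

// Run flushes on every tick (for example every 60 seconds) until stop is closed.
func (l *Ledger) Run(interval time.Duration, stop <-chan struct{}, persist func(map[string]int64) error) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			_ = l.Flush(persist) // a real server would log the error
		case <-stop:
			return
		}
	}
}

func main() {
	l := NewLedger()
	l.Touch("mychart-1.2.0", time.Now().Unix())
	_ = l.Flush(func(m map[string]int64) error {
		fmt.Printf("persisting %d ledger entries\n", len(m))
		return nil
	})
}
```

Batching writes this way keeps the download path cheap, at the cost of losing up to one interval of read history if the process dies before a flush.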

scbizu commented 3 years ago

I prefer the second one. AutoPurge should be provided as an interface function, so its implementation can differ between storage backends. The expiration duration should be configurable when users enable the auto-purge feature.

cep21 commented 3 years ago

and "sync" some kind of ledger every 60 seconds

This could be tricky if people are running multiple chartmuseum instances for redundancy, since we'll have to merge ledgers
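One way the merge could work, sketched under the assumption that each instance keeps a map from chart version to last-read time in Unix seconds (a hypothetical layout, not something chartmuseum stores): keep the newest timestamp per key. Taking the maximum is commutative, associative, and idempotent, so instances can merge in any order, or merge the same ledger twice, and still converge on the same result.

```go
package main

import "fmt"

// MergeLedgers combines last-read ledgers from several instances,
// keeping the newest timestamp seen for each chart version.
// Hypothetical helper for illustration.
func MergeLedgers(ledgers ...map[string]int64) map[string]int64 {
	merged := make(map[string]int64)
	for _, ledger := range ledgers {
		for key, ts := range ledger {
			if cur, ok := merged[key]; !ok || ts > cur {
				merged[key] = ts
			}
		}
	}
	return merged
}

func main() {
	instanceA := map[string]int64{"mychart-1.0.0": 100, "mychart-1.1.0": 300}
	instanceB := map[string]int64{"mychart-1.0.0": 200, "other-0.1.0": 50}
	fmt.Println(MergeLedgers(instanceA, instanceB))
}
```

This does not address concurrent writers overwriting each other's ledger file in the backend; that would still need a read-merge-write cycle or per-instance ledger files.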

vadasambar commented 3 years ago

It'd be great if we could combine more than one condition to decide whether to delete a chart. E.g., instead of "Delete all the charts older than 2 months", it would be better to have "Delete all the charts older than 2 months matching a particular regex". You might not want to delete release charts (e.g., 2.1.0) but still want to delete pre-release charts (e.g., 2.1.0-custom-fix or 2.1.0-pr-3245); a regex would let you match all the pre-release charts older than the specified time period (see #383).

However, one thing that concerns me about regex is you'd want to test it out first to see which charts would be deleted with that particular regex to avoid deleting charts you didn't intend to delete.
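A rough sketch of combining the age condition with a regex, plus a dry-run mode that only reports what would be deleted. Everything here is an assumption for illustration: the `ChartVersion` type, the function names, the pre-release pattern, and treating "2 months" as 60 days.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// ChartVersion is a hypothetical, simplified view of a stored chart.
type ChartVersion struct {
	Name    string
	Version string
	Created time.Time
}

// SelectForPurge returns the charts that are older than maxAge AND whose
// version matches pattern. Both conditions must hold.
func SelectForPurge(charts []ChartVersion, pattern *regexp.Regexp, maxAge time.Duration, now time.Time) []ChartVersion {
	cutoff := now.Add(-maxAge)
	var out []ChartVersion
	for _, c := range charts {
		if c.Created.Before(cutoff) && pattern.MatchString(c.Version) {
			out = append(out, c)
		}
	}
	return out
}

// Purge deletes the candidates, or only reports them when dryRun is true.
func Purge(candidates []ChartVersion, dryRun bool, del func(ChartVersion) error) error {
	for _, c := range candidates {
		if dryRun {
			fmt.Printf("[dry-run] would delete %s-%s\n", c.Name, c.Version)
			continue
		}
		if err := del(c); err != nil {
			return fmt.Errorf("delete %s-%s: %w", c.Name, c.Version, err)
		}
	}
	return nil
}

// demoCandidates runs the selection on sample data from the discussion.
func demoCandidates() []ChartVersion {
	now := time.Date(2021, time.June, 1, 0, 0, 0, 0, time.UTC)
	charts := []ChartVersion{
		{"mychart", "2.1.0", time.Date(2021, time.January, 1, 0, 0, 0, 0, time.UTC)},          // old, but a release
		{"mychart", "2.1.0-pr-3245", time.Date(2021, time.January, 15, 0, 0, 0, 0, time.UTC)}, // old pre-release
		{"mychart", "2.1.0-custom-fix", time.Date(2021, time.May, 20, 0, 0, 0, 0, time.UTC)},  // recent pre-release
	}
	// Assumed pattern: MAJOR.MINOR.PATCH followed by "-" and a pre-release tag.
	preRelease := regexp.MustCompile(`^\d+\.\d+\.\d+-.+$`)
	return SelectForPurge(charts, preRelease, 60*24*time.Hour, now)
}

func main() {
	// Dry run first, to check what the regex actually matches.
	_ = Purge(demoCandidates(), true, nil)
}
```

Running with `dryRun` set first addresses the concern above: the regex can be checked against real data before anything is removed.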

scbizu commented 3 years ago

The thread is getting too long to follow, so let me summarize the conclusions so far. The key points are listed below:

cep21 commented 3 years ago

Above looks right. Bullet (3) is probably an enhancement off the core request: bullet (1). Also the difference between last read and last modified is pretty huge in both use and ease-of-implementation.

vadasambar commented 3 years ago

Optional: should provide a flag to determine whether users need to soft delete the chart. (soft delete here means not actually purging the chart, only logging the charts that would be purged)

Nitpick: I think it'd be better to call it dry-run instead. "Soft delete" makes me think the chart would be archived or removed from the index while the data stays in storage. What we want here (as far as I've understood) is to show what would be deleted if you ran a delete operation.

Everything else looks good to me @scbizu

scbizu commented 3 years ago

I will assign this to myself and draft an implementation this weekend. I think it will help a lot in reducing the pressure on the index, so that we can both lower the latency of our APIs and save disk space in chart storage.

Maybe it will be provided as --per-chart-max-version, which keeps the latest N versions of each chart according to your configuration. However, since I do not know which charts are currently in use, we can add more features later (like pinning some charts so that they will not be removed from storage).
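A minimal sketch of the "keep the latest N per chart" rule described above. It assumes "latest" means most recently created rather than highest semantic version, and the `ChartVersion` type and `Excess` function are hypothetical names, not chartmuseum's implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// ChartVersion is a hypothetical, simplified view of a stored chart.
type ChartVersion struct {
	Name, Version string
	Created       int64 // Unix seconds
}

// Excess returns every version beyond the newest `limit` of each chart.
// A limit of 0 or less is treated as "no limit" (an assumption).
func Excess(charts []ChartVersion, limit int) []ChartVersion {
	if limit <= 0 {
		return nil
	}
	// Group versions by chart name.
	byName := make(map[string][]ChartVersion)
	for _, c := range charts {
		byName[c.Name] = append(byName[c.Name], c)
	}
	var out []ChartVersion
	for _, versions := range byName {
		// Newest first, then everything past the limit is excess.
		sort.Slice(versions, func(i, j int) bool {
			return versions[i].Created > versions[j].Created
		})
		if len(versions) > limit {
			out = append(out, versions[limit:]...)
		}
	}
	// Map iteration order is random, so sort for a stable result.
	sort.Slice(out, func(i, j int) bool {
		if out[i].Name != out[j].Name {
			return out[i].Name < out[j].Name
		}
		return out[i].Created < out[j].Created
	})
	return out
}

func main() {
	charts := []ChartVersion{
		{"mychart", "1.0.0", 1},
		{"mychart", "1.1.0", 2},
		{"mychart", "1.2.0", 3},
		{"other", "0.1.0", 1},
	}
	for _, c := range Excess(charts, 2) {
		fmt.Printf("would purge %s-%s\n", c.Name, c.Version)
	}
}
```

A pinning feature like the one mentioned could be layered on top by filtering pinned versions out of the result before deleting.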

(The dry-run option is already implemented inside our company; maybe I can open source it later.)

(Off-topic: our CI failed again because refreshing the index took too long, it is too large XD)

jasondamour commented 1 year ago

Where was this left off? I'm willing to try picking up the remaining work. We have 2.5k charts, and would like to purge as many as possible

scbizu commented 1 year ago

@jasondamour This is already implemented. You can use the version at our HEAD and start chartmuseum with the -per-chart-max-version option.