lancachenet / monolithic

A monolithic lancache service capable of caching all CDNs in a single instance
https://hub.docker.com/r/lancachenet/monolithic

Feature request: Include nginx-cache-cleaner in monolithic docker image #126

Closed ghost closed 3 years ago

ghost commented 3 years ago

nginx-cache-cleaner is available on Github: https://github.com/zafergurel/nginx-cache-cleaner

It's a fairly simple setup on a standard Nginx install, but I'm unfamiliar with how to get it included in the docker setup.

The tool reverses the hashes used by Nginx so it can list all cached files, and offers an easy way to delete specific files in the cache.

This is not practical for Steam games, whose cached files are broken up into many small pieces hashed at the upstream CDN. For Xbox, Windows, and other content delivery systems, however, it can be useful for deleting old versions of games, updates, etc. that waste space and will never be downloaded again once replaced by a newer version. It lets an admin delete a 100GB game at version 1.1.1 when version 1.1.2 is now available: the small update should stay cached, and the new version should be cached, but the old version is a waste of space. If you just let Nginx purge the cache when space reaches its limits, there's a good chance files you still want cached will be purged before the large, out-of-date version is.
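For reference, nginx names each cache file after the MD5 hash of its cache key, but it also writes the original key into the file's header as a `KEY:` line, which is what makes the reversal possible. A minimal sketch of what the indexing pass has to do (not the actual nginx-cache-cleaner code; the `/data/cache` path is an assumption for a typical lancache mount):

```python
# Minimal sketch of an indexing pass over an nginx cache: recover the
# original cache key from each file's "KEY: ..." header line.
# Assumes the cache is mounted at /data/cache (typical for lancache).
import os

CACHE_DIR = "/data/cache"  # assumed path; adjust to your volume mount

def read_key(path):
    """Recover the original cache key from an nginx cache file header."""
    with open(path, "rb") as f:
        header = f.read(4096)  # the KEY: line sits near the top
    start = header.find(b"KEY: ")
    if start == -1:
        return None
    start += len(b"KEY: ")
    end = header.find(b"\n", start)
    return header[start:end].decode("utf-8", "replace")

def build_index(root=CACHE_DIR):
    """Walk the cache and map every file back to its cache key."""
    index = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            key = read_key(path)
            if key is not None:
                # The file's basename is the MD5 hex digest of this key.
                index[path] = key
    return index

if __name__ == "__main__":
    for path, key in build_index().items():
        print(path, key)
```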

ghost commented 3 years ago

I should also mention that I have not fully tested this on Lancache yet, as I am not sure how to implement it in docker. I'm not certain how well it would work with the sharding, or whether it's practical at all. The time and resources needed to create the index files may be a deal breaker because of the many millions of small files from Steam and/or sharding.

I just thought I'd bring this to everyone's attention and see what others/devs think.

VibroAxe commented 3 years ago

We've looked at nginx-cache-cleaner and similar solutions before. The problem is that you would also need to get the slice / range requests correct in order to purge the data properly.

When running this on a large-scale cache, the time taken to build the indexes would be significant.

Interesting idea though. I wonder if we could use inotify or something to build the indexes
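For illustration, a rough sketch of that idea using the third-party `watchdog` library as a portable stand-in for raw inotify: watch the cache tree and append new entries to the index as nginx finalizes them. The paths and the tab-separated index format are assumptions for this sketch, not anything lancache ships:

```python
# Rough sketch of inotify-driven index maintenance, using the
# third-party watchdog library (pip install watchdog) rather than the
# raw inotify API. Paths and the index format are assumptions.
import time

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

CACHE_DIR = "/data/cache"              # assumed lancache volume path
INDEX_FILE = "/data/cache-index.txt"   # hypothetical index location

def read_key(path):
    """Pull the "KEY: ..." line out of an nginx cache file header."""
    with open(path, "rb") as f:
        header = f.read(4096)
    start = header.find(b"KEY: ")
    if start == -1:
        return None
    start += len(b"KEY: ")
    return header[start:header.find(b"\n", start)].decode("utf-8", "replace")

class CacheIndexer(FileSystemEventHandler):
    # nginx typically writes a cache entry to a temp file and renames
    # it into place, so a move event usually marks a finished entry;
    # plain creates are handled too in case temp files land elsewhere.
    def on_moved(self, event):
        if not event.is_directory:
            self.add(event.dest_path)

    def on_created(self, event):
        if not event.is_directory:
            self.add(event.src_path)

    def add(self, path):
        key = read_key(path)
        if key:
            with open(INDEX_FILE, "a") as idx:
                idx.write(f"{path}\t{key}\n")

if __name__ == "__main__":
    observer = Observer()
    observer.schedule(CacheIndexer(), CACHE_DIR, recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(60)
    finally:
        observer.stop()
        observer.join()
```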

ghost commented 3 years ago

I ended up running the python script on my 2TB (HDD) lancache monolithic last night. It took a few hours to complete, as expected. I have not figured out the web front-end in Nginx yet.

It's possible to find all the slices of a file by grepping the created index files for the file name. There are a great many index files, although none are very large - for my 2TB cache, only about 35 lines per file on average.

It did end up helping me accomplish the goal I set for testing purposes - deleting an out-of-date version of a 140GB Xbox game from the cache. I made a quick script (sketched below) to grep each index file for the file name and delete the hashed file names it returned. The whole process was quite a bit more complex and time-consuming than I had hoped.

This might also be useful for purging based on something other than just a file name - an upstream directory, perhaps, since it's part of the hash.
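A sketch of what that quick script amounted to, using the hypothetical `path<TAB>key` index layout from the earlier sketches (not necessarily the exact format nginx-cache-cleaner writes); the pattern can be a file name or an upstream directory:

```python
# Sketch of the purge script described above. The index format here
# is a hypothetical "path<TAB>key" layout, not necessarily what
# nginx-cache-cleaner's index files actually look like.
import os
import sys

INDEX_FILE = "/data/cache-index.txt"  # hypothetical index location

def purge(pattern):
    """Delete every cache entry whose key contains `pattern`.

    Because lancache serves large files as sliced range requests, one
    game file maps to many cache entries; matching a substring of the
    key catches every slice in one pass.
    """
    removed = 0
    with open(INDEX_FILE) as idx:
        for line in idx:
            path, _, key = line.rstrip("\n").partition("\t")
            if pattern in key and os.path.exists(path):
                os.remove(path)
                removed += 1
    print(f"removed {removed} cache entries matching {pattern!r}")

if __name__ == "__main__":
    purge(sys.argv[1])
```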

Of note, nginx-cache-purge would have achieved the same result in a single Bash command, and it would have taken just as long since it searches each hashed file. With the indexes already created, though, deleting several files at the same time would have been much quicker for the subsequent deletes.
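For comparison, the no-index approach boils down to a single pass over the raw cache: scan every file's header for a match and delete as you go (same assumed paths as the sketches above):

```python
# The no-index equivalent: walk the whole cache, check each file's
# header for the pattern, delete matches. Same end result, but the
# full walk is paid on every purge instead of once at index time.
import os
import sys

CACHE_DIR = "/data/cache"  # assumed lancache volume path

def purge_by_scan(pattern):
    needle = pattern.encode()
    for dirpath, _dirs, files in os.walk(CACHE_DIR):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                header = f.read(4096)  # cache key lives in the header
            if needle in header:
                os.remove(path)

if __name__ == "__main__":
    purge_by_scan(sys.argv[1])
```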

Regarding scan performance - the initial index creation would be a dog for hours to days depending on cache size, but I don't think it would be a deal breaker IF hourly or daily index updates can be done efficiently and quickly. I'm not sure how to do that or whether inotify would be useful (I have no idea how that works).

EDIT:

I also forgot that the script has an append mode.

In "append" mode, index file creation dates are checked and only recently added cache files are added to index file. In "create" mode index files are re-created. Default mode is append.

So of course the initial creation will be taxing on resources and take a long time. I can't say how much the append mode will speed things up. My cache doesn't get many updates, being in a home with just a few gamers. I'll give it a try in a few days and see whether it's a few-minute process or a few-hour process. Judging by the instructions for nginx-cache-cleaner, the cron job is supposed to run every 3 minutes, so it depends on how fast it can check the timestamps of tens of thousands of files.
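If the append mode works as the quoted docs describe, the core of it would be something like this - note the index's own timestamp, then only read the headers of cache files newer than that (same assumed paths and index format as the sketches above):

```python
# Sketch of the append-mode logic from the quoted docs: skip any cache
# file older than the index itself, so a frequent cron run only pays
# for entries cached since the last run. Paths/format are assumptions.
import os

CACHE_DIR = "/data/cache"              # assumed lancache volume path
INDEX_FILE = "/data/cache-index.txt"   # hypothetical index location

def read_key(path):
    with open(path, "rb") as f:
        header = f.read(4096)
    start = header.find(b"KEY: ")
    if start == -1:
        return None
    start += len(b"KEY: ")
    return header[start:header.find(b"\n", start)].decode("utf-8", "replace")

def append_index():
    # Everything older than the index is assumed already recorded; a
    # real tool would also need to handle files written mid-scan and
    # entries that nginx has since evicted.
    since = os.path.getmtime(INDEX_FILE) if os.path.exists(INDEX_FILE) else 0.0
    added = 0
    with open(INDEX_FILE, "a") as idx:
        for dirpath, _dirs, files in os.walk(CACHE_DIR):
            for name in files:
                path = os.path.join(dirpath, name)
                if os.path.getmtime(path) <= since:
                    continue  # already indexed on a previous run
                key = read_key(path)
                if key:
                    idx.write(f"{path}\t{key}\n")
                    added += 1
    print(f"indexed {added} new cache entries")

if __name__ == "__main__":
    append_index()
```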

stale[bot] commented 3 years ago

This issue has been automatically marked as inactive because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 3 years ago

This issue has been automatically closed after being inactive for 30 days. If you require further assistance, please reopen the issue with more details or talk to us on discord.