Made some progress on the S3 work: I wrote a caching store that caches blobs to disk.
Writing a proper LRU cache is gonna be tricky.
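The tricky part is mostly bookkeeping. A common approach is a doubly linked list plus a map, which gives O(1) touch and eviction; here's a small sketch of just the eviction index (tracking hashes, not the blobs themselves). This is an illustration, not the actual implementation.

```go
package main

import (
	"container/list"
	"fmt"
)

// lruCache tracks which blob hashes were used least recently, so the
// store knows what to delete first when it's over capacity.
type lruCache struct {
	cap   int
	order *list.List // front = most recently used
	items map[string]*list.Element
}

func newLRU(cap int) *lruCache {
	return &lruCache{cap: cap, order: list.New(), items: map[string]*list.Element{}}
}

// Touch marks a hash as recently used. If adding it pushes the cache
// over capacity, the least recently used hash is evicted and returned.
func (c *lruCache) Touch(hash string) (evicted string, ok bool) {
	if el, found := c.items[hash]; found {
		c.order.MoveToFront(el)
		return "", false
	}
	c.items[hash] = c.order.PushFront(hash)
	if c.order.Len() > c.cap {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		h := oldest.Value.(string)
		delete(c.items, h)
		return h, true
	}
	return "", false
}

func main() {
	c := newLRU(2)
	c.Touch("a")
	c.Touch("b")
	c.Touch("a")          // "a" is now most recently used
	ev, _ := c.Touch("c") // over capacity: evicts "b"
	fmt.Println(ev)       // prints "b"
}
```

Sizing by blob count is shown here for simplicity; capping by total bytes just means tracking a running size instead of list length.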
It might be easier to delete files using an external command or script.
I haven't tested how find will perform on millions of files, though.
But I did make it so the file store can break the blobs up into subdirectories, to keep the number of files in each directory small.
Start with https://github.com/lbryio/reflector.go/blob/master/cmd/getstream.go#L31
The getstream command downloads a stream from a peer and also caches the blobs locally.
If you delete some of the local blobs, it fills those back in.
You're gonna want the same thing, except instead of using a PeerStore as the origin, you'll want a DBBackedS3Store (to download straight from S3 instead of going through a peer).
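The pattern being described is a read-through cache: check the local store first, fall back to the origin on a miss, and backfill the cache. Here's a sketch of that composition with a deliberately simplified interface; the BlobStore interface and memStore here are hypothetical stand-ins, not reflector.go's actual store types.

```go
package main

import "fmt"

// BlobStore is a hypothetical minimal store interface for the sketch.
type BlobStore interface {
	Get(hash string) ([]byte, error)
	Put(hash string, blob []byte) error
}

// memStore is a toy in-memory store used here in place of a real
// PeerStore / DBBackedS3Store / disk store.
type memStore map[string][]byte

func (m memStore) Get(h string) ([]byte, error) {
	b, ok := m[h]
	if !ok {
		return nil, fmt.Errorf("not found: %s", h)
	}
	return b, nil
}
func (m memStore) Put(h string, b []byte) error { m[h] = b; return nil }

// CachingStore reads from the cache first and falls back to the origin
// (a PeerStore, or a DBBackedS3Store for direct-from-S3), refilling
// the cache on a miss — which is what repairs deleted local blobs.
type CachingStore struct {
	origin, cache BlobStore
}

func (c *CachingStore) Get(hash string) ([]byte, error) {
	if b, err := c.cache.Get(hash); err == nil {
		return b, nil // cache hit
	}
	b, err := c.origin.Get(hash)
	if err != nil {
		return nil, err
	}
	_ = c.cache.Put(hash, b) // backfill the local copy
	return b, nil
}

func main() {
	origin := memStore{"abc": []byte("blob")}
	cache := memStore{}
	s := &CachingStore{origin: origin, cache: cache}
	b, _ := s.Get("abc") // miss: fetched from origin, then cached
	_, cached := cache["abc"]
	fmt.Println(string(b), cached) // prints "blob true"
}
```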
From there it's super straightforward. The last coding challenge is managing the size of the data on disk; that's either gonna be an external script or it'll go into the DiskStore.
Relevant information from Slack:
Possibly useful resources: