webrecorder / browsertrix

Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!
https://webrecorder.net/browsertrix
GNU Affero General Public License v3.0

Implement custom storage for orgs #2093

Open tw4l opened 1 month ago

tw4l commented 1 month ago

Fixes #578

Adds

Notes

Currently, no delete operations happen for a bucket that was previously used as a primary or replica location and has since been unset. Files are copied to the new bucket so that the app stays usable going forward, but they are not automatically deleted from the source after the copy job. We could add that, but I wonder if it's safer, especially in the early days of testing, to perform that cleanup manually as desired.

Once we're comfortable, we can change the rclone command in the copy_job.yaml background job template from copy to move if we want it to automatically clean up files from the source location on completion. Since the same template is used for copying files from an old primary storage to a new primary storage as well as to replicate from primary storage to a new replica location, we'd want to make sure the latter still uses copy so as not to delete files from the primary storage location.
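
To make the copy-vs-move distinction concrete, here is a minimal sketch of how the subcommand could be chosen per job type. It is not code from this PR: the CopyJobKind enum, the function names, the cleanup_source flag, and the remote paths are all hypothetical. The only thing taken as given is that rclone has copy and move subcommands taking a source and a destination.

```python
from enum import Enum
from typing import List


class CopyJobKind(Enum):
    """Hypothetical job kinds; these names are not from the Browsertrix codebase."""

    PRIMARY_MIGRATION = "primary_migration"  # old primary storage -> new primary storage
    REPLICATION = "replication"  # primary storage -> new replica location


def rclone_subcommand(kind: CopyJobKind, cleanup_source: bool = False) -> str:
    """Pick the rclone subcommand for a background copy job.

    Only a primary-storage migration with cleanup explicitly enabled uses
    "move". Replication always uses "copy", so files are never deleted
    from the primary storage location.
    """
    if kind is CopyJobKind.PRIMARY_MIGRATION and cleanup_source:
        return "move"
    return "copy"


def build_rclone_args(
    kind: CopyJobKind, source: str, dest: str, cleanup_source: bool = False
) -> List[str]:
    """Build the argument list for the job (assumed shape: rclone <cmd> <src> <dst>)."""
    return ["rclone", rclone_subcommand(kind, cleanup_source), source, dest]


if __name__ == "__main__":
    # Hypothetical remote names and paths, for illustration only.
    print(build_rclone_args(CopyJobKind.REPLICATION, "primary:bucket/org", "replica:bucket/org", True))
    print(build_rclone_args(CopyJobKind.PRIMARY_MIGRATION, "old:bucket/org", "new:bucket/org", True))
```

Defaulting cleanup_source to False keeps today's behavior (copy everywhere) until we decide to opt in, and the replication path cannot be switched to move by accident.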

TODO