Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!
Adds
API endpoints for adding and deleting custom storages on organizations
API endpoints for updating primary and/or replica storage for an org
API endpoint to check on progress of background job (currently, only bucket copy jobs are supported)
Automated hooks to copy an organization's files from the previous S3 bucket to the new one and update the files in the database when primary storage is changed
Automated hooks to replicate content from primary storage to new replica location and update files in the database when a replica location is set on an org
New pylint disable comments on many of the backup modules so that linting passes
Admin documentation for adding, removing, and configuring custom storage locations on an organization
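As a hypothetical sketch of how the new endpoints might be driven from a client: the paths and payload field names below are assumptions for illustration, not the actual Browsertrix API surface.

```python
import json


def custom_storage_path(oid: str) -> str:
    """Assumed path for adding/deleting a custom storage on an org."""
    return f"/api/orgs/{oid}/custom-storage"


def storage_update_payload(primary: str, replicas: list[str]) -> str:
    """Assumed JSON body for updating primary and/or replica storage."""
    return json.dumps(
        {
            "storage": {"name": primary},
            "storageReplicas": [{"name": name} for name in replicas],
        }
    )


def background_job_path(oid: str, job_id: str) -> str:
    """Assumed path for checking progress of a background (bucket copy) job."""
    return f"/api/orgs/{oid}/jobs/{job_id}"


print(custom_storage_path("my-org"))
print(storage_update_payload("new-s3", ["replica-1"]))
```

The actual endpoint paths and payload shapes are defined by the backend changes in this PR; the sketch only shows the general pattern of org-scoped storage and job endpoints.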
Notes
Currently, no delete operations happen for a bucket previously used as a primary or replica location when it is unset. Files are copied to the new bucket to ensure there are no usability issues moving forward in the app, but they are not automatically deleted from the source after the copy job. We could add that, but I wonder if it's safer, especially in the early days of testing, to perform that cleanup manually as desired.
Once we're comfortable, we can change the rclone command in the copy_job.yaml background job template from copy to move if we want it to automatically clean up files from the source location on completion. Since the same template is used for copying files from an old primary storage to a new primary storage as well as to replicate from primary storage to a new replica location, we'd want to make sure the latter still uses copy so as not to delete files from the primary storage location.
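To illustrate the copy vs. move distinction in rclone terms (the bucket and path names below are made up for the example; the real invocation lives in the copy_job.yaml template):

```shell
# "copy" leaves source files in place after the transfer; safe for both the
# old-primary -> new-primary migration and primary -> replica replication.
rclone copy old-bucket:org-files new-bucket:org-files

# "move" deletes each source file once it is transferred; only appropriate
# for the old-primary -> new-primary case, never for replication, which
# must leave the primary storage intact.
# rclone move old-bucket:org-files new-bucket:org-files
```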
Fixes #578