Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!
Currently, when an archived item or browser profile is deleted in Browsertrix, the deletion cascades to all replica storage locations immediately.
We should implement a configurable delay to replica deletion (based on a setting in the Helm chart) so that deletion of replicas can be delayed by x days. This will give us the ability to recover content (WACZ files, profiles) that is deleted maliciously or through user error so long as it is caught before the delay period expires.
Requirements
[ ] Optional Helm chart setting (called replicaDeletionDelayDays in this issue; the name may change in implementation) for the number of days to delay replica deletion by
[ ] If replicaDeletionDelayDays is not set, start background jobs to delete replicas immediately after deletion, as happens now
[ ] When replicaDeletionDelayDays is set to a valid value, schedule replica deletion jobs to run after that number of days
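The scheduling decision in the requirements above could be sketched roughly as follows. This is a minimal illustration, not the actual Browsertrix implementation: the function names `parse_delay_days` and `replica_deletion_time` are hypothetical, and treating an invalid or non-positive value the same as an unset one is an assumption (an implementation might prefer to fail loudly instead).

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_delay_days(raw: Optional[str]) -> int:
    """Return the configured delay in whole days, or 0 if unset or invalid.

    Hypothetical helper: `raw` stands in for however the Helm chart value
    (replicaDeletionDelayDays in this issue) reaches the backend.
    """
    if raw is None or raw.strip() == "":
        return 0
    try:
        days = int(raw)
    except ValueError:
        # Assumption: an unparsable value falls back to immediate deletion.
        return 0
    return days if days > 0 else 0


def replica_deletion_time(
    deleted_at: datetime, delay_days: int
) -> Optional[datetime]:
    """Return when the replica deletion job should run.

    None means "no delay": start the deletion job immediately, as happens now.
    """
    if delay_days <= 0:
        return None
    return deleted_at + timedelta(days=delay_days)


if __name__ == "__main__":
    deleted_at = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print(replica_deletion_time(deleted_at, parse_delay_days(None)))
    print(replica_deletion_time(deleted_at, parse_delay_days("7")))
```

Returning `None` for the unset case keeps the current immediate-deletion path separate from the scheduled path, so the caller can choose between starting a job now and scheduling one for later.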
Note
This solution covers only the files themselves, not the database entries for deleted objects. Without further changes, it will let us recover deleted files but not restore deleted objects in the application.