webrecorder / browsertrix

Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!
https://webrecorder.net/browsertrix
GNU Affero General Public License v3.0

Delay deletion of replica files #2170

Open tw4l opened 1 day ago

tw4l commented 1 day ago

Overview

Currently, when an archived item or browser profile is deleted in Browsertrix, the deletion cascades to all replica storage locations immediately.

We should implement a configurable delay for replica deletion (controlled by a setting in the Helm chart) so that replicas are only deleted x days after the original. This would let us recover content (WACZ files, profiles) that is deleted maliciously or through user error, as long as the deletion is caught before the delay period expires.
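As a rough illustration of the timing check this implies, here is a minimal sketch. The function names and the `delay_days` parameter are hypothetical and not taken from the Browsertrix codebase; the sketch assumes the delay is supplied as a whole number of days and that a value of 0 keeps the current immediate behavior.

```python
from datetime import datetime, timedelta, timezone


def replica_delete_after(deleted_at: datetime, delay_days: int) -> datetime:
    """Return the earliest time at which replica copies may be removed.

    `delay_days` is a hypothetical stand-in for the Helm chart setting.
    """
    if delay_days < 0:
        raise ValueError("delay_days must be non-negative")
    return deleted_at + timedelta(days=delay_days)


def is_replica_deletion_due(
    deleted_at: datetime, delay_days: int, now: datetime
) -> bool:
    """True once the delay period for a deleted file has fully elapsed."""
    return now >= replica_delete_after(deleted_at, delay_days)


# Example: a file deleted on 1 Nov with a 7-day delay.
deleted_at = datetime(2024, 11, 1, 12, 0, tzinfo=timezone.utc)
print(is_replica_deletion_due(
    deleted_at, 7, datetime(2024, 11, 5, tzinfo=timezone.utc)))         # False
print(is_replica_deletion_due(
    deleted_at, 7, datetime(2024, 11, 8, 12, 0, tzinfo=timezone.utc)))  # True
```

Until the check returns true, the replica files would remain in place and could still be copied back to primary storage.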

Requirements

Note

This solution covers only the files themselves, not the database entries for deleted objects. Without further changes, it will let us recover deleted files but not restore deleted objects in the application.