ldko closed 1 year ago
We may want to implement this in the crawler; see https://github.com/webrecorder/browsertrix-crawler/issues/242
This has been implemented in the crawler, which will stop gracefully if disk utilization passes, or is projected to pass, a certain threshold (90% by default). This threshold is configurable in the crawler, and we can make it settable via the Helm chart values in Browsertrix Cloud.
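For illustration only, here is a minimal Python sketch of the kind of check described above. It is not the crawler's actual code: the function names `should_stop_crawl` and `check_path` and the `pending_bytes` parameter are hypothetical, and the way the crawler computes its projection is not specified in this thread. The sketch assumes the projection is the current usage plus the bytes still expected to be written.

```python
import shutil


def should_stop_crawl(used_bytes, total_bytes, pending_bytes=0, threshold=0.90):
    """Return True if current or projected disk utilization passes the threshold.

    All names here are hypothetical. `pending_bytes` is an assumed estimate
    of data still to be written (for example, an in-progress WACZ file).
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    current = used_bytes / total_bytes
    projected = (used_bytes + pending_bytes) / total_bytes
    return current > threshold or projected > threshold


def check_path(path, pending_bytes=0, threshold=0.90):
    """Apply the check to the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return should_stop_crawl(usage.used, usage.total, pending_bytes, threshold)
```

A caller could poll `check_path` on the directory where crawl files are written and request a graceful stop once it returns `True`, leaving headroom before the volume is completely full.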
Heritrix has a Disk space monitor that "Monitors the available space on the paths configured. If the available space drops below a specified threshold a crawl pause is requested."
I would find something similar helpful for local deployments of Browsertrix Cloud: if little space is left where crawl files are being written, crawls should be paused. Although the size of crawl content can currently be configured, a crawl that tries to exceed what is actually available can fill the available space to 100%. In a deployment where WACZ files are written to the same place as the microk8s cluster data etc., that takes down the whole system.
May be related to #427.