wmo-im / wis2downloader

The backend Python package for downloading real-time data from the WIS2 network.
Apache License 2.0

wis2-downloader should have a config-option to control amount of downloaded data, to avoid crashing host #13

Closed maaikelimper closed 3 months ago

maaikelimper commented 4 months ago

wis2-downloader can crash the host by filling up the disk. I propose adding a config option, "max_mb_download_dir", that the DownloadWorker can reference to limit disk usage:

class DownloadWorker(BaseDownloader):
    def __init__(self, queue: BaseQueue, basepath: str = ".", max_mb_download_dir: float = None):
        self.http = urllib3.PoolManager()
        self.queue = queue
        self.basepath = Path(basepath)
        self.max_mb_download_dir = max_mb_download_dir

    ...

    def mb_in_download_dir(self):
        # calculate the size of the download directory in MB
        mb_in_download_dir = sum(f.stat().st_size for f in self.basepath.glob('**/*') if f.is_file()) / (1024. * 1024.)
        LOGGER.info(f"Download directory size: {mb_in_download_dir} MB")
        return mb_in_download_dir

    def process_job(self, job) -> None:
        # Skip the job if the configured size limit is already exceeded
        if self.max_mb_download_dir is not None:
            if self.mb_in_download_dir() > self.max_mb_download_dir:
                LOGGER.error(f"Download directory size exceeded {self.max_mb_download_dir} MB, cannot process job {job}")
                return
        yyyy, mm, dd = get_todays_date()
        output_dir = self.basepath / yyyy / mm / dd

        # Add target to output directory
        output_dir = output_dir / job.get("target", ".")

        ...
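
The size calculation in mb_in_download_dir() can raise an error if a file is removed while the directory tree is being scanned. A minimal standalone sketch that tolerates this is shown below; the function name dir_size_mb is hypothetical and is not part of wis2downloader.

```python
import os


def dir_size_mb(basepath: str = ".") -> float:
    """Return the total size of the files under basepath, in MB.

    Hypothetical helper, not part of wis2downloader.
    """
    total_bytes = 0
    for root, _dirs, files in os.walk(basepath):
        for name in files:
            try:
                total_bytes += os.stat(os.path.join(root, name)).st_size
            except OSError:
                # The file vanished (or is unreadable) between listing and stat
                continue
    return total_bytes / (1024.0 * 1024.0)
```

Walking the whole tree on every job gets slower as the directory grows, so a caller may want to cache the result or check less often.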

maaikelimper commented 3 months ago

Implemented as a check on the free space of the volume used by the downloader.

https://github.com/wmo-im/wis2downloader/pull/18
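
A free-space check of this kind could look roughly like the sketch below. This is not the code from the pull request: the function name has_free_space and the parameter min_free_mb are assumptions made for illustration. Only shutil.disk_usage is a real standard-library call.

```python
import shutil
from typing import Optional


def has_free_space(basepath: str, min_free_mb: Optional[float]) -> bool:
    """Return True if the volume holding basepath has at least min_free_mb free.

    Hypothetical helper; names are illustrative, not taken from wis2downloader.
    """
    if min_free_mb is None:
        # No limit configured: always allow the download
        return True
    # disk_usage reports total, used and free bytes for the volume
    free_mb = shutil.disk_usage(basepath).free / (1024.0 * 1024.0)
    return free_mb >= min_free_mb
```

Unlike summing file sizes under the download directory, this asks the operating system about the whole volume, so it also accounts for space used by other processes and does not get slower as the number of downloaded files grows.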