vtuber-plan / olah

Self-hosted huggingface mirror service.
MIT License
74 stars, 5 forks

Feature Request: Add Support for Repository Size Limits and Automatic Cache Cleanup #16

Closed: andy108369 closed this issue 2 months ago

andy108369 commented 2 months ago

Description: I manage a production server with limited storage capacity, where I can allocate only a fixed amount of space (e.g., 2048 GiB, i.e. 2 TiB) to HuggingFace repositories. It would be extremely beneficial if olah supported a repository size limit and automatically removed the least recently accessed cached models/datasets to stay within the allocated storage space.

Proposed Solution: Introduce a feature in olah that allows users to set a maximum repository size. When the limit is reached, olah would automatically delete the least recently accessed cached models/datasets to reclaim space, ensuring that the total storage usage does not exceed the defined limit (e.g., 2 TiB).
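To make the request concrete, here is a minimal sketch of the eviction behavior described above. It is not olah's code. The function name `evict_least_recently_accessed` and its parameters are hypothetical, and it assumes the filesystem records access times (mounts using `noatime` would need separate access tracking).

```python
from pathlib import Path


def evict_least_recently_accessed(cache_dir: str, limit_bytes: int) -> list[str]:
    """Delete files under cache_dir, least recently accessed first,
    until the total size is at or below limit_bytes.

    Hypothetical helper for illustration only; returns the deleted paths.
    """
    files = []
    total = 0
    # Collect (access time, size, path) for every regular file in the cache.
    for path in Path(cache_dir).rglob("*"):
        if path.is_file():
            st = path.stat()
            files.append((st.st_atime, st.st_size, path))
            total += st.st_size

    removed = []
    # Oldest access time first, so rarely used files are evicted before hot ones.
    for _atime, size, path in sorted(files, key=lambda entry: entry[0]):
        if total <= limit_bytes:
            break
        path.unlink()
        total -= size
        removed.append(str(path))
    return removed
```

For the 2 TiB case, the limit would be passed as `2 * 1024**4` bytes. This sketch evicts individual files; evicting whole models/datasets would mean grouping files by repository before sorting.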

Current Behavior: The version of olah I am using (commit 33e7cf1b30472bc9c9fdd0a71d49093bcccddac8) does not support a repository size limit or automatic model cleanup, so managing storage space requires manual intervention (likely removing the entire repo :warning:).

(.venv) olah@node1:~/olah$ pip freeze |grep olah
-e git+https://github.com/vtuber-plan/olah.git@33e7cf1b30472bc9c9fdd0a71d49093bcccddac8#egg=olah

(.venv) olah@node1:~/olah$ python -m olah.server --help
usage: server.py [-h] [--config CONFIG] [--host HOST] [--port PORT] [--hf-scheme HF_SCHEME] [--hf-netloc HF_NETLOC] [--hf-lfs-netloc HF_LFS_NETLOC] [--mirror-scheme MIRROR_SCHEME] [--mirror-netloc MIRROR_NETLOC] [--mirror-lfs-netloc MIRROR_LFS_NETLOC] [--has-lfs-site]
                 [--ssl-key SSL_KEY] [--ssl-cert SSL_CERT] [--repos-path REPOS_PATH] [--log-path LOG_PATH]

Olah Huggingface Mirror Server.

options:
  -h, --help            show this help message and exit
  --config CONFIG, -c CONFIG
  --host HOST
  --port PORT
  --hf-scheme HF_SCHEME
                        The scheme of huggingface site (http or https)
  --hf-netloc HF_NETLOC
  --hf-lfs-netloc HF_LFS_NETLOC
  --mirror-scheme MIRROR_SCHEME
                        The scheme of mirror site (http or https)
  --mirror-netloc MIRROR_NETLOC
  --mirror-lfs-netloc MIRROR_LFS_NETLOC
  --has-lfs-site
  --ssl-key SSL_KEY     The SSL key file path, if HTTPS is used
  --ssl-cert SSL_CERT   The SSL cert file path, if HTTPS is used
  --repos-path REPOS_PATH
                        The folder to save cached repositories
  --log-path LOG_PATH   The folder to save logs

jstzwj commented 2 months ago

The repository cache disk usage limit and automatic file-level cache cleanup are implemented in the newest version, v0.3.0. You can set the capacity limit using --cache-size-limit.