home-assistant / addons

Docker add-ons for Home Assistant
https://home-assistant.io/hassio/
Apache License 2.0

During a total backup, it saves Whisper's model files, raising the backup from 440MB to 1730MB #3545

Open SaintTDI opened 2 months ago

SaintTDI commented 2 months ago

Describe the issue you are experiencing

Yesterday I installed a local voice assist pipeline. Before installing it, each full backup (made automatically by OneDrive Backup) was about 440 MB, but since I installed Whisper, Piper and openWakeWord, yesterday's backup was 1 GB and this morning's is 1.6 GB.

A partial backup of the Whisper add-on alone is 1266 MB, and after unzipping the file I can see the 3 models that I tried, with some big files (e.g. path: core_whisper\data\models--rhasspy--faster-whisper-medium-int8\blobs).

The Whisper add-on documentation says:

"Backups: Whisper model files can be quite large, so they are automatically excluded from backups. The models will be re-downloaded when the backup is restored."

But that doesn't seem to happen.

What type of installation are you running?

Home Assistant OS

Which operating system are you running on?

Home Assistant Operating System

Which add-on are you reporting an issue with?

Whisper

What is the version of the add-on?

2.0.0

Steps to reproduce the issue

  1. Install Whisper
  2. Switch between different models
  3. Perform a full backup
  4. Unzip the file and find the blobs for each model ...
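Step 4 can also be done without unpacking the archive by hand. The following is a minimal sketch, not part of the original report: it lists the largest regular files inside a tar archive using only Python's standard `tarfile` module. The helper names `largest_members` and `build_demo_archive` are made up for illustration, and the demo archive is synthetic so the code runs without a real backup.

```python
import io
import tarfile


def largest_members(archive, top=5):
    """Return (size_in_bytes, name) for the biggest regular files in a tar archive."""
    # mode "r:*" lets tarfile detect the compression (gzip, none, ...) itself.
    with tarfile.open(fileobj=archive, mode="r:*") as tar:
        files = [(member.size, member.name) for member in tar if member.isfile()]
    return sorted(files, reverse=True)[:top]


def build_demo_archive():
    """Build a small in-memory archive (hypothetical content) for the demo."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, size in [
            ("data/models--example/blobs/abc", 4096),
            ("data/options.json", 32),
        ]:
            info = tarfile.TarInfo(name)
            info.size = size
            tar.addfile(info, io.BytesIO(b"\0" * size))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for size, name in largest_members(build_demo_archive()):
        print(f"{size:>10}  {name}")
```

To inspect a real add-on archive, open it in binary mode and pass the file object, for example `largest_members(open("core_whisper.tar.gz", "rb"))`, assuming the inner archive has already been extracted from the downloaded backup.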

System Health information

System Information

version core-2024.3.3
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.12.2
os_name Linux
os_version 6.6.20-haos
arch x86_64
timezone Europe/Rome
config_dir /config

Home Assistant Community Store
GitHub API | ok
GitHub Content | ok
GitHub Web | ok
GitHub API Calls Remaining | 5000
Installed Version | 1.34.0
Stage | running
Available Repositories | 1402
Downloaded Repositories | 35

Home Assistant Cloud
logged_in | false
can_reach_cert_server | ok
can_reach_cloud_auth | ok
can_reach_cloud | ok

Home Assistant Supervisor
host_os | Home Assistant OS 12.1
update_channel | stable
supervisor_version | supervisor-2024.03.1
agent_version | 1.6.0
docker_version | 24.0.7
disk_total | 468.7 GB
disk_used | 27.1 GB
healthy | true
supported | true
board | generic-x86-64
supervisor_api | ok
version_api | ok
installed_addons | File editor (5.8.0), Advanced SSH & Web Terminal (17.2.0), Mosquitto broker (6.4.0), Zigbee2MQTT (1.36.1-1), Studio Code Server (5.15.0), Duck DNS (1.16.0), OneDrive Backup (2.3.1), Grocy (0.21.0), ESPHome (2024.3.1), Piper (1.5.0), Whisper (2.0.0), openWakeWord (1.10.0)

Dashboards
dashboards | 7
resources | 26
views | 52
mode | storage

Recorder
oldest_recorder_run | 20 March 2024 at 13:28
current_recorder_run | 3 April 2024 at 15:23
estimated_db_size | 996.68 MiB
database_engine | sqlite
database_version | 3.44.2

Spotify
api_endpoint_reachable | ok

Anything in the Supervisor logs that might be useful for us?

2024-04-04 11:21:17.572 INFO (MainThread) [supervisor.api.middleware.security] /backups access from de91e161_hassio_onedrive_backup
2024-04-04 11:21:18.008 INFO (MainThread) [supervisor.api.middleware.security] /backups access from de91e161_hassio_onedrive_backup
2024-04-04 11:22:42.609 INFO (MainThread) [supervisor.resolution.check] Starting system checks with state running
2024-04-04 11:22:42.609 INFO (MainThread) [supervisor.resolution.checks.base] Run check for security/core
2024-04-04 11:22:42.610 INFO (MainThread) [supervisor.resolution.checks.base] Run check for no_current_backup/system
2024-04-04 11:22:42.610 INFO (MainThread) [supervisor.resolution.checks.base] Run check for trust/supervisor
2024-04-04 11:22:42.617 INFO (MainThread) [supervisor.resolution.checks.base] Run check for dns_server_failed/dns_server
2024-04-04 11:22:42.670 INFO (MainThread) [supervisor.resolution.checks.base] Run check for pwned/addon
2024-04-04 11:22:42.670 INFO (MainThread) [supervisor.resolution.checks.base] Run check for multiple_data_disks/system
2024-04-04 11:22:42.671 INFO (MainThread) [supervisor.resolution.checks.base] Run check for dns_server_ipv6_error/dns_server
2024-04-04 11:22:42.671 INFO (MainThread) [supervisor.resolution.checks.base] Run check for docker_config/system
2024-04-04 11:22:42.671 INFO (MainThread) [supervisor.resolution.checks.base] Run check for free_space/system
2024-04-04 11:22:42.671 INFO (MainThread) [supervisor.resolution.checks.base] Run check for ipv4_connection_problem/system
2024-04-04 11:22:42.672 INFO (MainThread) [supervisor.resolution.check] System checks complete
2024-04-04 11:22:42.672 INFO (MainThread) [supervisor.resolution.evaluate] Starting system evaluation with state running
2024-04-04 11:22:42.754 INFO (MainThread) [supervisor.resolution.evaluate] System evaluation complete
2024-04-04 11:22:42.754 INFO (MainThread) [supervisor.resolution.fixup] Starting system autofix at state running
2024-04-04 11:22:42.754 INFO (MainThread) [supervisor.resolution.fixup] System autofix complete
2024-04-04 11:23:18.455 WARNING (MainThread) [supervisor.addons.options] Unknown option 'serial' for Zigbee2MQTT (45df7312_zigbee2mqtt)
2024-04-04 11:23:18.455 WARNING (MainThread) [supervisor.addons.options] Unknown option 'advanced' for Zigbee2MQTT (45df7312_zigbee2mqtt)
2024-04-04 11:23:44.647 INFO (MainThread) [supervisor.backups.manager] Found 45 backup files
2024-04-04 11:23:44.713 INFO (MainThread) [supervisor.backups.manager] Found 45 backup files
2024-04-04 11:23:50.924 INFO (MainThread) [supervisor.backups.manager] Backup 8e7bb872 starting stage addon_repositories
2024-04-04 11:23:50.925 INFO (MainThread) [supervisor.backups.manager] Backup 8e7bb872 starting stage docker_config
2024-04-04 11:23:50.925 INFO (MainThread) [supervisor.backups.manager] Creating new full backup with slug 8e7bb872
2024-04-04 11:23:50.928 INFO (MainThread) [supervisor.backups.manager] Backup 8e7bb872 starting stage addons
2024-04-04 11:23:50.944 INFO (MainThread) [supervisor.addons.addon] Building backup for add-on core_configurator
2024-04-04 11:23:50.949 INFO (MainThread) [supervisor.addons.addon] Finish backup for addon core_configurator
2024-04-04 11:23:50.956 INFO (MainThread) [supervisor.addons.addon] Building backup for add-on a0d7b954_ssh
2024-04-04 11:23:50.963 INFO (MainThread) [supervisor.addons.addon] Finish backup for addon a0d7b954_ssh
2024-04-04 11:23:50.970 INFO (MainThread) [supervisor.addons.addon] Building backup for add-on core_mosquitto
2024-04-04 11:23:50.975 INFO (MainThread) [supervisor.addons.addon] Finish backup for addon core_mosquitto
2024-04-04 11:23:50.983 INFO (MainThread) [supervisor.addons.addon] Building backup for add-on 45df7312_zigbee2mqtt
2024-04-04 11:23:50.987 INFO (MainThread) [supervisor.addons.addon] Finish backup for addon 45df7312_zigbee2mqtt
2024-04-04 11:23:50.995 INFO (MainThread) [supervisor.addons.addon] Building backup for add-on a0d7b954_vscode
2024-04-04 11:23:52.111 INFO (MainThread) [supervisor.addons.addon] Finish backup for addon a0d7b954_vscode
2024-04-04 11:23:52.119 INFO (MainThread) [supervisor.addons.addon] Building backup for add-on core_duckdns
2024-04-04 11:23:52.137 INFO (MainThread) [supervisor.addons.addon] Finish backup for addon core_duckdns
2024-04-04 11:23:52.147 INFO (MainThread) [supervisor.addons.addon] Building backup for add-on de91e161_hassio_onedrive_backup
2024-04-04 11:23:52.152 INFO (MainThread) [supervisor.addons.addon] Finish backup for addon de91e161_hassio_onedrive_backup
2024-04-04 11:23:52.160 INFO (MainThread) [supervisor.addons.addon] Building backup for add-on a0d7b954_grocy
2024-04-04 11:23:52.191 INFO (MainThread) [supervisor.addons.addon] Finish backup for addon a0d7b954_grocy
2024-04-04 11:23:52.200 INFO (MainThread) [supervisor.addons.addon] Building backup for add-on 5c53de3b_esphome
2024-04-04 11:23:52.203 INFO (MainThread) [supervisor.addons.addon] Finish backup for addon 5c53de3b_esphome
2024-04-04 11:23:52.211 INFO (MainThread) [supervisor.addons.addon] Building backup for add-on core_piper
2024-04-04 11:23:52.216 INFO (MainThread) [supervisor.addons.addon] Finish backup for addon core_piper
2024-04-04 11:23:52.223 INFO (MainThread) [supervisor.addons.addon] Building backup for add-on core_whisper
2024-04-04 11:24:06.122 INFO (MainThread) [supervisor.addons.addon] Finish backup for addon core_whisper
2024-04-04 11:24:06.130 INFO (MainThread) [supervisor.addons.addon] Building backup for add-on core_openwakeword
2024-04-04 11:24:06.135 INFO (MainThread) [supervisor.addons.addon] Finish backup for addon core_openwakeword
2024-04-04 11:24:06.135 INFO (MainThread) [supervisor.backups.manager] Backup 8e7bb872 starting stage home_assistant
2024-04-04 11:24:06.143 INFO (MainThread) [supervisor.homeassistant.module] Backing up Home Assistant Core config folder
2024-04-04 11:24:17.050 INFO (MainThread) [supervisor.homeassistant.module] Backup Home Assistant Core config folder done
2024-04-04 11:24:17.057 INFO (MainThread) [supervisor.backups.manager] Backup 8e7bb872 starting stage folders
2024-04-04 11:24:17.058 INFO (SyncWorker_3) [supervisor.backups.backup] Backing up folder share
2024-04-04 11:24:17.060 INFO (SyncWorker_3) [supervisor.backups.backup] Backup folder share done
2024-04-04 11:24:17.062 INFO (SyncWorker_6) [supervisor.backups.backup] Backing up folder addons/local
2024-04-04 11:24:17.065 INFO (SyncWorker_6) [supervisor.backups.backup] Backup folder addons/local done
2024-04-04 11:24:17.066 INFO (SyncWorker_4) [supervisor.backups.backup] Backing up folder ssl
2024-04-04 11:24:17.069 INFO (SyncWorker_4) [supervisor.backups.backup] Backup folder ssl done
2024-04-04 11:24:17.070 INFO (SyncWorker_2) [supervisor.backups.backup] Backing up folder media
2024-04-04 11:24:17.072 INFO (SyncWorker_2) [supervisor.backups.backup] Backup folder media done
2024-04-04 11:24:17.073 INFO (MainThread) [supervisor.backups.manager] Backup 8e7bb872 starting stage finishing_file
2024-04-04 11:24:17.076 INFO (MainThread) [supervisor.backups.manager] Creating full backup with slug 8e7bb872 completed
2024-04-04 11:24:17.082 INFO (MainThread) [supervisor.backups.manager] Found 46 backup files
2024-04-04 11:24:39.304 INFO (MainThread) [supervisor.backups.manager] Found 46 backup files
2024-04-04 11:24:40.099 INFO (MainThread) [supervisor.updater] Fetching update data from https://version.home-assistant.io/stable.json
2024-04-04 11:24:50.068 INFO (MainThread) [supervisor.homeassistant.api] Updated Home Assistant API token
2024-04-04 11:26:17.992 INFO (MainThread) [supervisor.api.middleware.security] /backups access from de91e161_hassio_onedrive_backup
2024-04-04 11:26:18.687 INFO (MainThread) [supervisor.api.middleware.security] /backups access from de91e161_hassio_onedrive_backup
2024-04-04 11:28:18.460 WARNING (MainThread) [supervisor.addons.options] Unknown option 'serial' for Zigbee2MQTT (45df7312_zigbee2mqtt)
2024-04-04 11:28:18.460 WARNING (MainThread) [supervisor.addons.options] Unknown option 'advanced' for Zigbee2MQTT (45df7312_zigbee2mqtt)
2024-04-04 11:31:18.490 INFO (MainThread) [supervisor.api.middleware.security] /backups access from de91e161_hassio_onedrive_backup
2024-04-04 11:31:18.955 INFO (MainThread) [supervisor.api.middleware.security] /backups access from de91e161_hassio_onedrive_backup

Anything in the add-on logs that might be useful for us?

s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service whisper: starting
s6-rc: info: service whisper successfully started
s6-rc: info: service discovery: starting
[11:15:30] WARNING: Your CPU does not support Advanced Vector Extensions (AVX). Whisper will run slower than normal.
INFO:__main__:Ready
[11:15:34] INFO: Successfully send discovery information to Home Assistant.
s6-rc: info: service discovery successfully started
s6-rc: info: service legacy-services: starting
s6-rc: info: service legacy-services successfully started

Additional information

No response

PengBG commented 2 months ago

I can confirm this behavior: the Whisper add-on is fully backed up and the backup file is huge. Every model that I tried is in there.

carldebilly commented 2 months ago

Yep, models should not be saved in the /data folder, because that folder is included in backups, which is not useful for model files.
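For illustration only, here is a sketch of how a backup routine could skip model folders that live under a data directory by matching glob patterns while building the archive. It does not show how the Supervisor or the Whisper add-on actually implement backups; the pattern list, the `make_filter`, `backup` and `demo` names, and the folder layout are all assumptions. It relies on the documented behavior of `tarfile.TarFile.add`, where a filter returning `None` drops that member.

```python
import fnmatch
import io
import tarfile
import tempfile
from pathlib import Path

# Hypothetical pattern; not taken from the add-on or the Supervisor.
EXCLUDE_PATTERNS = ["*/models--*"]


def make_filter(patterns):
    """Return a tarfile filter that drops members matching any glob pattern."""
    def _filter(info):
        if any(fnmatch.fnmatchcase(info.name, p) for p in patterns):
            return None  # None excludes the member; excluded directories are not descended into
        return info
    return _filter


def backup(src, patterns):
    """Archive src in memory as 'data' and return the sorted member names."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        tar.add(str(src), arcname="data", filter=make_filter(patterns))
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r:gz") as tar:
        return sorted(tar.getnames())


def demo():
    """Create a throwaway data folder with one fake model and one options file."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        blobs = root / "models--example" / "blobs"
        blobs.mkdir(parents=True)
        (blobs / "weights.bin").write_bytes(b"\0" * 1024)
        (root / "options.json").write_text("{}")
        return backup(root, EXCLUDE_PATTERNS)


if __name__ == "__main__":
    print(demo())
```

In the demo, only the top-level folder and `options.json` end up in the archive; the fake model folder is left out.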

donburch888 commented 1 month ago

Bump. My full backups also increased by 2.8 GB, which means I now have to delete backups frequently to stay within my 32 GB disk partition. My system is currently:

HA OS in a Proxmox partition on an x86-64 PC
Core 2024.5.3
Supervisor 2024.05.1
Operating System 12.3
Frontend 20240501.1

I created a full backup (System > Backups > Create backup > Full Backup), which took 3840.1 MB. I downloaded it to my PC and opened it.

[Screenshot from 2024-05-13 20-43-03]

Note the last entry … core_whisper.tar.gz accounts for 2.8 GB of the 3.8 GB file. That file contains the "medium.en" Whisper model I am currently using, plus tiny folders for previously used models.

dfries commented 2 weeks ago

I see the original post notes that the documentation says the models should be excluded from backups, but they aren't getting excluded.

https://bford.info/cachedir/ is a standard that some backup programs follow: a CACHEDIR.TAG file with a specific first line causes the directory containing it to be excluded from backups.

That's a thought for a mechanism Whisper could use to exclude the model files, if the Home Assistant backup were configured to look for these files and exclude the tagged directories. It could also let a user specify that they do want the models backed up, say if they were moving to a different system and didn't want to re-download the models. Just a thought; it would all take logic to make happen.
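As a rough sketch of that idea, assuming nothing about how Home Assistant backups work today: the code below writes a CACHEDIR.TAG file with the signature line defined by the cache directory tagging specification, checks for it, and prunes tagged directories from a directory walk unless the caller opts in. The function names and the `include_caches` switch are hypothetical.

```python
import os
import tempfile
from pathlib import Path

TAG_NAME = "CACHEDIR.TAG"
# Fixed signature required at the start of the tag file by the specification.
SIGNATURE = b"Signature: 8a477f597d28d172789f06886806bc55"


def mark_cache_dir(directory):
    """Write a CACHEDIR.TAG file so tag-aware backup tools skip this directory."""
    (Path(directory) / TAG_NAME).write_bytes(
        SIGNATURE + b"\n# This directory holds re-downloadable cache data.\n"
    )


def is_cache_dir(directory):
    """True if the directory holds a regular tag file starting with the signature."""
    tag = Path(directory) / TAG_NAME
    if tag.is_symlink() or not tag.is_file():
        return False
    with tag.open("rb") as fh:
        return fh.read(len(SIGNATURE)) == SIGNATURE


def dirs_to_back_up(root, include_caches=False):
    """Walk root and return relative directories, pruning tagged ones unless asked."""
    kept = []
    for current, subdirs, _files in os.walk(root):
        if not include_caches:
            # Editing subdirs in place stops os.walk from descending into them.
            subdirs[:] = [
                d for d in subdirs if not is_cache_dir(os.path.join(current, d))
            ]
        kept.append(os.path.relpath(current, root))
    return sorted(kept)


def demo():
    """Tag a fake 'models' folder and compare the default and opt-in walks."""
    with tempfile.TemporaryDirectory() as tmp:
        models = Path(tmp) / "models"
        models.mkdir()
        (Path(tmp) / "config").mkdir()
        mark_cache_dir(models)
        return dirs_to_back_up(tmp), dirs_to_back_up(tmp, include_caches=True)


if __name__ == "__main__":
    print(demo())
```

The `include_caches` flag stands in for the user choice mentioned above: leaving it off skips tagged model folders, while turning it on keeps them, for example when migrating to another machine.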