Consider automatic download of voice models

The silero_tts extension shows an example of how to automatically download a requested model:

https://github.com/oobabooga/text-generation-webui/blob/8ea3f316012e6befe6a852501ce158a478c8e680/extensions/silero_tts/script.py#L53C1-L63C17

def load_model():
    torch_cache_path = torch.hub.get_dir() if params['local_cache_path'] == '' else params['local_cache_path']
    model_path = torch_cache_path + "/snakers4_silero-models_master/src/silero/model/" + params['model_id'] + ".pt"
    if Path(model_path).is_file():
        print(f'\nUsing Silero TTS cached checkpoint found at {torch_cache_path}')
        model, example_text = torch.hub.load(repo_or_dir=torch_cache_path + '/snakers4_silero-models_master/', model='silero_tts', language=languages[params['language']]["lang_id"], speaker=params['model_id'], source='local', path=model_path, force_reload=True)
    else:
        print(f'\nSilero TTS cache not found at {torch_cache_path}. Attempting to download...')
        model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_tts', language=languages[params['language']]["lang_id"], speaker=params['model_id'])
    model.to(params['device'])
    return model
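The same check-the-cache-then-download pattern could be written without torch.hub. The following is only a minimal sketch: the function name `ensure_files`, the `file_urls` mapping (local file name to download URL) and the injectable `download` parameter are all hypothetical and not part of any existing extension.

```python
from pathlib import Path
from urllib.request import urlretrieve


def ensure_files(cache_dir, file_urls, download=urlretrieve):
    """Download every file missing from cache_dir and return the local paths.

    file_urls is a hypothetical mapping of local file name -> download URL.
    """
    cache = Path(cache_dir).expanduser()
    cache.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, url in file_urls.items():
        target = cache / name
        if target.is_file():
            print(f"Using cached file {target}")
        else:
            print(f"{target} not found. Attempting to download...")
            # Write to a temporary name first so that an interrupted
            # transfer is never mistaken for a complete cached model.
            partial = target.with_name(target.name + ".part")
            download(url, str(partial))
            partial.replace(target)
        paths.append(target)
    return paths
```

Passing the downloader as a parameter keeps the function testable offline; by default it uses `urllib.request.urlretrieve` from the standard library.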

The extension could also offer the list of available voices. Here is a bash example that retrieves this list for Piper (reference):

BASE_PIPER_VOICES_URL=${BASE_PIPER_VOICES_URL:-https://huggingface.co/rhasspy/piper-voices/resolve/main}
#[...]
VOICE_JSON=$(mktemp)
curl -s -L -o "$VOICE_JSON" "$BASE_PIPER_VOICES_URL/voices.json"
VOICES=$(jq -r 'map(.key) | @sh' < "$VOICE_JSON")
rm "$VOICE_JSON"
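Since the extension itself is written in Python, the same lookup might look roughly like this. It is a sketch that assumes, as the jq expression above implies, that voices.json is a JSON object whose entries each carry a "key" field; the function names are hypothetical.

```python
import json
from urllib.request import urlopen

# Same default base URL as in the bash example above.
BASE_PIPER_VOICES_URL = "https://huggingface.co/rhasspy/piper-voices/resolve/main"


def fetch_voice_index(base_url=BASE_PIPER_VOICES_URL, timeout=30):
    """Download and parse voices.json (requires network access)."""
    with urlopen(f"{base_url}/voices.json", timeout=timeout) as response:
        return json.load(response)


def voice_keys(index):
    """Return the "key" field of every entry, like jq 'map(.key)'."""
    # Assumption: index is a dict of voice entries, each with a "key" field.
    return [entry["key"] for entry in index.values()]
```

A caller would combine the two as `voice_keys(fetch_voice_index())`. Keeping the parsing separate from the download makes it possible to reuse a previously saved index when offline.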

The list of available voices should combine those already in the cache directory (which could include personalized models) and those available for download. Ideally, each voice would be marked to show whether it is already downloaded. The location of the cache directory should be configurable.
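One possible way to build such a combined list is sketched below. It assumes that a cached voice is stored as a file named `<voice key>.onnx` directly inside the cache directory; that layout, the function names and the `[x]`/`[ ]` markers are illustrative choices, not existing behavior.

```python
from pathlib import Path


def list_voices(remote_keys, cache_dir):
    """Merge cached voices with downloadable ones.

    Returns sorted (voice_key, is_downloaded) tuples.
    """
    cache = Path(cache_dir).expanduser()
    # Assumption: each cached voice is a file named <voice_key>.onnx.
    local = {p.stem for p in cache.glob("*.onnx")} if cache.is_dir() else set()
    # Voices found only in the cache (e.g. personalized models) are kept too.
    return [(key, key in local) for key in sorted(local | set(remote_keys))]


def format_voice(key, downloaded):
    """Prefix a voice name with a marker showing whether it is cached."""
    return f"{'[x]' if downloaded else '[ ]'} {key}"
```

The `cache_dir` argument is where a configurable setting would be passed in, and the formatted strings could be used directly as labels in a dropdown.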

I will not be working on this issue soon, so contributions are welcome.

tijo95 / piper_tts

Consider automatic download of voice models #8