k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, speaker recognition, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, Rust
https://k2-fsa.github.io/sherpa/onnx/index.html
Apache License 2.0
3.11k stars 359 forks source link

FYI Sherpa-onnx in TTS-Wrapper #1169

Open willwade opened 1 month ago

willwade commented 1 month ago

Just a quick FYI..

I've VERY QUICKLY (so bugs beware) added sherpa-onnx to this python tts-wrapper..

https://github.com/willwade/tts-wrapper?tab=readme-ov-file#sherpa-onnx

We do fun things like listing available voices, auto downloading models and running them.. then our abstract class deals with playing, pausing, streaming audio etc. Many aspects like pitch, volume control and SSML we obviously cant deal with sherpa-onnx. Or word events.. But hey..

I'm really just focusing on MMS models right now.. I'd love a lovely JSON file of all TTS models we could use if anyone has it and I can add them.

Needs a heap loads of testing and very welcoming to PR's. Particularly rewriting automated tests and checking/improving audio playback.. I feel there is a lag somewhere and I cant figure out where..

Use like

pip install "tts-wrapper[sherpaonnx] @ git+https://github.com/willwade/tts-wrapper"

then


from tts_wrapper import SherpaOnnxClient, SherpaOnnxTTS
try:
    client = SherpaOnnxClient(model_path=None, tokens_path=None)
    # or 
    # client = SherpaOnnxClient(model_path=None, tokens_path=None, voice_id="eng")
    # where voice_id is a id from the get_voices - typically an iso code from mms
    # Initialize the TTS engine
    tts = SherpaOnnxTTS(client)

    # Get available voices
    voices = tts.get_voices()
    print("Available voices:", voices)

    # Set the voice using ISO code
    iso_code = "eng"  # Example ISO code for the voice
    tts.set_voice(iso_code)

    # Define the text to be synthesized
    text = "Hello, This is a word timing test"
    start_time = time.time()
    tts.speak(text)

except Exception as e:
    print(f"Error: {e}")

Take a look at play/pause/resume etc..

https://github.com/willwade/tts-wrapper?tab=readme-ov-file#streaming-and-playback-control

willwade commented 1 month ago

I'm trying to support all other models.. But to do that I really could do with a nice formatted list of all models available.. So I'm trying this..


import requests
import json
import re

def get_github_release_assets(repo, tag):
    headers = {'Accept': 'application/vnd.github.v3+json'}

    # Get the release ID for the specified tag
    releases_url = f"https://api.github.com/repos/{repo}/releases/tags/{tag}"
    response = requests.get(releases_url, headers=headers)

    if response.status_code != 200:
        raise Exception(f"Failed to fetch release info for tag: {tag}")

    release_info = response.json()

    # Get the assets
    assets = []
    for asset in release_info.get('assets', []):
        filename = asset['name']
        asset_url = asset['browser_download_url']

        # Remove the file extension for further processing
        filename_no_ext = re.sub(r'\.tar\.bz2|\.tar\.gz|\.zip', '', filename)
        parts = filename_no_ext.split('-')

        model_type = 'vits' if parts[0] == 'vits' else 'unknown'
        developer = parts[1] if len(parts) > 1 else 'unknown'

        if developer == 'zh':
            lang_code = 'zh'
            developer = parts[2] if len(parts) > 2 else 'unknown'
            name = parts[3] if len(parts) > 3 else 'unknown'
            quality = parts[4] if len(parts) > 4 else 'unknown'
        else:
            lang_code = parts[2].replace('_', '-') if len(parts) > 2 else 'unknown'
            name = parts[3] if len(parts) > 3 else 'unknown'
            quality = parts[4] if len(parts) > 4 else 'unknown'

        if developer == 'zh':
            name = parts[3] if len(parts) > 3 else 'unknown'
            quality = 'unknown'
        else:
            lang_code = parts[2].replace('_', '-') if len(parts) > 2 else 'unknown'
            name = parts[3] if len(parts) > 3 else 'unknown'
            quality = parts[4] if len(parts) > 4 else 'unknown'

        if len(parts) == 5:
            quality = parts[-1]

        if developer == 'zh' and len(parts) > 3:
            lang_code = 'zh'
            developer = parts[2]
            name = parts[3]
            quality = 'unknown'

        if len(parts) == 4 and developer == 'vctk':
            lang_code = 'unknown'
            name = 'unknown'
            quality = 'unknown'

        # Determine if the asset is compressed
        compression = filename.endswith(('.tar.bz2', '.tar.gz', '.zip'))

        # Add asset info to the list
        assets.append({
            'model_type': model_type,
            'developer': developer,
            'language_code': lang_code,
            'name': name,
            'quality': quality,
            'url': asset_url,
            'compression': compression
        })

    # Convert the list of assets to JSON
    assets_json = json.dumps(assets, indent=4)

    return assets_json

# Example usage
repo = "k2-fsa/sherpa-onnx"
tag = "tts-models"
assets_json = get_github_release_assets(repo, tag)
print(assets_json)

But its not great.. eg


{
        "model_type": "vits",
        "developer": "mimic3",
        "language_code": "gu-IN",
        "name": "cmu",
        "quality": "indic_low",
        "url": "https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-mimic3-gu_IN-cmu-indic_low.tar.bz2",
        "compression": true
    },
{
        "model_type": "vits",
        "developer": "piper",
        "language_code": "de-DE",
        "name": "ramona",
        "quality": "low",
        "url": "https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-de_DE-ramona-low.tar.bz2",
        "compression": true
    },
    {
        "model_type": "vits",
        "developer": "piper",
        "language_code": "de-DE",
        "name": "thorsten",
        "quality": "high",
        "url": "https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-de_DE-thorsten-high.tar.bz2",
        "compression": true
    },
    {
        "model_type": "vits",
        "developer": "piper",
        "language_code": "de-DE",
        "name": "thorsten",
        "quality": "low",
        "url": "https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-de_DE-thorsten-low.tar.bz2",
        "compression": true
    },
    {
        "model_type": "vits",
        "developer": "piper",
        "language_code": "de-DE",
        "name": "thorsten",
        "quality": "medium",
        "url": "https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-de_DE-thorsten-medium.tar.bz2",
        "compression": true
    },
    {
        "model_type": "vits",
        "developer": "piper",
        "language_code": "de-DE",
        "name": "thorsten_emotional",
        "quality": "medium",
        "url": "https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-de_DE-thorsten_emotional-medium.tar.bz2",
        "compression": true
    },
    {
        "model_type": "vits",
        "developer": "piper",
        "language_code": "el-GR",
        "name": "rapunzelina",
        "quality": "low",
        "url": "https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-el_GR-rapunzelina-low.tar.bz2",
        "compression": true
    },
    {
        "model_type": "vits",
        "developer": "piper",
        "language_code": "en-GB",
        "name": "alan",
        "quality": "low",
        "url": "https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-en_GB-alan-low.tar.bz2",
        "compression": true
    },
    {
        "model_type": "vits",
        "developer": "piper",
        "language_code": "en-GB",
        "name": "alan",
        "quality": "medium",
        "url": "https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-en_GB-alan-medium.tar.bz2",
        "compression": true
    },
    {
        "model_type": "vits",
        "developer": "hf",
        "language_code": "hf",
        "name": "keqing",
        "quality": "unknown",
        "url": "https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-zh-hf-keqing.tar.bz2",
        "compression": true
    },
    {
        "model_type": "vits",
        "developer": "hf",
        "language_code": "hf",
        "name": "theresa",
        "quality": "unknown",
        "url": "https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-zh-hf-theresa.tar.bz2",
        "compression": true
    },
    {
        "model_type": "vits",
        "developer": "hf",
        "language_code": "hf",
        "name": "zenyatta",
        "quality": "unknown",
        "url": "https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-zh-hf-zenyatta.tar.bz2",
        "compression": true
    }

any advice welcome on how those tts releases are formatted :)

csukuangfj commented 1 month ago

any advice welcome on how those tts releases are formatted :)

I suggest that you handle models like