mkiol / dsnote

Speech Note Linux app. Note taking, reading and translating with offline Speech to Text, Text to Speech and Machine translation.
Mozilla Public License 2.0

Unable to add Custom TTS model (i.e Coqui TTS) #123

ghost closed this issue 2 weeks ago

ghost commented 3 months ago

I was unable to add a custom TTS model (i.e. Coqui TTS). I tried adding the model information to model.json, but it doesn't seem to work; maybe I am doing it wrong. What is the procedure for adding a custom TTS model to the Speech Note application? Thanks for making this great app for Linux :)

mkiol commented 3 months ago

Hi. Thanks for the report.

As you probably know, you need to edit the ~/.var/app/net.mkiol.SpeechNote/data/net.mkiol/dsnote/models.json file and add a new entry with the model configuration.

This entry should be similar to the one below.

        {
            "name": "New cool voice",
            "model_id": "en_coqui_new_cool_model",
            "engine": "tts_coqui",
            "lang_id": "en",
            "checksum": "8bc7e85b",
            "checksum_quick": "50984d2b",
            "comp": "dir",
            "urls": [
                "file:///path/to/model/config.json",
                "file:///path/to/model/model.pth"
            ],
            "size": "100827994"
        },
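If you add several custom voices, editing models.json by hand gets error-prone. Below is a minimal Python sketch that builds an entry shaped like the one above from local files and appends it to the file. It assumes (this is not confirmed in this thread) that models.json is either a plain JSON array of entries or an object whose entries live under a "models" key; make_local_entry and append_entry are hypothetical helpers, not part of Speech Note. Fill the checksum fields with the values printed by --gen-checksums, and back up models.json before running anything like this.

```python
import json
from pathlib import Path


def make_local_entry(name, model_id, lang_id, files, checksum, checksum_quick, size):
    """Build a tts_coqui entry for model files on local disk (hypothetical helper)."""
    return {
        "name": name,
        "model_id": model_id,
        "engine": "tts_coqui",
        "lang_id": lang_id,
        "checksum": checksum,
        "checksum_quick": checksum_quick,
        "comp": "dir",  # copied from the example entry above
        # resolve() makes each path absolute; as_uri() turns it into a file:/// URL
        "urls": [Path(f).resolve().as_uri() for f in files],
        "size": str(size),  # the example entry stores the size as a string
    }


def append_entry(models_json, entry):
    """Append entry to models.json, refusing a duplicate model_id (hypothetical helper)."""
    path = Path(models_json)
    data = json.loads(path.read_text(encoding="utf-8"))
    # Assumption: entries are either the top-level array or listed under a "models" key.
    models = data if isinstance(data, list) else data["models"]
    if any(m.get("model_id") == entry["model_id"] for m in models):
        raise ValueError(f"model_id {entry['model_id']!r} already exists")
    models.append(entry)  # mutates data in place, since models is the same list object
    # Note: this rewrites the whole file with 4-space indentation.
    path.write_text(json.dumps(data, indent=4, ensure_ascii=False) + "\n", encoding="utf-8")
```

For example, make_local_entry("New cool voice", "en_coqui_new_cool_model", "en", ["/path/to/model/config.json", "/path/to/model/model.pth"], "8bc7e85b", "50984d2b", "100827994") reproduces the entry above, and passing it to append_entry together with the models.json path adds it to the file.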

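A few important remarks:

To get the checksum, checksum_quick and size values, run Speech Note with the --gen-checksums option:

        flatpak run net.mkiol.SpeechNote --verbose --gen-checksums

The model will be downloaded automatically and the checksums should appear in the terminal: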

        [D] 18:15:52.802230735.802 0x7709dea87d00 () - all checksums were generated
        models checksums:

        "model_id": "fr_coqui_css100_vits",
        "checksum": "a7671b81",
        "checksum_quick": "7d7531cf",
        "size": "100821187",

Let me know if any of this was helpful.
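The values printed after "models checksums:" are already in JSON key/value form. When generating checksums for several models at once, a small parser like the sketch below can group them by model_id so they can be merged into the matching entries. It assumes the output looks exactly like the sample above (one "key": "value", pair per line, each group starting with "model_id"); parse_checksums is a hypothetical helper, not part of Speech Note.

```python
import re

# Assumed format: one '"key": "value",' pair per line, as in the sample output.
PAIR_RE = re.compile(r'^\s*"(\w+)":\s*"([^"]*)",?\s*$')


def parse_checksums(output):
    """Group key/value lines from --gen-checksums output by model_id (hypothetical helper)."""
    results = {}
    current = None
    for line in output.splitlines():
        match = PAIR_RE.match(line)
        if not match:
            continue  # skip log lines such as "[D] ... all checksums were generated"
        key, value = match.groups()
        if key == "model_id":
            current = results.setdefault(value, {})  # start a new group
        elif current is not None:
            current[key] = value  # checksum, checksum_quick, size, ...
    return results
```

Each inner dict can then be merged into the matching models.json entry with entry.update(results[entry["model_id"]]).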

ghost commented 3 months ago

Thanks, this worked. But what about adding a custom multilingual model, e.g. a fine-tuned XTTS model? Do I have to add a separate model ID for each language the XTTS model supports?

mkiol commented 3 months ago

XTTS? Nice :)

> custom multi-language model

For multilingual models you can use "model aliases". An alias is a copy of a model entry with some properties changed (the language, for instance). To create an alias, define a new model entry with the model_alias_of parameter. Look at the example below.

The model multilang_coqui_xtts203 is the base model. It is hidden from the user thanks to "hidden": true. This base model is used by the en_coqui_xtts203 and pt_coqui_br_xtts203 aliases.

        {
            "name": "Multilingual (Coqui XTTS-v2.0.3)",
            "model_id": "multilang_coqui_xtts203",
            "engine": "tts_coqui",
            "lang_id": "multilang",
            "checksum": "ae3c9981",
            "checksum_quick": "ce376c5d",
            "options": "xs",
            "features": [
                "tts_voice_cloning"
            ],
            "license": {
                "id": "CPML",
                "name": "Coqui Public Model License 1.0.0",
                "url": "https://coqui.ai/cpml.txt",
                "accept_required": true
            },
            "comp": "dir",
            "urls": [
                "https://huggingface.co/coqui/XTTS-v2/resolve/69d4f754575c4b72d991f105b4775d270438ef33/model.pth",
                "https://huggingface.co/coqui/XTTS-v2/resolve/69d4f754575c4b72d991f105b4775d270438ef33/config.json",
                "https://huggingface.co/coqui/XTTS-v2/resolve/69d4f754575c4b72d991f105b4775d270438ef33/vocab.json"
            ],
            "size": "1868302897",
            "hidden": true
        },
        {
            "name": "English (Coqui XTTS-v2.0.3)",
            "model_id": "en_coqui_xtts203",
            "model_alias_of": "multilang_coqui_xtts203",
            "lang_id": "en"
        },
        {
            "name": "Português brasileiro (Coqui XTTS-v2.0.3)",
            "model_id": "pt_coqui_br_xtts203",
            "model_alias_of": "multilang_coqui_xtts203",
            "lang_id": "pt"
        },
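So for a fine-tuned XTTS model you add one base entry plus one small alias entry per language. The sketch below generates such alias entries for a list of languages and illustrates one plausible reading of the alias semantics described above: the alias starts as a copy of the base entry and its own fields override it. make_aliases and resolve_alias are hypothetical helpers; the merge rule is an assumption based on the description in this thread, not Speech Note's actual code. Because the aliases are meant to be visible while the base is hidden, the sketch also drops the base's "hidden" flag when resolving, which is likewise an assumption.

```python
def make_aliases(base_id, id_suffix, label, languages):
    """Create one alias entry per language for a multilingual base model (hypothetical).

    languages maps lang_id -> display name, e.g. {"en": "English"}.
    The "<lang>_<suffix>" model_id pattern mirrors the example above but is only a convention.
    """
    return [
        {
            "name": f"{lang_name} ({label})",
            "model_id": f"{lang_id}_{id_suffix}",
            "model_alias_of": base_id,
            "lang_id": lang_id,
        }
        for lang_id, lang_name in languages.items()
    ]


def resolve_alias(alias, models_by_id):
    """Illustrate alias semantics (assumed): copy the base entry, then apply the alias's fields."""
    base = models_by_id[alias["model_alias_of"]]
    merged = {**base, **alias}  # alias values win over base values
    merged.pop("hidden", None)  # assumption: the alias is shown even though the base is hidden
    return merged
```

Printing json.dumps(make_aliases(...), indent=4, ensure_ascii=False) gives entries that can be pasted into models.json after the base entry; for a fine-tuned model the base entry would point its urls at local files, as in the first example.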