k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, Rust
https://k2-fsa.github.io/sherpa/onnx/index.html
Apache License 2.0
3.7k stars, 430 forks

Questions related to MeloTTS #1193

Open eehoeskrap opened 4 months ago

eehoeskrap commented 4 months ago

Thank you for creating a great repository. I wonder why BERT is not included when converting the PyTorch MeloTTS model to an ONNX model. https://github.com/k2-fsa/sherpa-onnx/blob/963aaba82b01a425ae8dcf0fdcff6b073a45686f/scripts/melo-tts/export-onnx.py#L206C1-L235C6

    torch.onnx.export(
        torch_model,
        (
            x,
            x_lengths,
            tones,
            sid,
            noise_scale,
            length_scale,
            noise_scale_w,
        ),
        filename,
        opset_version=opset_version,
        input_names=[
            "x",
            "x_lengths",
            "tones",
            "sid",
            "noise_scale",
            "length_scale",
            "noise_scale_w",
        ],
        output_names=["y"],
        dynamic_axes={
            "x": {0: "N", 1: "L"},
            "x_lengths": {0: "N"},
            "tones": {0: "N", 1: "L"},
            "y": {0: "N", 1: "S", 2: "T"},
        },
    )
csukuangfj commented 3 weeks ago

Please adapt our current script. If you have any trouble, please post the error logs.

I already tried that above and it didn't work.

Please see https://github.com/k2-fsa/sherpa-onnx/pull/1509

and please find out why it didn't work for you. @nanaghartey

nanaghartey commented 3 weeks ago

Thanks for this. I tried it out with the model you shared - https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-melo-tts-en.tar.bz2

I have this:

modelName = "model.onnx";
dictDir = "model/dict";
lexicon = "lexicon.txt";
dataDir = null;

I use the same dict as for the Chinese + English model since I don't have any other. When I run the app, I get this:

Current model is not using jieba but you provided --vits-dict-dir

The app hangs during startup with the logs below:


2024-11-05 18:48:37.139 13801-13801 sherpa-onnx             com.k2fsa.sherpa.onnx                W  ---vits model---
                                                                                                    description=MeloTTS is a high-quality multi-lingual text-to-speech library by MyShell.ai
                                                                                                    license=MIT license
                                                                                                    url=https://github.com/myshell-ai/MeloTTS
                                                                                                    tone_start=7
                                                                                                    speaker_id=0
                                                                                                    ja_bert_dim=768
                                                                                                    version=2
                                                                                                    bert_dim=1024
                                                                                                    add_blank=1
                                                                                                    sample_rate=44100
                                                                                                    n_speakers=4
                                                                                                    comment=melo
                                                                                                    lang_id=2
                                                                                                    language=English
                                                                                                    jieba=0
                                                                                                    model_type=melo-vits
                                                                                                    ----------input names----------
                                                                                                    0 x
                                                                                                    1 x_lengths
                                                                                                    2 tones
                                                                                                    3 sid
                                                                                                    4 noise_scale
                                                                                                    5 length_scale
                                                                                                    6 noise_scale_w
                                                                                                    ----------output names----------
                                                                                                    0 y
2024-11-05 18:48:37.194 13801-13801 sherpa-onnx             com.k2fsa.sherpa.onnx                W  Current model is not using jieba but you provided --vits-dict-dir
---------------------------- PROCESS ENDED (13801) for package com.k2fsa.sherpa.onnx ----------------------------  
The Chinese + English model runs fine with these logs:
---vits model---
                                                                                                    description=MeloTTS is a high-quality multi-lingual text-to-speech library by MyShell.ai
                                                                                                    license=MIT license
                                                                                                    url=https://github.com/myshell-ai/MeloTTS
                                                                                                    tone_start=0
                                                                                                    speaker_id=1
                                                                                                    ja_bert_dim=768
                                                                                                    version=2
                                                                                                    bert_dim=1024
                                                                                                    add_blank=1
                                                                                                    sample_rate=44100
                                                                                                    n_speakers=1
                                                                                                    comment=melo
                                                                                                    lang_id=3
                                                                                                    language=Chinese + English
                                                                                                    jieba=1
                                                                                                    model_type=melo-vits
                                                                                                    ----------input names----------
                                                                                                    0 x
                                                                                                    1 x_lengths
                                                                                                    2 tones
                                                                                                    3 sid
                                                                                                    4 noise_scale
                                                                                                    5 length_scale
                                                                                                    6 noise_scale_w
                                                                                                    ----------output names----------
                                                                                                    0 y     
csukuangfj commented 3 weeks ago

Please don't use files that are not included in the model directory you downloaded.

That is, do not use the dict dir.
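The metadata dumps in the logs above already indicate whether a dict dir is expected: the English model reports `jieba=0`, while the Chinese + English model reports `jieba=1`. Below is a minimal Python sketch of that check, assuming the `key=value` layout shown in those logs. The helper names `parse_metadata` and `needs_dict_dir` are made up for illustration and are not part of sherpa-onnx.

```python
def parse_metadata(dump: str) -> dict:
    """Collect key=value pairs from a '---vits model---' style log dump."""
    meta = {}
    for line in dump.splitlines():
        line = line.strip()
        # The input/output name listing that follows has no key=value pairs.
        if line.startswith("----------input names"):
            break
        if "=" in line:
            # Split on the first '=' only, so URLs containing '=' stay intact.
            key, value = line.split("=", 1)
            meta[key.strip()] = value.strip()
    return meta


def needs_dict_dir(meta: dict) -> bool:
    """Hypothetical rule: pass a dict dir only when the model reports jieba=1."""
    return meta.get("jieba") == "1"


english = parse_metadata("""
language=English
jieba=0
model_type=melo-vits
""")
zh_en = parse_metadata("""
language=Chinese + English
jieba=1
model_type=melo-vits
""")

print(needs_dict_dir(english))  # False
print(needs_dict_dir(zh_en))    # True
```

With a rule like this, the app could set `dictDir` to `null` whenever the model metadata says jieba is not used, instead of hard-coding it per model.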

csukuangfj commented 3 weeks ago

Everything you need is included in the model tar.bz2 file.

Please see the comment in #1509 for usage

nanaghartey commented 3 weeks ago

@csukuangfj I forgot to mention that I already tried that.

modelName = "model.onnx";
dictDir = null;
lexicon = "lexicon.txt";
dataDir = null;
String meloDir = copyDataDir(modelDir);
modelDir = meloDir + "/" + modelDir;
assets = null;

The app loads all right, but when I enter text and tap generate, I get this:

2024-11-05 21:57:56.546 12497-12615 sherpa-onnx             com.k2fsa.sherpa.onnx                W  string is: hello
2024-11-05 21:57:56.546 12497-12615 sherpa-onnx             com.k2fsa.sherpa.onnx                W  Raw text: hello
2024-11-05 21:57:56.547 12497-12615 libc++abi               com.k2fsa.sherpa.onnx                E  terminating due to uncaught exception of type std::out_of_range: unordered_map::at: key not found
2024-11-05 21:57:56.547 12497-12615 libc                    com.k2fsa.sherpa.onnx                A  Fatal signal 6 (SIGABRT), code -1 (SI_QUEUE) in tid 12615 (Thread-2), pid 12497 (fsa.sherpa.onnx)
2024-11-05 21:57:56.830 12619-12619 DEBUG                   pid-12619                            A  Cmdline: com.k2fsa.sherpa.onnx
2024-11-05 21:57:56.830 12619-12619 DEBUG                   pid-12619                            A  pid: 12497, tid: 12615, name: Thread-2  >>> com.k2fsa.sherpa.onnx <<<
2024-11-05 21:57:56.830 12619-12619 DEBUG                   pid-12619                            A        #01 pc 00000000001a8c90  /data/app/~~0iYprevsJ7urxYbxDG34mA==/com.k2fsa.sherpa.onnx-qDiQQovqO4De8UthMivj6w==/lib/arm64/libsherpa-onnx-jni.so (BuildId: 73eb9682daf1bd7954ab5281d845c6771d228f77)
2024-11-05 21:57:56.830 12619-12619 DEBUG                   pid-12619                            A        #02 pc 00000000001a8588  /data/app/~~0iYprevsJ7urxYbxDG34mA==/com.k2fsa.sherpa.onnx-qDiQQovqO4De8UthMivj6w==/lib/arm64/libsherpa-onnx-jni.so (BuildId: 73eb9682daf1bd7954ab5281d845c6771d228f77)
2024-11-05 21:57:56.830 12619-12619 DEBUG                   pid-12619                            A        #03 pc 00000000001a8448  /data/app/~~0iYprevsJ7urxYbxDG34mA==/com.k2fsa.sherpa.onnx-qDiQQovqO4De8UthMivj6w==/lib/arm64/libsherpa-onnx-jni.so (BuildId: 73eb9682daf1bd7954ab5281d845c6771d228f77)
2024-11-05 21:57:56.830 12619-12619 DEBUG                   pid-12619                            A        #04 pc 00000000001c3328  /data/app/~~0iYprevsJ7urxYbxDG34mA==/com.k2fsa.sherpa.onnx-qDiQQovqO4De8UthMivj6w==/lib/arm64/libsherpa-onnx-jni.so (BuildId: 73eb9682daf1bd7954ab5281d845c6771d228f77)
2024-11-05 21:57:56.830 12619-12619 DEBUG                   pid-12619                            A        #05 pc 00000000001c329c  /data/app/~~0iYprevsJ7urxYbxDG34mA==/com.k2fsa.sherpa.onnx-qDiQQovqO4De8UthMivj6w==/lib/arm64/libsherpa-onnx-jni.so (__cxa_throw+128) (BuildId: 73eb9682daf1bd7954ab5281d845c6771d228f77)
2024-11-05 21:57:56.830 12619-12619 DEBUG                   pid-12619                            A        #06 pc 00000000001ef8d4  /data/app/~~0iYprevsJ7urxYbxDG34mA==/com.k2fsa.sherpa.onnx-qDiQQovqO4De8UthMivj6w==/lib/arm64/libsherpa-onnx-jni.so (BuildId: 73eb9682daf1bd7954ab5281d845c6771d228f77)
2024-11-05 21:57:56.830 12619-12619 DEBUG                   pid-12619                            A        #07 pc 00000000002e2dcc  /data/app/~~0iYprevsJ7urxYbxDG34mA==/com.k2fsa.sherpa.onnx-qDiQQovqO4De8UthMivj6w==/lib/arm64/libsherpa-onnx-jni.so (BuildId: 73eb9682daf1bd7954ab5281d845c6771d228f77)
2024-11-05 21:57:56.830 12619-12619 DEBUG                   pid-12619                            A        #08 pc 00000000002c16a0  /data/app/~~0iYprevsJ7urxYbxDG34mA==/com.k2fsa.sherpa.onnx-qDiQQovqO4De8UthMivj6w==/lib/arm64/libsherpa-onnx-jni.so (BuildId: 73eb9682daf1bd7954ab5281d845c6771d228f77)
2024-11-05 21:57:56.830 12619-12619 DEBUG                   pid-12619                            A        #09 pc 00000000001d64f8  /data/app/~~0iYprevsJ7urxYbxDG34mA==/com.k2fsa.sherpa.onnx-qDiQQovqO4De8UthMivj6w==/lib/arm64/libsherpa-onnx-jni.so (Java_com_k2fsa_sherpa_onnx_OfflineTts_generateWithCallbackImpl+552) (BuildId: 73eb9682daf1bd7954ab5281d845c6771d228f77)
2024-11-05 21:57:56.831 12619-12619 DEBUG                   pid-12619                            A        #16 pc 000000000000390c  [anon:dalvik-classes3.dex extracted in memory from /data/app/~~0iYprevsJ7urxYbxDG34mA==/com.k2fsa.sherpa.onnx-qDiQQovqO4De8UthMivj6w==/base.apk] (com.k2fsa.sherpa.onnx.OfflineTts.generateWithCallback+0)
2024-11-05 21:57:56.831 12619-12619 DEBUG                   pid-12619                            A        #21 pc 0000000000003b64  [anon:dalvik-classes3.dex extracted in memory from /data/app/~~0iYprevsJ7urxYbxDG34mA==/com.k2fsa.sherpa.onnx-qDiQQovqO4De8UthMivj6w==/base.apk] (com.k2fsa.sherpa.onnx.Tts.generateWithCallback+0)
2024-11-05 21:57:56.831 12619-12619 DEBUG                   pid-12619                            A        #26 pc 00000000000025dc  [anon:dalvik-classes3.dex extracted in memory from /data/app/~~0iYprevsJ7urxYbxDG34mA==/com.k2fsa.sherpa.onnx-qDiQQovqO4De8UthMivj6w==/base.apk] (com.k2fsa.sherpa.onnx.MainActivityBatches.lambda$onClickGenerate$6$com-k2fsa-sherpa-onnx-MainActivityBatches+0)
2024-11-05 21:57:56.831 12619-12619 DEBUG                   pid-12619                            A        #31 pc 0000000000001ce4  [anon:dalvik-classes3.dex extracted in memory from /data/app/~~0iYprevsJ7urxYbxDG34mA==/com.k2fsa.sherpa.onnx-qDiQQovqO4De8UthMivj6w==/base.apk] (com.k2fsa.sherpa.onnx.MainActivityBatches$$ExternalSyntheticLambda3.run+0)
---------------------------- PROCESS ENDED (12497) for package com.k2fsa.sherpa.onnx ----------------------------
csukuangfj commented 3 weeks ago

Are you using the latest master to build the libraries?

How did you get the '.so' files?

nanaghartey commented 3 weeks ago

For quick testing, I used the .so files in sherpa-onnx-1.10.30-arm64-v8a-zh_en-tts-vits-melo-tts-zh_en.apk from https://k2-fsa.github.io/sherpa/onnx/tts/apk.html

I just built the .so files and tested. It works now! Thanks a lot. By the way, in the export-onnx-en script I only changed:

def main():
    generate_lexicon()

    language = "EN"
    model = TTS(language=language, device="cpu")

To

def main():
    generate_lexicon()

    model_path = "model.pth"  # Path to your custom model
    config_path = "config.json"  # Path to your config.json file
    with open(config_path, 'r') as f:
        config = json.load(f)

    model = TTS(language="EN", device="cpu", config_path=config_path, ckpt_path=model_path)
    model.load_state_dict(torch.load(model_path, map_location="cpu"), strict=False)

That should be enough, right? It works, but I am wondering whether I need to change anything else to improve pronunciation.

csukuangfj commented 3 weeks ago

if i need to change something else to improve pronunciation

You can try to enable bert support.

csukuangfj commented 3 weeks ago

for quick testing, I used the .so files in sherpa-onnx-1.10.30-arm64-v8a-zh_en-tts-vits-melo-tts-zh_en.apk from https://k2-fsa.github.io/sherpa/onnx/tts/apk.html

I hope you understand that support for the melo-tts English model was added after 1.10.30, so you need to use the latest master to test it, not the code or libraries from 1.10.30.
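A quick way to catch this mismatch is to read the release number out of the APK name before borrowing its `.so` files. The sketch below assumes the `sherpa-onnx-<version>-...` naming seen in this thread and treats anything at or below 1.10.30 as lacking English MeloTTS support. The function names are illustrative only. Since the thread does not say which release first ships the support, a newer version number is a necessary condition here, not a guarantee.

```python
import re

# Per the comment above, English MeloTTS support was added after this release.
LAST_UNSUPPORTED = (1, 10, 30)


def apk_version(filename: str) -> tuple:
    """Extract (major, minor, patch) from a name like sherpa-onnx-1.10.30-...apk."""
    match = re.match(r"sherpa-onnx-(\d+)\.(\d+)\.(\d+)-", filename)
    if match is None:
        raise ValueError(f"no version found in {filename!r}")
    return tuple(int(part) for part in match.groups())


def too_old_for_english_melo(filename: str) -> bool:
    """True when the APK's libraries predate English MeloTTS support."""
    # Tuples compare element by element, so (1, 9, 99) < (1, 10, 30).
    return apk_version(filename) <= LAST_UNSUPPORTED


name = "sherpa-onnx-1.10.30-arm64-v8a-zh_en-tts-vits-melo-tts-zh_en.apk"
print(apk_version(name))               # (1, 10, 30)
print(too_old_for_english_melo(name))  # True
```

Comparing integer tuples rather than raw strings avoids the usual pitfall where "1.9.99" sorts after "1.10.30" lexicographically.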

nanaghartey commented 3 weeks ago

if i need to change something else to improve pronunciation

You can try to enable bert support.

Sure, I'll try that. Thanks.