rhasspy / piper

A fast, local neural text to speech system
https://rhasspy.github.io/piper-samples/
MIT License

FYI: Download links for Android APKs with piper models #257

Open · csukuangfj opened 1 year ago

csukuangfj commented 1 year ago

Now you can try piper models on your Android phones.

A number of languages are supported; see the APK page below for the full list.

Community help with converting more models from piper to sherpa-onnx is appreciated.

Please see https://k2-fsa.github.io/sherpa/onnx/tts/apk.html


Note:

You can try the models in your browser by visiting the following Hugging Face space: https://huggingface.co/spaces/k2-fsa/text-to-speech

beqabeqa473 commented 1 year ago

@csukuangfj thanks for your work.

To evaluate the models, it would be great to implement the Android TTS engine API.

I would do it myself after I finish my work-related tasks, but I am mentioning it in case someone can get to it first.

anita-smith1 commented 1 year ago

@csukuangfj Sorry for my noob question. When I convert a fine-tuned Coqui TTS vits-ljs model to ONNX using the code here, I only get the ONNX export. The Android code expects tokens.txt and a lexicon. How can I get those too? And will this work with Coqui TTS models?

csukuangfj commented 1 year ago

@anita-smith1

Can you point me to the code that converts a word to phonemes for VITS models from Coqui?

If you can provide that, I can provide scripts to generate lexicon.txt, tokens.txt, and model.onnx from Coqui.
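
As a concrete illustration of what those two files contain, here is a minimal sketch. The file layouts ("symbol id" per line for tokens.txt, "word ph1 ph2 ..." per line for lexicon.txt) follow the common sherpa-onnx convention, and `phonemize` is a hypothetical placeholder for a real grapheme-to-phoneme function such as the one Coqui uses, not actual Coqui code:

```python
# Minimal sketch of generating tokens.txt and lexicon.txt.
# `phonemize` is a hypothetical placeholder, not real Coqui code.

def phonemize(word: str) -> list[str]:
    # Placeholder G2P: pretend each character is a phoneme.
    return list(word)

def write_tokens(symbols: list[str], path: str = "tokens.txt") -> None:
    # One "symbol id" pair per line.
    with open(path, "w", encoding="utf-8") as f:
        for i, sym in enumerate(symbols):
            f.write(f"{sym} {i}\n")

def write_lexicon(words: list[str], path: str = "lexicon.txt") -> None:
    # One "word ph1 ph2 ..." entry per line.
    with open(path, "w", encoding="utf-8") as f:
        for w in words:
            f.write(f"{w} {' '.join(phonemize(w))}\n")

write_tokens(["_", "a", "b", "c"])
write_lexicon(["cab", "abc"])
```

A real script would take the symbol table and G2P step from the exported Coqui tokenizer instead of the placeholders above.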

csukuangfj commented 1 year ago

@anita-smith1

I just managed to convert VITS models from Coqui to sherpa-onnx.

I will post a colab notebook showing how to do that.

You can use the converted model in sherpa-onnx, e.g., to build an Android app.

csukuangfj commented 1 year ago

@anita-smith1

@nanaghartey

I just created a colab notebook showing how to export the VITS models from https://github.com/coqui-ai/TTS to ONNX and how to generate tokens.txt and lexicon.txt so that you can use the exported model with sherpa-onnx.

Please see https://colab.research.google.com/drive/1cI9VzlimS51uAw4uCR-OBeSXRPBc4KoK?usp=sharing

csukuangfj commented 1 year ago

For those of you who are interested in converting piper models to sherpa-onnx, please have a look at the following colab notebook:

https://colab.research.google.com/drive/1PScLJV3sbUUAOiptLO7Ixlzh9XnWWoYZ?usp=sharing

anita-smith1 commented 1 year ago

@csukuangfj Thanks for sharing how to export. I ran your colab notebook without using my model. Everything is generated, but when I use the output in the sherpa-onnx TTS Android demo app, it crashes. The app loads the model, lexicon, and tokens all right, but when you tap "Generate" the app crashes with:

2023-11-10 17:39:10.825 18430-18430 sherpa-onnx             com.k2fsa.sherpa.onnx                W  string is: how are you doing
2023-11-10 17:39:10.825 18430-18430 sherpa-onnx             com.k2fsa.sherpa.onnx                W  Raw text: how are you doing
2023-11-10 17:39:10.826 18430-18430 libc                    com.k2fsa.sherpa.onnx                A  Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x38333501 in tid 18430 (fsa.sherpa.onnx), pid 18430 (fsa.sherpa.onnx)
2023-11-10 17:39:11.148 18526-18526 DEBUG                   pid-18526                            A  Cmdline: com.k2fsa.sherpa.onnx
2023-11-10 17:39:11.149 18526-18526 DEBUG                   pid-18526                            A  pid: 18430, tid: 18430, name: fsa.sherpa.onnx  >>> com.k2fsa.sherpa.onnx <<<
2023-11-10 17:39:11.149 18526-18526 DEBUG                   pid-18526                            A        #00 pc 00000000003d65f8  /data/app/~~VjPnUWg1_JmIAZyAeNh2fA==/com.k2fsa.sherpa.onnx-kZTE0P4iaYzmG76-wTSUZQ==/lib/arm64/libonnxruntime.so (BuildId: 5bc662f139575789b099c8c4823a8ed1565a0d4c)
2023-11-10 17:39:11.149 18526-18526 DEBUG                   pid-18526                            A        #01 pc 00000000003a9404  /data/app/~~VjPnUWg1_JmIAZyAeNh2fA==/com.k2fsa.sherpa.onnx-kZTE0P4iaYzmG76-wTSUZQ==/lib/arm64/libonnxruntime.so (BuildId: 5bc662f139575789b099c8c4823a8ed1565a0d4c)
2023-11-10 17:39:11.149 18526-18526 DEBUG                   pid-18526                            A        #02 pc 000000000009f914  /data/app/~~VjPnUWg1_JmIAZyAeNh2fA==/com.k2fsa.sherpa.onnx-kZTE0P4iaYzmG76-wTSUZQ==/lib/arm64/libsherpa-onnx-core.so (Ort::detail::SessionImpl<OrtSession>::Run(Ort::RunOptions const&, char const* const*, Ort::Value const*, unsigned long, char const* const*, unsigned long)+204) (BuildId: c9b089160ce1cb938a4794e9c94d3a1636656712)
2023-11-10 17:39:11.149 18526-18526 DEBUG                   pid-18526                            A        #03 pc 0000000000124ba4  /data/app/~~VjPnUWg1_JmIAZyAeNh2fA==/com.k2fsa.sherpa.onnx-kZTE0P4iaYzmG76-wTSUZQ==/lib/arm64/libsherpa-onnx-core.so (sherpa_onnx::OfflineTtsVitsModel::Impl::RunVits(Ort::Value, long, float)+1128) (BuildId: c9b089160ce1cb938a4794e9c94d3a1636656712)
2023-11-10 17:39:11.149 18526-18526 DEBUG                   pid-18526                            A        #04 pc 0000000000123f9c  /data/app/~~VjPnUWg1_JmIAZyAeNh2fA==/com.k2fsa.sherpa.onnx-kZTE0P4iaYzmG76-wTSUZQ==/lib/arm64/libsherpa-onnx-core.so (sherpa_onnx::OfflineTtsVitsModel::Impl::Run(Ort::Value, long, float)+96) (BuildId: c9b089160ce1cb938a4794e9c94d3a1636656712)
2023-11-10 17:39:11.149 18526-18526 DEBUG                   pid-18526                            A        #05 pc 0000000000123ec0  /data/app/~~VjPnUWg1_JmIAZyAeNh2fA==/com.k2fsa.sherpa.onnx-kZTE0P4iaYzmG76-wTSUZQ==/lib/arm64/libsherpa-onnx-core.so (sherpa_onnx::OfflineTtsVitsModel::Run(Ort::Value, long, float)+48) (BuildId: c9b089160ce1cb938a4794e9c94d3a1636656712)
2023-11-10 17:39:11.149 18526-18526 DEBUG                   pid-18526                            A        #06 pc 00000000001220a4  /data/app/~~VjPnUWg1_JmIAZyAeNh2fA==/com.k2fsa.sherpa.onnx-kZTE0P4iaYzmG76-wTSUZQ==/lib/arm64/libsherpa-onnx-core.so (sherpa_onnx::OfflineTtsVitsImpl::Generate(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, long, float) const+1128) (BuildId: c9b089160ce1cb938a4794e9c94d3a1636656712)
2023-11-10 17:39:11.149 18526-18526 DEBUG                   pid-18526                            A        #07 pc 000000000000a1a8  /data/app/~~VjPnUWg1_JmIAZyAeNh2fA==/com.k2fsa.sherpa.onnx-kZTE0P4iaYzmG76-wTSUZQ==/lib/arm64/libsherpa-onnx-jni.so (Java_com_k2fsa_sherpa_onnx_OfflineTts_generateImpl+316) (BuildId: 7c35f6abaaa0600b2d69ed7bf0aa62b69c017daa)
2023-11-10 17:39:11.149 18526-18526 DEBUG                   pid-18526                            A        #14 pc 0000000000002278  [anon:dalvik-classes3.dex extracted in memory from /data/app/~~VjPnUWg1_JmIAZyAeNh2fA==/com.k2fsa.sherpa.onnx-kZTE0P4iaYzmG76-wTSUZQ==/base.apk!classes3.dex] (com.k2fsa.sherpa.onnx.OfflineTts.generate+0)
2023-11-10 17:39:11.150 18526-18526 DEBUG                   pid-18526                            A        #19 pc 00000000000012a0  [anon:dalvik-classes3.dex extracted in memory from /data/app/~~VjPnUWg1_JmIAZyAeNh2fA==/com.k2fsa.sherpa.onnx-kZTE0P4iaYzmG76-wTSUZQ==/base.apk!classes3.dex] (com.k2fsa.sherpa.onnx.MainActivity.onClickGenerate+0)
2023-11-10 17:39:11.150 18526-18526 DEBUG                   pid-18526                            A        #24 pc 0000000000001544  [anon:dalvik-classes3.dex extracted in memory from /data/app/~~VjPnUWg1_JmIAZyAeNh2fA==/com.k2fsa.sherpa.onnx-kZTE0P4iaYzmG76-wTSUZQ==/base.apk!classes3.dex] (com.k2fsa.sherpa.onnx.MainActivity.onCreate$lambda$0+0)
2023-11-10 17:39:11.150 18526-18526 DEBUG                   pid-18526                            A        #29 pc 0000000000001200  [anon:dalvik-classes3.dex extracted in memory from /data/app/~~VjPnUWg1_JmIAZyAeNh2fA==/com.k2fsa.sherpa.onnx-kZTE0P4iaYzmG76-wTSUZQ==/base.apk!classes3.dex] (com.k2fsa.sherpa.onnx.MainActivity.$r8$lambda$OIkLpaHjEAmudVQGZZp-NNJ9rrA+0)
2023-11-10 17:39:11.150 18526-18526 DEBUG                   pid-18526                            A        #34 pc 00000000000011ac  [anon:dalvik-classes3.dex extracted in memory from /data/app/~~VjPnUWg1_JmIAZyAeNh2fA==/com.k2fsa.sherpa.onnx-kZTE0P4iaYzmG76-wTSUZQ==/base.apk!classes3.dex] (com.k2fsa.sherpa.onnx.MainActivity$$ExternalSyntheticLambda0.onClick+0)
2023-11-10 17:39:11.150 18526-18526 DEBUG                   pid-18526                            A        #49 pc 00000000003074bc  [anon:dalvik-classes.dex extracted in memory from /data/app/~~VjPnUWg1_JmIAZyAeNh2fA==/com.k2fsa.sherpa.onnx-kZTE0P4iaYzmG76-wTSUZQ==/base.apk] (com.google.android.material.button.MaterialButton.performClick+0)
---------------------------- PROCESS ENDED (18430) for package com.k2fsa.sherpa.onnx ----------------------------
2023-11-10 17:39:11.408  1190-1226  WindowManager           pid-1190                             E  win=Window{a478533 u0 com.k2fsa.sherpa.onnx/com.k2fsa.sherpa.onnx.MainActivity EXITING} destroySurfaces: appStopped=false cleanupOnResume=false win.mWindowRemovalAllowed=true win.mRemoveOnExit=true win.mViewVisibility=0 caller=com.android.server.wm.ActivityRecord.destroySurfaces:6536 com.android.server.wm.ActivityRecord.destroySurfaces:6517 com.android.server.wm.WindowState.onExitAnimationDone:5966 com.android.server.wm.ActivityRecord$$ExternalSyntheticLambda10.accept:2 java.util.ArrayList.forEach:1528 com.android.server.wm.ActivityRecord.onAnimationFinished:8605 com.android.server.wm.ActivityRecord.postApplyAnimation:6250 

This is the model.onnx, lexicon.txt and tokens.txt generated from your notebook - https://drive.google.com/file/d/1ndZ5MSyS8482Eht6IP1jqxsMwIvB9Ln6/view?usp=sharing

When running your notebook, this was the only error I encountered, but I'm not sure whether it matters:

%%shell

pip install -q TTS

 Preparing metadata (setup.py) ... done
......
  Building wheel for encodec (setup.py) ... done
  Building wheel for umap-learn (setup.py) ... done
  Building wheel for bnnumerizer (setup.py) ... done
  Building wheel for bnunicodenormalizer (setup.py) ... done
  Building wheel for docopt (setup.py) ... done
  Building wheel for gruut-ipa (setup.py) ... done
  Building wheel for gruut_lang_de (setup.py) ... done
  Building wheel for gruut_lang_en (setup.py) ... done
  Building wheel for gruut_lang_es (setup.py) ... done
  Building wheel for gruut_lang_fr (setup.py) ... done
  Building wheel for pynndescent (setup.py) ... done
  Building wheel for gruut (setup.py) ... done
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
lida 0.0.10 requires fastapi, which is not installed.
lida 0.0.10 requires kaleido, which is not installed.
lida 0.0.10 requires python-multipart, which is not installed.
lida 0.0.10 requires uvicorn, which is not installed.
plotnine 0.12.4 requires numpy>=1.23.0, but you have numpy 1.22.0 which is incompatible.
tensorflow 2.14.0 requires numpy>=1.23.5, but you have numpy 1.22.0 which is incompatible.

This Android crash also occurs when I export and use my fine-tuned Coqui TTS model.

csukuangfj commented 1 year ago

The app loads the model, lexicon and tokens all right but when you tap "Generate" the app crashes with:

@anita-smith1

Could you show more logs, something like below:

13:48.972 19945-19945 sherpa-onnx             com.k2fsa.sherpa.onnx                I  Start to initialize TTS
2023-11-11 11:13:49.063 19945-19945 sherpa-onnx             com.k2fsa.sherpa.onnx                W  config:
                                                                                                    OfflineTtsConfig(model=OfflineTtsModelConfig(vits=OfflineTtsVitsModelConfig(model="vits-coqui-en-vctk/model.onnx", lexicon="vits-coqui-en-vctk/lexicon.txt", tokens="vits-coqui-en-vctk/tokens.txt", noise_scale=0.667, noise_scale_w=0.8, length_scale=1), num_threads=2, debug=True, provider="cpu"), rule_fsts="")
2023-11-11 11:14:07.089 19945-19945 sherpa-onnx             com.k2fsa.sherpa.onnx                W  ---vits model---
                                                                                                    punctuation=; : , . ! ? ¡ ¿ — … " « » “ ”  
                                                                                                    add_blank=1
                                                                                                    sample_rate=22050
                                                                                                    language=English
                                                                                                    n_speakers=109
                                                                                                    comment=coqui
                                                                                                    model_type=vits

I need to see the model metadata output.

csukuangfj commented 1 year ago

@csukuangfj Super!

For converting piper models to sherpa-onnx using the provided colab notebook, I can confirm that it works in SherpaOnnxTTS,

but for converting a Coqui TTS VITS model to sherpa-onnx, using the same Android code, the app crashes on all test devices.

@nanaghartey

Could you post some logcat output for the crash?

csukuangfj commented 1 year ago

@anita-smith1

From your error log:


2023-11-10 17:39:11.149 18526-18526 DEBUG                   pid-18526                            A        #01 pc 00000000003a9404  /data/app/~~VjPnUWg1_JmIAZyAeNh2fA==/com.k2fsa.sherpa.onnx-kZTE0P4iaYzmG76-wTSUZQ==/lib/arm64/libonnxruntime.so (BuildId: 5bc662f139575789b099c8c4823a8ed1565a0d4c)
2023-11-10 17:39:11.149 18526-18526 DEBUG                   pid-18526                            A        #02 pc 000000000009f914  /data/app/~~VjPnUWg1_JmIAZyAeNh2fA==/com.k2fsa.sherpa.onnx-kZTE0P4iaYzmG76-wTSUZQ==/lib/arm64/libsherpa-onnx-core.so (Ort::detail::SessionImpl<OrtSession>::Run(Ort::RunOptions const&, char const* const*, Ort::Value const*, unsigned long, char const* const*, unsigned long)+204) (BuildId: c9b089160ce1cb938a4794e9c94d3a1636656712)
2023-11-10 17:39:11.149 18526-18526 DEBUG                   pid-18526                            A        #03 pc 0000000000124ba4  /data/app/~~VjPnUWg1_JmIAZyAeNh2fA==/com.k2fsa.sherpa.onnx-kZTE0P4iaYzmG76-wTSUZQ==/lib/arm64/libsherpa-onnx-core.so (sherpa_onnx::OfflineTtsVitsModel::Impl::RunVits(Ort::Value, long, float)+1128) (BuildId: c9b089160ce1cb938a4794e9c94d3a1636656712)
2023-11-10 17:39:11.149 18526-18526 DEBUG                   pid-18526                            A        #04 pc 0000000000123f9c  /data/app/~~VjPnUWg1_JmIAZyAeNh2fA==/com.k2fsa.sherpa.onnx-kZTE0P4iaYzmG76-wTSUZQ==/lib/arm64/libsherpa-onnx-core.so (sherpa_onnx::OfflineTtsVitsModel::Impl::Run(Ort::Value, long, float)+96) (BuildId: 

It crashes in the function

sherpa_onnx::OfflineTtsVitsModel::Impl::RunVits

which does not look right. I think something is wrong in the comment field of your model metadata. It should be coqui for models from Coqui and piper for models from piper.

anita-smith1 commented 1 year ago

@csukuangfj

Could you show more logs, something like the sample above? I need to see the model metadata output.

This is my log. The comment field says coqui, but comparing with yours I can see that n_speakers is 0 for mine:

2023-11-11 04:17:49.611 14902-14902 sherpa-onnx             com.k2fsa.sherpa.onnx                I  Start to initialize TTS
2023-11-11 04:17:49.701 14902-14902 sherpa-onnx             com.k2fsa.sherpa.onnx                W  config:
                                                                                                    OfflineTtsConfig(model=OfflineTtsModelConfig(vits=OfflineTtsVitsModelConfig(model="original_colab_model/model.onnx", lexicon="original_colab_model/lexicon.txt", tokens="original_colab_model/tokens.txt", noise_scale=0.667, noise_scale_w=0.8, length_scale=1), num_threads=2, debug=True, provider="cpu"), rule_fsts="")
2023-11-11 04:17:51.634 14902-14902 libc                    com.k2fsa.sherpa.onnx                W  Access denied finding property "ro.mediatek.platform"
2023-11-11 04:18:00.766 14902-14902 sherpa-onnx             com.k2fsa.sherpa.onnx                W  ---vits model---
                                                                                                    punctuation=; : , . ! ? ¡ ¿ — … " « » “ ”  
                                                                                                    add_blank=1
                                                                                                    sample_rate=22050
                                                                                                    language=English
                                                                                                    n_speakers=0
                                                                                                    comment=coqui
                                                                                                    model_type=vits
2023-11-11 04:18:06.575 14902-14902 sherpa-onnx             com.k2fsa.sherpa.onnx                I  Finish initializing TTS
2023-11-11 04:18:06.657 14902-14902 MSHandlerLifeCycle      com.k2fsa.sherpa.onnx                I  check: return. pkg=com.k2fsa.sherpa.onnx parent=null callers=com.android.internal.policy.DecorView.setVisibility:4411 android.app.ActivityThread.handleResumeActivity:5476 android.app.servertransaction.ResumeActivityItem.execute:54 android.app.servertransaction.ActivityTransactionItem.execute:45 android.app.servertransaction.TransactionExecutor.executeLifecycleState:176 
2023-11-11 04:18:06.657 14902-14902 MSHandlerLifeCycle      com.k2fsa.sherpa.onnx                I  removeMultiSplitHandler: no exist. decor=DecorView@c4a4467[]
2023-11-11 04:18:06.684 14902-14945 NativeCust...ncyManager com.k2fsa.sherpa.onnx                D  [NativeCFMS] BpCustomFrequencyManager::BpCustomFrequencyManager()
2023-11-11 04:18:06.716 14902-14902 InsetsController        com.k2fsa.sherpa.onnx                D  onStateChanged: InsetsState: {mDisplayFrame=Rect(0, 0 - 1920, 1200), 
csukuangfj commented 1 year ago

@anita-smith1

Are you using the latest master of sherpa-onnx or the version >= v1.8.9?

anita-smith1 commented 1 year ago

@anita-smith1

Are you using the latest master of sherpa-onnx or the version >= v1.8.9?

I am using version 1.8.7, the same version used in the APKs you originally shared.

anita-smith1 commented 1 year ago

@anita-smith1

Are you using the latest master of sherpa-onnx or the version >= v1.8.9?

I just tried v1.8.9 and it worked :)

csukuangfj commented 1 year ago

@anita-smith1 Are you using the latest master of sherpa-onnx or the version >= v1.8.9?

I just tried v1.8.9 and it worked :)

Glad to hear that it works for you.

By the way, we have pre-built Android APKs for the VITS English models from Coqui.


https://k2-fsa.github.io/sherpa/onnx/tts/apk.html

anita-smith1 commented 1 year ago


Thank you very much for your amazing support. It's incredible!

Now the Android demo works with my fine-tuned Coqui models too. By the way, if I want to use Java instead of Kotlin for Android, would I need to build from source? There seems to be only Kotlin support for Android at the moment.

Also, are there plans to release the same on-device TTS for iOS?

csukuangfj commented 1 year ago

By the way, in case I want to use Java instead of kotlin for android, would I need to build from source?

Sorry, we only have a Kotlin demo for Android, but the JNI interface can also be used from Java.

You can reuse the .so files from Java.

You can build sherpa-onnx from source for Android by following https://k2-fsa.github.io/sherpa/onnx/android/index.html if you want to use the latest master of sherpa-onnx.


Also are there plans to release same on-device tts for iOS too?

Yes, it is in the plan. We already support speech-to-text on iOS, and adding text-to-speech is very easy with the current code in sherpa-onnx. We will do that in the coming week.

anita-smith1 commented 1 year ago

@csukuangfj That's great to hear!

I have another noob question:

I have a fine-tuned Coqui TTS VITS model that contains non-English words. I use a custom CMU.in.IPA.txt and a custom all-english-words.txt file for the non-English words (plus a few English words). When I synthesize using the cell that contains this code:

    ......
def main():
  model = OnnxModel("./model.onnx")
  text = "xse wo atua de a fa"
  x = vits.tokenizer.text_to_ids(text, vits.tokenizer)
  x = torch.tensor(x, dtype=torch.int64)
  y = model(x)
  print(y.shape)
  soundfile.write("test.wav", y.numpy(), model.sample_rate)

main()

Everything works. The pronunciation is good. However when I use:

sherpa-onnx-offline-tts \
  --vits-model=./model.onnx \
  --vits-lexicon=./lexicon.txt \
  --vits-tokens=./tokens.txt \
  --output-filename=./test.wav \
  "xse wo atua de a fa"

I get unexpected results - the pronunciations are wrong, even for the few English words.

This is a sample of my ipa.txt file, which contains both non-English and a few English words:

a,              ʌ
atua,           ejtujʌ
call,           kɔˈl
de,             ðʌ
din,            diˈn
edin,           ejdiˈn
fa,             fʌ
frq,            fɹɛ
line,           lajˈn
mma,            mɑˈ
mobile,         mowˈbʌl
na,             nɑˈ
naa,            nɑˈɑˈ
ndrope,         ɪŋdɹɑˌpi
ne,             nɪ
no,             nu
nreflecte,      ɪŋɹʌflɛˈktɪ
nxma,           nɔmbɚ

all-english-words.txt contains all the words. I followed the same format used in the original English list, and I used a phoneme-to-IPA converter.

What do I need to do to make this work? Thank you

csukuangfj commented 1 year ago

I used a phoneme to IPA converter.

Please add your words to all-english-words.txt and let the code generate the pronunciations for you.

The pronunciations in CMU.in.IPA.txt are discarded and never used. Only the first column of CMU.in.IPA.txt, i.e., the words, is used.

Please don't use an IPA converter to generate pronunciations for your new words.

The code uses get_phones, in one cell of the colab, to generate the pronunciation for a given word.

anita-smith1 commented 1 year ago

@csukuangfj you are right! It's better now, though there is still some difference (a slight loss in pronunciation quality compared to the original Coqui model). Is the order of the words I add important? I added them at the bottom of the all-english-words.txt file. Also, is there anything else I can do to improve the pronunciations in sherpa-onnx?

rmcpantoja commented 1 year ago

Hi @csukuangfj, why not make an Android port of piper_phonemize and use it in next-gen TTS instead of a lexicon? These voices could be used in a screen reader in the future, and there will be many words they try to read that may not be in that lexicon.

csukuangfj commented 1 year ago

Why not make an Android port of piper_phonemize and use it in next-gen TTS instead of a lexicon?

We try to support as many models as possible, including models not from piper, which may not use piper_phonemize.


there will be many words they try to read that may not be in that lexicon

We can add OOV words to lexicon.txt manually. By the way, lexicon.txt already covers a lot of words.

csukuangfj commented 1 year ago

Is the order of the words I add important?

The order does not matter as long as you don't add duplicate words.

If you add duplicate words, the first one has higher priority and later ones are ignored.
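
That first-occurrence-wins rule can be sketched like this; the line parsing is an assumption about the "word ph1 ph2 ..." lexicon layout, not sherpa-onnx's actual loader code:

```python
# Sketch: load a lexicon where the first entry for a word wins.

def load_lexicon(lines):
    lexicon = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        word, phones = fields[0], fields[1:]
        # setdefault keeps the first entry; later duplicates are ignored.
        lexicon.setdefault(word, phones)
    return lexicon

lex = load_lexicon([
    "read r iy d",  # first entry: used
    "read r eh d",  # duplicate: ignored
])
print(lex["read"])  # -> ['r', 'iy', 'd']
```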


is there anything else I can do to improve the pronunciations in sherpa-onnx?

I realized that a word appearing in a sentence with other words can have a different pronunciation from when it appears standalone. I am afraid it is hard, if not impossible, to improve the pronunciations with the current approach.

beqabeqa473 commented 1 year ago

I am not sure you can cover everything.

It might be better to add a condition and use piper_phonemize for piper models.

Yes, this will mean adding espeak-ng, but it will be much better than adding words manually.


csukuangfj commented 1 year ago

Maybe a better approach is to change the modeling unit from phones to other units that can be entirely derived from words.

anita-smith1 commented 1 year ago

@csukuangfj true, single words do seem to have poorer pronunciations than the same words in phrases. Still, the fact that this TTS solution works offline is amazing. By the way, can inference be done on the GPU? And, importantly, has the iOS version been released?

csukuangfj commented 1 year ago

And importantly, has the iOS version been released?

I am writing the code now. Please wait a day or two.

anita-smith1 commented 1 year ago

@csukuangfj wow can't wait :) your work is amazing

csukuangfj commented 1 year ago

@anita-smith1

The iOS demo is ready now.

You can run text-to-speech with Next-gen Kaldi on iOS using https://github.com/k2-fsa/sherpa-onnx/pull/443

I have recorded a video showing how to use it. Please see https://www.youtube.com/watch?v=MvePdkuMNJk


single words seem to have poor pronunciations compared to same words in phrases

Don't worry. I will try to fix it.

anita-smith1 commented 1 year ago

@csukuangfj incredible, that was quick! The video demo is fantastic. Great job! It seems I'd have to wait, though, since my project does not use SwiftUI; it only uses UIKit. I can see you have this to help us build the Android version from source. Can you add documentation for iOS too, so we can also build iOS from source?

csukuangfj commented 1 year ago

Please see https://k2-fsa.github.io/sherpa/onnx/ios/build-sherpa-onnx-swift.html#build-sherpa-onnx-in-commandline-c-part for doc.

anita-smith1 commented 1 year ago

Please see https://k2-fsa.github.io/sherpa/onnx/ios/build-sherpa-onnx-swift.html#build-sherpa-onnx-in-commandline-c-part for doc.

Thank you

sweetbbak commented 11 months ago

Is it possible to add my custom model? Here is the link: /ivona_hq.tar.lzma

If you want to check it out, it can be unpacked with:

tar --lzma --extract --file ivona_hq.tar.lzma

I've been wanting to use this voice, but the original application is 32-bit and nearly 15 years old at this point. It's being held together by threads.

csukuangfj commented 11 months ago

Is it a VITS model? Could you show the extracted files?

csukuangfj commented 11 months ago

Is it possible to add my custom model? Here is the link: /ivona_hq.tar.lzma

@sweetbbak

I just looked at the model and found that it is piper-based, so it is definitely supported by sherpa-onnx.

I have converted the model. You can find it at https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models


I have also added it to https://huggingface.co/spaces/k2-fsa/text-to-speech


For the following text:

“Today as always, men fall into two groups: slaves and free men. Whoever does not have two-thirds of his day for himself, is a slave, whatever he may be: a statesman, a businessman, an official, or a scholar.”

It generates the following audio:

https://github.com/rhasspy/piper/assets/5284924/9a2d2869-5272-4eb0-8f28-c705fbee4c05

csukuangfj commented 11 months ago

By the way, you can also find the exported model at https://huggingface.co/csukuangfj/vits-piper-en_US-sweetbbak-amy/tree/main

The above repo also contains the script for exporting.

csukuangfj commented 11 months ago

@sweetbbak

The audio sounds like British English, but the JSON config file says the voice is en-us. Is there something wrong?

sweetbbak commented 11 months ago

Thank you so much! That's my mistake, more than likely. I believe I used en_US-amy as a base to fine-tune from because it sounded better, so I was unsure whether it should be marked as en_US or en_GB. It's definitely a British voice.

csukuangfj commented 11 months ago

Thank you so much! That's my mistake, more than likely. I believe I used en_US-amy as a base to fine-tune from because it sounded better, so I was unsure whether it should be marked as en_US or en_GB. It's definitely a British voice.

I am changing it to en_GB.

sweetbbak commented 11 months ago

I appreciate it.

csukuangfj commented 11 months ago

FYI:

No lexicon.txt is required any longer; we are now using piper-phonemize in sherpa-onnx as well.

You can find all models from piper in sherpa-onnx at https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models

LuekWasHere commented 10 months ago

@beqabeqa473

@csukuangfj thanks for your work.

To evaluate models it would be great to implement android tts engine api.

I'm very new to working with Android, let alone with TTS APIs. Would it be possible to point me in the right direction for developing an Android TTS engine API? I have found some examples of offline TTS, but I'm trying to figure out how to turn this into an engine API. I was thinking some of these would be useful (TTS service wrapper, TTS-engine). I think I just need some guidance, as I am lost in a sea of knowledge. Let me know what you think.

csukuangfj commented 10 months ago

TTS-engine

Please wait a moment. I have managed to create a TTS engine service that you can use to replace the system TTS engine.

I will create a PR soon.

LuekWasHere commented 10 months ago

I will create a PR soon.

@csukuangfj Woah, that is legendary. I'd love to explore what you did; I'm interested in learning. I'll hold tight. Thanks! Sorry for my naivety, but what does PR stand for?

beqabeqa473 commented 10 months ago

I am also working on that.

I already have a working prototype of a Piper TTS engine for Android, with downloading and installing of voices.

I will make the repo public soon.


csukuangfj commented 10 months ago

I will create a PR soon. @csukuangfj Woah, that is legendary. I'd love to explore what you did; I'm interested in learning. I'll hold tight. Thanks! Sorry for my naivety, but what does PR stand for?

@LuekWasHere

PR is short for pull request.


I just created one at https://github.com/k2-fsa/sherpa-onnx/pull/508

You can find a YouTube video at https://www.youtube.com/watch?v=33QYuVzDORA

csukuangfj commented 10 months ago

By the way, you can download pre-built text-to-speech engine APKs at https://k2-fsa.github.io/sherpa/onnx/tts/apk-engine.html

Web3Kev commented 7 months ago

@csukuangfj Awesome repo, Fangjun Kuang! Do you know if anybody is working on a Dart package for Next-gen Kaldi that can be used on Android and iOS?

csukuangfj commented 7 months ago

@Web3Kev

Yes, please see https://github.com/k2-fsa/sherpa-onnx/issues/379

Web3Kev commented 7 months ago

Thanks!