kaiidams / NeMoOnnxSharp

Text-to-speech and speech recognition, VAD with NVIDIA NeMo and ONNX Runtime for .NET Core.
Apache License 2.0

Possible to support German language? #20

Closed GeorgeS2019 closed 11 months ago

GeorgeS2019 commented 11 months ago

Is it possible to support the German language?

kaiidams commented 11 months ago

NeMo provides German models. Writing phonemizers/tokenizers for German should not be difficult.

GeorgeS2019 commented 11 months ago

@kaiidams I went through NeMo, but I could not find how German is supported. Any link would be really appreciated.

kaiidams commented 11 months ago

This page lists the available models: https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/tts/checkpoints.html

If you have NeMo installed, you can run

from nemo.collections.tts.models.base import SpectrogramGenerator, Vocoder
from nemo.collections.asr.models import EncDecCTCModel
SpectrogramGenerator.list_available_models()
Vocoder.list_available_models()
EncDecCTCModel.list_available_models()

to get lists.

[PretrainedModelInfo(
    pretrained_model_name=QuartzNet15x5Base-En,
    description=QuartzNet15x5 model trained on six datasets: LibriSpeech, Mozilla Common Voice (validated clips from en_1488h_2019-12-10), WSJ, Fisher, Switchboard, and NSC Singapore English. It was trained with Apex/Amp optimization level O1 for 600 epochs. The model achieves a WER of 3.79% on LibriSpeech dev-clean, and a WER of 10.05% on dev-other. Please visit https://ngc.nvidia.com/catalog/models/nvidia:nemospeechmodels for further details.,
    location=https://api.ngc.nvidia.com/v2/models/nvidia/nemospeechmodels/versions/1.0.0a5/files/QuartzNet15x5Base-En.nemo
 ),
 PretrainedModelInfo(
    pretrained_model_name=stt_en_quartznet15x5,
    description=For details about this model, please visit https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_en_quartznet15x5,
    location=https://api.ngc.nvidia.com/v2/models/nvidia/nemo/stt_en_quartznet15x5/versions/1.0.0rc1/files/stt_en_quartznet15x5.nemo
 ),
 PretrainedModelInfo(
...
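
To narrow such a list down to one language, a small filter over the returned entries may help. This is only a sketch: it assumes the language code appears in the model name as `_de_`, which holds for names like `stt_de_quartznet15x5` but not for older names such as `QuartzNet15x5Base-En`. The helper `filter_models_by_language` and the `FakeModelInfo` stand-in are hypothetical and not part of NeMo; only the `pretrained_model_name` attribute comes from the output above.

```python
from collections import namedtuple


def filter_models_by_language(models, lang_code):
    """Return names of models whose name contains `_<lang_code>_`, e.g. `_de_`."""
    marker = f"_{lang_code}_"
    return [
        m.pretrained_model_name
        for m in models
        if marker in m.pretrained_model_name
    ]


# Stand-in for NeMo's PretrainedModelInfo so the sketch runs without NeMo installed.
FakeModelInfo = namedtuple("FakeModelInfo", ["pretrained_model_name"])

sample = [
    FakeModelInfo("QuartzNet15x5Base-En"),
    FakeModelInfo("stt_en_quartznet15x5"),
    FakeModelInfo("stt_de_quartznet15x5"),
]
german = filter_models_by_language(sample, "de")
print(german)
```

With NeMo installed, the same function could be given the result of `EncDecCTCModel.list_available_models()` instead of `sample`.
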
GeorgeS2019 commented 11 months ago

Any suggestions on which pair to use, if I understand correctly?

I will read more and come back. Thank you.

German

For German STT https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/stt_de_quartznet15x5

```cs
string modelPath = await DownloadModelAsync("stt_de_quartznet15x5");
```

Todo

  • [ ] phonemizers for German
  • [ ] tokenizers for German

    Writing phonemizers/tokenizers for German should not be difficult.
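
As a rough illustration of what a German tokenizer could involve, here is a minimal character-level sketch. Everything in it is an assumption: the symbol set and the id order are made up for illustration, and a real model defines its own vocabulary, which would have to be read from that model's configuration. The names `GERMAN_SYMBOLS` and `tokenize_german` are hypothetical.

```python
# Hypothetical symbol set; a real model ships its own vocabulary and id order.
GERMAN_SYMBOLS = list(" abcdefghijklmnopqrstuvwxyzäöüß.,!?'-")
SYMBOL_TO_ID = {symbol: index for index, symbol in enumerate(GERMAN_SYMBOLS)}


def tokenize_german(text):
    """Lowercase the text and map each known character to its id, dropping the rest."""
    return [SYMBOL_TO_ID[ch] for ch in text.lower() if ch in SYMBOL_TO_ID]


token_ids = tokenize_german("Grüße aus Köln!")
print(token_ids)
```

A phoneme-based model would additionally need a pronunciation dictionary or a grapheme-to-phoneme step, which this sketch does not cover.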

For German TTS

Mel-Spectrogram Generators

Locale | Model
-- | --
de-DE | tts_de_fastpitch_multispeaker_5
de-DE | tts_de_fastpitch_singleSpeaker_thorstenNeutral_2102: 21.02
de-DE | tts_de_fastpitch_singleSpeaker_thorstenNeutral_2210: 22.10

Vocoders

Locale | Model
-- | --
de-DE | tts_de_hui_hifigan_ft_fastpitch_multispeaker_5
de-DE | tts_de_hifigan_singleSpeaker_thorstenNeutral_2102: 21.02
de-DE | tts_de_hifigan_singleSpeaker_thorstenNeutral_2210: 22.10
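
On the question of which pair to use, one plausible approach is to match each spectrogram generator with the vocoder whose name ends in the same variant suffix. The sketch below does that for the names listed above. It is a naming heuristic, not documented NeMo behavior: `pair_by_suffix` is a hypothetical helper, and the 22.10 generator name is assumed to end in `_2210`, mirroring the vocoder.

```python
def pair_by_suffix(generators, vocoders):
    """Pair each generator with the first vocoder that shares its variant suffix."""
    pairs = {}
    for gen in generators:
        # Everything after "fastpitch_" identifies the voice/dataset variant.
        _, sep, variant = gen.partition("fastpitch_")
        if not sep:
            continue  # not a FastPitch-style name; skip rather than guess
        matches = [v for v in vocoders if v.endswith(variant)]
        if matches:
            pairs[gen] = matches[0]
    return pairs


generators = [
    "tts_de_fastpitch_multispeaker_5",
    "tts_de_fastpitch_singleSpeaker_thorstenNeutral_2102",
    "tts_de_fastpitch_singleSpeaker_thorstenNeutral_2210",  # assumed name
]
vocoders = [
    "tts_de_hui_hifigan_ft_fastpitch_multispeaker_5",
    "tts_de_hifigan_singleSpeaker_thorstenNeutral_2102",
    "tts_de_hifigan_singleSpeaker_thorstenNeutral_2210",
]

pairs = pair_by_suffix(generators, vocoders)
for gen, voc in pairs.items():
    print(gen, "->", voc)
```

Whether a given pair actually sounds good still has to be checked by listening, since the match is made on names only.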

English

For English STT https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/stt_en_quartznet15x5

```cs
string modelPath = await DownloadModelAsync("stt_en_quartznet15x5");
```

For English TTS

```cs
string phonemeDict = await DownloadModelAsync("cmudict-0.7b_nv22.10");
string heteronyms = await DownloadModelAsync("heteronyms-052722");
string specGenModelPath = await DownloadModelAsync("tts_en_fastpitch");
string vocoderModelPath = await DownloadModelAsync("tts_en_hifigan");
```