Open GeorgeS2019 opened 1 year ago
Is it possible to do this using NeMoOnnxSharp for German?
It supports both German TTS/ASR. See this https://github.com/kaiidams/NeMoOnnxSharp/blob/ad2ffe375e525bb63c59c9b1cd5154afe70351a0/NeMoOnnxSharp.Example/Program.cs#L39
I have used the code for German.
Here is the feedback
Second,
I have seen the Mel and MFCC code. I wonder whether this code can be repurposed for German audio, and eventually used to extract German phonemes from German audio.
There is hardly anything like this anywhere on the internet. Even for Wav2Vec2, examples showing how to work with the German language are rare.
Can you do something about this?
It supports both German TTS/ASR. See this
I have tried TTS/ASR for German. My interest is in extracting German phonemes from German audio.
In the case of German words, their pronunciation is not ambiguous. Why do you need a phonemizer? In the case of English, NeMo FastPitch was trained with a phonemizer that converts all but the ambiguous words to phonemes, and FastPitch itself can handle the ambiguous words in many cases.
https://github.com/kaiidams/NeMoOnnxSharp/blob/main/NeMoOnnxSharp/TTSTokenizers/EnglishG2p.cs
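To illustrate what "translates all but ambiguous words" could mean in practice, here is a minimal Python sketch. It is not the code from EnglishG2p.cs or from NeMo. The dictionary `TOY_DICT`, its ARPAbet-style entries, and the function name `phonemize` are all hypothetical, and the rule "a word with more than one pronunciation is ambiguous" is an assumption made for the example.

```python
# Toy pronunciation dictionary (hypothetical entries, ARPAbet-style).
# A word listed with more than one pronunciation is treated as ambiguous.
TOY_DICT = {
    "hello": ["HH AH0 L OW1"],
    "world": ["W ER1 L D"],
    "read": ["R IY1 D", "R EH1 D"],  # heteronym: present vs. past tense
}


def phonemize(text):
    """Replace unambiguous known words with phonemes; keep the rest as letters."""
    tokens = []
    for word in text.lower().split():
        prons = TOY_DICT.get(word, [])
        if len(prons) == 1:
            # Exactly one pronunciation: emit its phoneme symbols.
            tokens.extend(prons[0].split())
        else:
            # Ambiguous or unknown word: pass the graphemes through unchanged,
            # leaving the decision to the acoustic model.
            tokens.extend(list(word))
        tokens.append(" ")
    return tokens[:-1]  # drop the trailing word separator


print(phonemize("hello read"))
```

With this toy dictionary, "hello" comes out as phoneme symbols while "read" stays as the letters r, e, a, d, so the model sees a mix of phonemes and graphemes.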
Is there GermanG2P.cs in NeMoOnnxSharp?
their pronunciation is not ambiguous.
Please explain. I am not sure I understand how this affects how to proceed.
FastPitch is a text-to-speech (TTS) model developed by NVIDIA. It's a fully-parallel transformer architecture with prosody control over pitch and individual phoneme duration¹. Here are some key features:
FastPitch is used for generating mel spectrograms from text, which can then be converted to audio using a vocoder¹. It's trained on the LJSpeech dataset sampled at 22050Hz and has been tested on generating female English voices with an American accent¹. Please note that this model works well with vocoders that were trained on 22050Hz data¹.
Source: Conversation with Bing, 3/30/2024
(1) TTS En FastPitch | NVIDIA NGC. https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/tts_en_fastpitch.
(2) GitHub - NVIDIA/NeMo: NeMo: a framework for generative AI. https://github.com/NVIDIA/NeMo.
(3) Google Colab. https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/tts/FastPitch_MixerTTS_Training.ipynb.
(4) https://arxiv.org/abs/2006.06873.
Is there GermanG2P.cs in NeMoOnnxSharp?
NeMo's FastPitch uses a phonemizer for English but not for German. NeMoOnnxSharp doesn't contain a German phonemizer.
NVIDIA NeMo (ByT5 G2P and G2P-Conformer):
These models allow you to enforce desired pronunciations by providing a phonetic transcript of the input. You can train and evaluate these models using manifest files containing grapheme and phoneme pairs.
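As a rough sketch of what such a manifest could look like for German, the Python snippet below writes one JSON object per line. The field names `text_graphemes` and `text` are what I recall from the NeMo G2P documentation, so treat them as an assumption and check them against your NeMo version. The word pairs, the IPA strings, and the file name `german_g2p_manifest.json` are hypothetical and for illustration only.

```python
import json

# Hypothetical German grapheme/phoneme pairs; the IPA strings are illustrative.
PAIRS = [
    ("Haus", "haʊs"),
    ("Buch", "buːx"),
]


def to_manifest_lines(pairs):
    """Turn (graphemes, phonemes) pairs into JSON-lines manifest entries."""
    return [
        # Field names are an assumption, see the note above.
        json.dumps({"text_graphemes": g, "text": p}, ensure_ascii=False)
        for g, p in pairs
    ]


def write_manifest(path, pairs):
    """Write one JSON object per line, UTF-8 encoded so IPA symbols survive."""
    with open(path, "w", encoding="utf-8") as f:
        for line in to_manifest_lines(pairs):
            f.write(line + "\n")


if __name__ == "__main__":
    write_manifest("german_g2p_manifest.json", PAIRS)
```

Keeping `ensure_ascii=False` leaves the IPA characters readable in the file instead of turning them into `\u` escapes.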