GeorgeS2019 commented 1 year ago

NVIDIA NeMo (ByT5 G2P and G2P-Conformer):

NVIDIA NeMo provides grapheme-to-phoneme models for various languages, including German.

The ByT5 G2P model is a byte-level T5 encoder-decoder and can handle out-of-vocabulary (OOV) words and heteronyms (words with the same spelling but different pronunciations).

The G2P-Conformer model is a non-autoregressive CTC model, which makes it faster at inference than the autoregressive ByT5 model.

These models allow you to enforce desired pronunciations by providing a phonetic transcript of the input. You can train and evaluate them using manifest files containing grapheme and phoneme pairs.
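As a rough illustration of such a manifest, here is a minimal sketch that builds one JSON object per line from grapheme/phoneme pairs. The field names `text_graphemes` and `text` are an assumption to be checked against the NeMo G2P documentation, and the German entries and their IPA strings are hypothetical examples only.

```python
import json

# Hypothetical German entries; the IPA strings are illustrative only.
pairs = [
    ("Guten Tag", "ˈɡuːtn̩ taːk"),
    ("Weg", "veːk"),
]

def to_manifest_lines(pairs):
    """Return one JSON string per (grapheme, phoneme) pair."""
    # Assumed field names: "text_graphemes" for the input, "text" for the phonemes.
    return [
        json.dumps({"text_graphemes": g, "text": p}, ensure_ascii=False)
        for g, p in pairs
    ]

lines = to_manifest_lines(pairs)
for line in lines:
    print(line)
```

Each printed line would go into the manifest file as-is, one entry per line.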

GeorgeS2019 commented 8 months ago

Is it possible to do this using NeMoOnnxSharp for German?

kaiidams commented 8 months ago

It supports both TTS and ASR for German. See this: https://github.com/kaiidams/NeMoOnnxSharp/blob/ad2ffe375e525bb63c59c9b1cd5154afe70351a0/NeMoOnnxSharp.Example/Program.cs#L39

GeorgeS2019 commented 8 months ago

I have used the code for German.

Here is my feedback:

The German TTS output is quieter than the output of Microsoft Speech.
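One possible workaround, which nothing in this thread says NeMoOnnxSharp provides, is to peak-normalize the generated samples before playback. A minimal sketch, assuming the audio is available as 16-bit PCM integers; `peak_normalize` and the 0.9 target are hypothetical choices:

```python
def peak_normalize(samples, target_peak=0.9, full_scale=32767):
    """Scale 16-bit PCM samples so the loudest one reaches target_peak of full scale."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)  # silence or empty input: nothing to scale
    gain = target_peak * full_scale / peak
    # Clamp to the int16 range to guard against rounding overshoot.
    return [max(-32768, min(32767, round(s * gain))) for s in samples]

quiet = [0, 1000, -2000, 1500]
loud = peak_normalize(quiet)
print(loud)
```

Peak normalization only changes the gain; it does not alter perceived loudness differences caused by the voice or the vocoder.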

GeorgeS2019 commented 8 months ago

Second,

I have seen the Mel and MFCC code. I wonder whether it can be repurposed for German audio, and eventually used to extract German phonemes from German audio.

There is hardly anything like this anywhere on the internet. Even for Wav2Vec2, examples that work with the German language are rare.

Can you do something about this?
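For context on the Mel and MFCC question: these features are computed from the waveform alone, so the same code applies to audio in any language. A minimal sketch of the mel-scale mapping, assuming the common HTK-style formula (not necessarily the one NeMoOnnxSharp uses); the function names and the 80-filter, 22050 Hz setup are hypothetical:

```python
import math

def hz_to_mel(f_hz):
    """HTK-style mel scale: roughly linear below 1 kHz, logarithmic above."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_center_frequencies(n_mels, f_min, f_max):
    """Centers (in Hz) of n_mels triangular filters evenly spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]

# 80 filters up to the Nyquist frequency of 22050 Hz audio.
centers = mel_center_frequencies(n_mels=80, f_min=0.0, f_max=11025.0)
print(round(centers[0], 1), round(centers[-1], 1))
```

Mapping such features to phonemes would still require a model trained on German speech with phoneme labels; the feature extraction itself does not provide that.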

GeorgeS2019 commented 8 months ago

It supports both German TTS/ASR. See this

I have tried TTS/ASR for German. My interest is in extracting German phonemes from German audio.

kaiidams commented 8 months ago

In the case of German, pronunciation is not ambiguous. Why do you need a phonemizer? In the case of English, NeMo FastPitch was trained with a phonemizer that converts all but the ambiguous words to phonemes, and FastPitch can handle the ambiguous words in many cases.
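To illustrate what "all but the ambiguous words" can mean in practice, here is a toy sketch: words with exactly one dictionary pronunciation become phonemes, while heteronyms and unknown words stay as graphemes for the model to resolve. This is not the NeMo or NeMoOnnxSharp implementation; `PRON_DICT`, its ARPAbet-like entries, and `phonemize` are hypothetical.

```python
# Toy pronunciation dictionary (ARPAbet-like); entries are illustrative only.
PRON_DICT = {
    "hello": [["HH", "AH0", "L", "OW1"]],
    "world": [["W", "ER1", "L", "D"]],
    "read": [["R", "IY1", "D"], ["R", "EH1", "D"]],  # heteronym: two pronunciations
}

def phonemize(text):
    """Return one token list per word: phonemes if unambiguous, else letters."""
    out = []
    for word in text.lower().split():
        prons = PRON_DICT.get(word, [])
        if len(prons) == 1:
            out.append(prons[0])    # unambiguous: use the phonemes
        else:
            out.append(list(word))  # ambiguous or unknown: keep the graphemes
    return out

result = phonemize("hello read world")
print(result)
```

Here "read" is passed through letter by letter, which is the case the TTS model itself would have to disambiguate from context.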

GeorgeS2019 commented 8 months ago

https://github.com/kaiidams/NeMoOnnxSharp/blob/main/NeMoOnnxSharp/TTSTokenizers/EnglishG2p.cs

Is there GermanG2P.cs in NeMoOnnxSharp?

their pronunciation is not ambiguous.

Please explain. I am not sure I understand how this affects how to proceed.

GeorgeS2019 commented 8 months ago

FastPitch is a text-to-speech (TTS) model developed by NVIDIA. It's a fully-parallel transformer architecture with prosody control over pitch and individual phoneme duration¹. Here are some key features:

Fully-Parallel Architecture: Unlike traditional TTS models that generate speech sequentially, FastPitch generates speech in parallel, which makes it much faster¹.
Prosody Control: FastPitch allows for control over the pitch and duration of individual phonemes, which can make the generated speech more expressive and engaging¹.
Transformer-Based: FastPitch is based on the Transformer architecture, which is known for its efficiency and scalability¹.
Integration with NeMo: FastPitch can be trained or fine-tuned using NVIDIA's NeMo framework, a generative AI framework built for working on large language models (LLMs), multimodal models (MM), automatic speech recognition (ASR), and text-to-speech synthesis (TTS)².

FastPitch generates mel spectrograms from text, which can then be converted to audio using a vocoder¹. The tts_en_fastpitch checkpoint is trained on the LJSpeech dataset sampled at 22050 Hz and has been tested on generating female English voices with an American accent¹. Note that this model works well with vocoders that were trained on 22050 Hz data¹.

Source: Conversation with Bing, 3/30/2024
(1) TTS En FastPitch | NVIDIA NGC. https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/tts_en_fastpitch
(2) GitHub - NVIDIA/NeMo: NeMo: a framework for generative AI. https://github.com/NVIDIA/NeMo
(3) Google Colab. https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/tts/FastPitch_MixerTTS_Training.ipynb
(4) arXiv:2006.06873. https://arxiv.org/abs/2006.06873

kaiidams commented 8 months ago

Is there GermanG2P.cs in NeMoOnnxSharp?

NeMo's FastPitch uses a phonemizer for English but not for German. NeMoOnnxSharp does not contain a German phonemizer.

kaiidams / NeMoOnnxSharp

Possible to improve English and German pronunciation? #26
