kaiidams / voice100

Voice100 includes neural TTS/ASR models. Inference with Voice100 is low-cost because its models are tiny and use only CNNs, without autoregression.
MIT License
26 stars, 3 forks

Speech to Speech #71

Open · GeorgeS2019 opened this issue 2 years ago

GeorgeS2019 commented 2 years ago

For some forward-looking inspiration: speech_to_speech

kaiidams commented 2 years ago

Thanks. They use k-means-clustered audio units and a seq2seq model to translate Spanish to English. K-means-clustered audio units could replace the CMU phonemes used in Voice100. For speech-to-speech translation, however, I'm not sure a model small enough for mobile devices would be good enough.
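To make "k-means-clustered audio units" concrete, here is a minimal, dependency-free sketch. It assumes frame-level feature vectors have already been extracted (in the linked work they presumably come from a pretrained speech encoder; here they are toy 2-D vectors). It runs plain Lloyd's k-means with a deterministic farthest-point initialisation, maps each frame to its nearest centroid ID, and collapses consecutive repeats. The function names (`kmeans`, `to_units`, `dedup`) are hypothetical and not part of Voice100 or speech_to_speech, and collapsing repeats is a common choice for discrete units rather than something confirmed by the thread.

```python
from typing import List, Sequence

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(f: Vector, centroids: List[List[float]]) -> int:
    """Index of the centroid closest to frame f (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda c: _sq_dist(f, centroids[c]))


def init_centroids(frames: List[Vector], k: int) -> List[List[float]]:
    """Deterministic farthest-point init: start at frames[0], then repeatedly
    add the frame farthest from all centroids chosen so far."""
    centroids = [list(frames[0])]
    while len(centroids) < k:
        far = max(frames, key=lambda f: min(_sq_dist(f, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(frames: List[Vector], k: int, iters: int = 20) -> List[List[float]]:
    """Plain Lloyd's k-means over frame-level features; returns k centroids."""
    centroids = init_centroids(frames, k)
    dim = len(frames[0])
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for f in frames:
            clusters[_nearest(f, centroids)].append(f)
        for c, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster goes empty
                centroids[c] = [sum(m[d] for m in members) / len(members)
                                for d in range(dim)]
    return centroids


def to_units(frames: List[Vector], centroids: List[List[float]]) -> List[int]:
    """Replace each frame with the ID of its nearest centroid (a discrete unit)."""
    return [_nearest(f, centroids) for f in frames]


def dedup(units: List[int]) -> List[int]:
    """Collapse runs of identical consecutive units, e.g. [0, 0, 1] -> [0, 1]."""
    out: List[int] = []
    for u in units:
        if not out or out[-1] != u:
            out.append(u)
    return out


if __name__ == "__main__":
    # Toy "frames": two well-separated groups standing in for real encoder features.
    frames = [[0.0, 0.0], [0.0, 0.2], [0.1, 0.0], [10.0, 10.0], [10.0, 10.1], [0.0, 0.1]]
    centroids = kmeans(frames, k=2)
    units = to_units(frames, centroids)
    print(units)          # [0, 0, 0, 1, 1, 0]
    print(dedup(units))   # [0, 1, 0]
```

A unit sequence like `[0, 1, 0]` would then stand in where Voice100 currently uses CMU phoneme IDs; whether a mobile-sized model can do translation well on top of such units is the open question raised above.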

GeorgeS2019 commented 3 months ago

[Image attached]

Is it challenging to do this for German using NeMoOnnxSharp?