Open GeorgeS2019 opened 2 years ago
Thanks. They use k-means-clustered audio and a seq2seq model to do Spanish-English translation. k-means-clustered audio could be used to replace CMU phonemes in Voice100. For speech-to-speech translation, I'm not sure a model small enough for mobile devices would be good enough.
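To make the "k-means-clustered audio" idea concrete, here is a minimal sketch of how frame-level audio features could be turned into discrete unit IDs that stand in for phoneme labels. Everything in it is an assumption for illustration: the thread does not say which features are clustered, so the frames below are toy 2-D vectors, and the helper names (`kmeans`, `quantize`, `collapse_repeats`) are hypothetical and not part of Voice100 or NeMoOnnxSharp. Collapsing consecutive repeated units is also an assumption, not something confirmed here.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(frame, centroids):
    """Index of the centroid closest to the given frame."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(frame, centroids[i]))


def kmeans(frames, k, iters=20, seed=0):
    """Plain Lloyd's k-means over a list of feature vectors (hypothetical helper)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct training frames.
    centroids = [list(f) for f in rng.sample(frames, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for f in frames:
            buckets[nearest(f, centroids)].append(f)
        for i, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids


def quantize(frames, centroids):
    """Map each frame to its cluster index, giving one discrete unit per frame."""
    return [nearest(f, centroids) for f in frames]


def collapse_repeats(units):
    """Merge runs of identical units (assumed post-processing step)."""
    out = []
    for u in units:
        if not out or out[-1] != u:
            out.append(u)
    return out


# Toy "audio features": three frames of one sound, then three of another.
frames = [[0.0, 0.0], [0.2, 0.0], [0.1, 0.1],
          [10.0, 10.0], [10.2, 10.0], [10.1, 10.1]]
centroids = kmeans(frames, k=2)
units = quantize(frames, centroids)
unit_sequence = collapse_repeats(units)
```

The resulting `unit_sequence` is the kind of symbol sequence a seq2seq model could consume or predict in place of CMU phonemes. A real setup would need many more clusters and real acoustic features.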
Is it challenging to do this for German using NeMoOnnxSharp?
For some forward-looking inspiration: speech_to_speech