Speech to diphone?

I want to re-speak my live recorded speech. That means: mic -> text -> sound. In other words: Speech to Text and then Text to Speech.

I stumbled upon your project, which espeak can use transparently as a voice to convert text to sound.

That's nice, but it's a bit limited: I have to choose a voice according to the language, and quite a lot of the speech information is lost when it's transformed into text.

So I would like to record sound via a mic, convert it into diphones, and then convert the diphones back to speech.
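For reference, MBROLA does not read diphones directly: its input is a `.pho` file listing phonemes, each with a duration in milliseconds and optional (position %, pitch Hz) pairs, and it concatenates the matching diphones internally. So a "diphone recognizer" for this pipeline would in practice need to output labelled phone segments with timing and pitch. The sketch below assumes such a recognizer exists and produces `(label, duration_ms, pitch_hz)` tuples; the `Segment` type, the two helper functions, and the segment values for "hello" are all hypothetical and only illustrate the data shape.

```python
from typing import List, Optional, Tuple

# One phone segment as a *hypothetical* phone recognizer might emit it:
# (SAMPA label, duration in ms, pitch in Hz or None for unvoiced/silence).
Segment = Tuple[str, int, Optional[float]]


def to_diphones(labels: List[str]) -> List[str]:
    """Pair each phone with its successor, e.g. ["_", "a", "_"] -> ["_-a", "a-_"]."""
    return [f"{left}-{right}" for left, right in zip(labels, labels[1:])]


def to_pho(segments: List[Segment]) -> str:
    """Render segments as .pho lines: label, duration, optional (position %, pitch Hz)."""
    lines = []
    for label, duration_ms, pitch_hz in segments:
        if pitch_hz is None:
            lines.append(f"{label} {duration_ms}")
        else:
            # A single pitch target placed at 50% of the phone's duration.
            lines.append(f"{label} {duration_ms} 50 {pitch_hz:g}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up segments for "hello"; real values would come from the recognizer.
    segments: List[Segment] = [
        ("_", 50, None),
        ("h", 70, None),
        ("@", 80, 120.0),
        ("l", 60, 118.0),
        ("@U", 150, 110.0),
        ("_", 50, None),
    ]
    print(to_diphones([label for label, _, _ in segments]))
    print(to_pho(segments), end="")
```

The resulting file could then be passed to MBROLA together with a voice database, e.g. `mbrola path/to/voice hello.pho hello.wav`. Keeping pitch and duration per phone is what would preserve some of the prosody that a plain speech-to-text step throws away.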

My question: is there any tool that converts a wave file into a series of diphones (without converting the speech to text under the hood, that is)? In other words, instead of a speech recognizer, is there a diphone recognizer? Maybe such a tool goes by another name that I'm not aware of?

Thank you for your feedback!

numediart / MBROLA

Speech to diphone? #38