So I want to respeak my live recorded speech.
That means: mic -> text -> sound. Or in other words: Speech to Text and then Text to Speech.
I stumbled onto your project, which espeak can use (in a transparent manner) as a voice in order to convert text to sound.
That's nice, but it's a bit limited. I have to choose a voice according to the language, and I'm losing quite a lot of the speech information when it's transformed into text.
So I would like to record sound via a mic, convert it into diphones, and then just convert the diphones back to speech.
My question: Is there any tool that can convert a wave file to a series of diphones (without converting the speech to text under the hood, that is)? Or in other words... instead of a speech recognizer, is there a diphone recognizer? Maybe this kind of tool has another name which I'm not aware of?
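To make clear what I mean by "a series of diphones", here is a minimal Python sketch. It assumes a recognizer has already produced a list of phoneme symbols; the function name `phonemes_to_diphones`, the `_` silence symbol and the `a-b` naming are just my own illustration, not something taken from your project or from espeak.

```python
def phonemes_to_diphones(phonemes, silence="_"):
    """Pair each phoneme with its successor to get the diphone sequence.

    The sequence is padded with a silence symbol at both ends, so the
    first and last diphones cover the transitions from and to silence.
    The "_" symbol and the "a-b" naming are assumptions for illustration.
    """
    padded = [silence] + list(phonemes) + [silence]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


if __name__ == "__main__":
    # Hypothetical phoneme symbols for the word "hello"
    print(phonemes_to_diphones(["h", "@", "l", "oU"]))
```

So what I'm looking for is the step before this one: something that goes from the recorded wave file to the phoneme (or diphone) list directly, ideally keeping durations and pitch as well.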
Thank you for your feedback!