r9y9 / wavenet_vocoder

WaveNet vocoder
https://r9y9.github.io/wavenet_vocoder/
Other

Speech Marks #210

Closed: genabotpub closed this issue 3 years ago

genabotpub commented 3 years ago

Hello,

Thank you very much for this project.

I am looking for a way to generate speech marks identifying the boundary and playback duration of each rendered word in milliseconds.

At the moment we use a speech segmentation model to obtain these marks, but we are looking for a way to get them without that postprocessing step.
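
To make the request concrete, here is a minimal sketch of what such speech marks could look like. It assumes that a per-word duration in acoustic frames is already available from somewhere in the synthesis pipeline; the names `SpeechMark` and `frames_to_speech_marks`, and the default hop size and sample rate, are hypothetical and are not part of wavenet_vocoder.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class SpeechMark:
    """One word with its start offset and playback duration in milliseconds."""
    word: str
    start_ms: float
    duration_ms: float


def frames_to_speech_marks(
    words_with_frames: Sequence[Tuple[str, int]],
    hop_size: int = 256,       # assumed samples per acoustic frame
    sample_rate: int = 22050,  # assumed audio sample rate in Hz
) -> List[SpeechMark]:
    """Convert (word, frame_count) pairs into word-level speech marks."""
    if hop_size <= 0 or sample_rate <= 0:
        raise ValueError("hop_size and sample_rate must be positive")

    ms_per_frame = 1000.0 * hop_size / sample_rate
    marks: List[SpeechMark] = []
    elapsed_frames = 0  # frames consumed by all previous words
    for word, n_frames in words_with_frames:
        marks.append(
            SpeechMark(
                word=word,
                start_ms=elapsed_frames * ms_per_frame,
                duration_ms=n_frames * ms_per_frame,
            )
        )
        elapsed_frames += n_frames
    return marks
```

Calling `frames_to_speech_marks([("hello", 30), ("world", 45)], hop_size=240, sample_rate=24000)` would yield one mark per word, each 10 ms per frame long. Where the per-word frame counts come from is exactly the open question here.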

Would appreciate any advice on the best way to achieve this.

Thank you, gen

r9y9 commented 3 years ago

The question is unrelated to this repository. I cannot help with your question.

https://github.com/kan-bayashi/ParallelWaveGAN/issues/251 Also, I don't think it's a good idea to ask the same question in multiple places.

genabotpub commented 3 years ago

Thank you for your response. I asked in both places because the two repositories provide different implementations, and I hoped at least one of them might offer some relevant capability. Since you say this question is unrelated to this repository, could you please indicate which part of the end-to-end TTS chain should be responsible for detecting and generating speech marks?

Best regards, gen