neonbjb / tortoise-tts

A multi-voice TTS system trained with an emphasis on quality
Apache License 2.0
13.15k stars 1.82k forks

Stop Audio cut off at the end/is there a way to add time buffer? #416

Open jtfl28 opened 1 year ago

jtfl28 commented 1 year ago

Almost every clip I produce abruptly ends the sentence just a second early. Most of the time it doesn't complete the last word so just adding blank space in-between the sentences won't work.

Is there any way to avoid this? Thanks in advance for the help!

neonbjb commented 1 year ago

This is at least partially caused by the conditioning voice. For some reason some voices exhibit this more than others. I would try using different conditioning clips or fiddling with the one you have.

n8bot commented 1 year ago

I have found that ensuring there is a period at the very end of the prompt can help with this. Interesting to know that it is voice-dependent. Good tip.
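
A minimal sketch of that tip, assuming the prompt is a plain Python string that you pass to tortoise yourself. The helper name `ensure_terminal_punctuation` is hypothetical and not part of tortoise-tts.

```python
def ensure_terminal_punctuation(text, marks=".!?"):
    """Return `text` with trailing whitespace removed and a period added
    if it does not already end with one of `marks`."""
    stripped = text.rstrip()
    # Leave empty prompts alone; only append when the last character
    # is not already sentence-ending punctuation.
    if stripped and stripped[-1] not in marks:
        return stripped + "."
    return stripped


prompt = ensure_terminal_punctuation("The quick brown fox jumps over the lazy dog")
```

Note that a prompt ending in a closing quote or bracket would also get a period appended, which may or may not be what you want.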

n8bot commented 1 year ago

I have an open pull request to add more high quality voices to tortoise, with many audio clips in each voice. They can also be re-arranged to evoke specific emotions.

They seem good at not cutting off the end. I suspect some voices do that because their clips lack a soft transition at the end. What is barely audible to us is, to the computer, a sudden and dramatic falloff in the audio signal. Using samples from professional voice-over clips seems to alleviate the issue.

So, when making new voices, it might be important to add a fade in and out to each and every audio clip, even if the fade lasts only a few ms.

https://github.com/neonbjb/tortoise-tts/pull/425
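
The fade suggestion above can be sketched in plain Python. This is only an illustration, assuming the clip is already loaded as a list of float samples; the function name `apply_fades` and the linear ramp are assumptions, not something taken from the pull request. In practice an audio library would do the same job (pydub's `AudioSegment.fade_in` and `fade_out`, for example).

```python
def apply_fades(samples, sample_rate, fade_ms=10):
    """Return a copy of `samples` with a linear fade-in and fade-out.

    `samples` is a sequence of floats, `sample_rate` is in Hz, and
    `fade_ms` is the length of each fade in milliseconds.
    """
    # Number of samples covered by each fade, capped at half the clip
    # so the two ramps never overlap.
    n = min(int(sample_rate * fade_ms / 1000), len(samples) // 2)
    out = list(samples)
    for i in range(n):
        gain = i / n  # ramps from 0.0 up towards 1.0
        out[i] *= gain        # fade in from the start
        out[-1 - i] *= gain   # fade out towards the end
    return out


# Example: a constant signal at 1 kHz with 10 ms fades on each side.
faded = apply_fades([1.0] * 1000, sample_rate=1000, fade_ms=10)
```

Whether a fade this short actually changes how a conditioning voice behaves is the hypothesis in this comment, not an established result.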

ziyaad30 commented 1 year ago

I tried that a while ago and it did not work, so I had to use the following:

from pydub import AudioSegment

# 500 ms of trailing silence (duration is in milliseconds)
silence = AudioSegment.silent(duration=500)
sound = AudioSegment.from_wav(file)  # file: path to the generated clip
final_sound = sound + silence
final_sound.export(f'outputs/silenced_{fname}.wav', format="wav")

That inserts the full audio, so it seems to me that the program itself is cutting the audio off.

spottenn commented 1 year ago

I'm considering fixing this bug. This may be a bug with saving the sound to a wave file. Where did you put your code in order to get it to work and not cut off the end? How do I replicate your results?

worldwidewebcap commented 6 months ago

Just write something like "End" after the last word of each sentence in the prompt. That way the placeholder word (like "End") gets cut short instead of your intended last word. This makes it easy to edit and cut out the end word later. This is my workaround, anyway. Works for me.
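
A rough sketch of this workaround, assuming the prompt is a plain string built before it is handed to tortoise. The helper name `add_placeholder` is hypothetical, and trimming the placeholder from the generated audio is still a manual editing step.

```python
def add_placeholder(text, placeholder="End"):
    """Append a throwaway word so any cutoff lands on it rather than
    on the last real word of the prompt."""
    return f"{text.rstrip()} {placeholder}."


padded_prompt = add_placeholder("Thanks for listening to the show.")
```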