Open jtfl28 opened 1 year ago
This is at least partially caused by the conditioning voice. For some reason some voices exhibit this more than others. I would try using different conditioning clips or fiddling with the one you have.
I have found that ensuring there is a period at the very end of the prompt can help with this. Interesting to know that it is voice-dependent. Good tip.
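For anyone scripting their prompts, the trailing-period tip can be applied automatically. This is only a minimal sketch: `ensure_final_period` is a hypothetical helper, not part of tortoise, and treating `!` and `?` as acceptable endings is my assumption.

```python
def ensure_final_period(prompt):
    """Return the prompt with a period appended unless it already ends a sentence."""
    text = prompt.rstrip()  # drop trailing spaces and newlines first
    # Assumption: "!" and "?" already close the sentence, so they are left alone.
    if text and text[-1] not in ".!?":
        text += "."
    return text
```

Pass the returned string to the generator in place of the raw prompt.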
I have an open pull request to add more high-quality voices to tortoise, with many audio clips in each voice. They can also be rearranged to evoke specific emotions.
They seem good at not cutting off the end. I suspect some voices do that because their clips do not have a soft transition at the end. A sudden, dramatic falloff in the audio signal may not be exactly audible to us, but it is to the computer. Using samples from professional voice-over clips seems to alleviate the issue.
So, when making new voices, it might be important to add a fade in and out to each and every audio clip, even if the fade lasts only a few ms.
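To make the fade idea concrete, here is a rough sketch of a short linear fade in and fade out on raw mono samples. `apply_fades` and the 10 ms default are illustrative assumptions, not anything tortoise provides, and whether fading the conditioning clips really prevents the cut-off is still just a hypothesis.

```python
def apply_fades(samples, sample_rate, fade_ms=10):
    """Return a copy of `samples` with a linear fade-in and fade-out applied.

    `samples` is a sequence of numbers for a single (mono) channel.
    """
    # Number of samples covered by each fade, capped at half the clip length.
    n_fade = min(int(sample_rate * fade_ms / 1000), len(samples) // 2)
    faded = list(samples)
    for i in range(n_fade):
        gain = i / n_fade        # ramps from 0.0 up towards 1.0
        faded[i] *= gain         # fade in at the start of the clip
        faded[-1 - i] *= gain    # fade out at the end of the clip
    return faded
```

If the clips are already loaded with pydub, its `AudioSegment.fade_in()` and `AudioSegment.fade_out()` methods take a duration in milliseconds and should do the same job.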
I tried that a while ago and it did not work, so I had to use the code below:
from pydub import AudioSegment
# `file` and `fname` are assumed to be set earlier (input path and output base name)
half_sec_segment = AudioSegment.silent(duration=500)  # duration in milliseconds
sound = AudioSegment.from_wav(file)
final_sound = sound + half_sec_segment  # append the silence after the clip
final_sound.export(f'outputs/silenced_{fname}.wav', format="wav")
That inserts the full audio, so it seems to me that the program itself is cutting the audio off.
I'm considering fixing this bug. It may be a bug in saving the sound to a WAV file. Where did you put your code in order to get it to work and not cut off the end? How do I replicate your results?
Just write something like "End" after the last word of each sentence in the prompt. This prevents your intended last word from being cut short, because the cut-off hits your placeholder word (like "End") instead. This makes it easy to edit and cut out the end word later. This is my workaround, anyway. Works for me.
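If you generate many sentences, the placeholder trick can be scripted. This is a sketch only: `add_placeholder` is a made-up helper, and appending the placeholder as its own short sentence is my assumption about how to apply the tip. The placeholder still has to be cut from the generated audio by hand afterwards.

```python
def add_placeholder(sentence, placeholder="End"):
    """Append a throwaway word so any cut-off lands on it, not on the real last word."""
    text = sentence.rstrip()
    if not text:
        return text  # nothing to protect in an empty prompt
    # Close the real sentence first so the placeholder stands on its own.
    if text[-1] not in ".!?":
        text += "."
    return f"{text} {placeholder}."
```

For example, `add_placeholder("This is my line")` gives `"This is my line. End."`.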
Almost every clip I produce abruptly ends the sentence just a second early. Most of the time it doesn't complete the last word, so just adding blank space between the sentences won't work.
Is there any way to avoid this? Thanks in advance for the help!