Open deliciouslytyped opened 2 years ago
I'm confused because sometimes this works and other times it doesn't. Using the string "test", I first got a synthesis with an extended end and malformed audio; then it worked and I couldn't reproduce the problem anymore. I don't think I changed anything, but I'm not sure.
Now, I accidentally reproduced the bad sample: badoutput.zip (this is a zipped wav file, due to GitHub's restrictions)
Instead of just "test", you can hear something like "test-t-t-t-t-t-t-t....".
All I changed was max_decoder_steps, which I set to 1000.
I get the same thing. If sentences are past a certain length, they are cut off in the produced wav. Here's a simple example:
❯ tts --text "This sentence, being as long as it is, most unfortunately, will not be fully stated." --out_path test.wav
> tts_models/en/ljspeech/tacotron2-DDC is already downloaded.
> vocoder_models/en/ljspeech/hifigan_v2 is already downloaded.
> Using model: Tacotron2
> Model's reduction rate `r` is set to: 1
> Vocoder Model: hifigan
> Generator Model: hifigan_generator
> Discriminator Model: hifigan_discriminator
Removing weight norm...
> Text: This sentence, being as long as it is, most unfortunately, will not be fully stated.
> Text splitted to sentences.
['This sentence, being as long as it is, most unfortunately, will not be fully stated.']
> Decoder stopped with `max_decoder_steps` 500
> Processing time: 3.1818737983703613
> Real-time factor: 0.49914852912682467
> Saving output to test.wav
In this example, the speaker is cut off before saying "stated".
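For a rough sense of how much audio the 500-step cap allows, the two timing lines in the log can be combined. This is only a sketch: it assumes the real-time factor is reported as processing time divided by audio duration, and the function name is made up for illustration.

```python
def audio_seconds(processing_time: float, real_time_factor: float) -> float:
    """Estimate audio duration, assuming RTF = processing time / audio duration."""
    if real_time_factor <= 0:
        raise ValueError("real_time_factor must be positive")
    return processing_time / real_time_factor


# Values copied from the log above.
duration = audio_seconds(3.1818737983703613, 0.49914852912682467)
steps = 500  # from "Decoder stopped with `max_decoder_steps` 500"
print(f"about {duration:.1f} s of audio, "
      f"{duration / steps * 1000:.1f} ms per decoder step")
```

Under that assumption, the run above produced a bit more than six seconds of audio before the decoder hit the limit, so a sentence that takes longer than that to speak will be cut off at a limit of 500.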
How can we synthesize arbitrarily long sentences?
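One possible workaround that does not touch the model is to break long input at punctuation and synthesize each piece separately. The following is only a sketch: `split_text` and the character limit are hypothetical, nothing here is part of the tts CLI, and a suitable limit would depend on the model and its max_decoder_steps.

```python
import re


def split_text(text: str, max_chars: int = 60) -> list[str]:
    """Split text at punctuation into chunks of at most max_chars where possible.

    A single clause longer than max_chars is kept whole rather than cut mid-word.
    """
    # Break after sentence or clause punctuation, consuming the whitespace that follows.
    clauses = [c.strip() for c in re.split(r"(?<=[.!?,;:])\s+", text) if c.strip()]
    chunks: list[str] = []
    current = ""
    for clause in clauses:
        candidate = f"{current} {clause}".strip()
        if current and len(candidate) > max_chars:
            # Adding this clause would exceed the limit, so start a new chunk.
            chunks.append(current)
            current = clause
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


sentence = ("This sentence, being as long as it is, most unfortunately, "
            "will not be fully stated.")
for chunk in split_text(sentence, max_chars=40):
    print(chunk)
```

Each chunk could then be passed to tts --text on its own and the resulting wav files joined afterwards. Splitting at commas may change the prosody at the seams, so this trades naturalness for completeness.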
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discourse page for further help. https://discourse.mozilla.org/c/tts
It may be stale, but this issue is not fixed. It's easy to reproduce and a blocker for any serious work with TTS.
Suffering this issue too. Unsure what to do to resolve it. Will try other models to see what happens, I suppose.
I have the same problem here, long sentences get truncated.
It seems to be just a configuration issue, as they say here: https://github.com/thorstenMueller/Thorsten-Voice/issues/22
Setting "max_decoder_steps": 10000 in the model config.json solved the problem.
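A minimal sketch of that change, assuming the model's config.json is plain JSON with a top-level max_decoder_steps key (a config that contains comments or nests the key elsewhere would need different handling). The helper name is made up, and the demo writes to a temporary file rather than a real model config.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def set_max_decoder_steps(config_path: str, steps: int = 10000) -> Optional[int]:
    """Set max_decoder_steps in a JSON config and return the previous value."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    previous = config.get("max_decoder_steps")
    config["max_decoder_steps"] = steps
    path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return previous


# Demo on a throwaway file; point the function at the real config.json instead.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "config.json"
    demo.write_text(json.dumps({"max_decoder_steps": 500}), encoding="utf-8")
    old = set_max_decoder_steps(str(demo), 10000)
    new = json.loads(demo.read_text(encoding="utf-8"))["max_decoder_steps"]
    print(f"max_decoder_steps: {old} -> {new}")
```

A larger limit only raises the ceiling. As the first report in this thread shows, a decoder that misses its stop point can then keep producing repeated sounds for longer, so it is worth listening to the end of each output after raising the value.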
Running tts --text on some meaningful sentences results in the following output:
The audio file is truncated with respect to the text. If I hack the config file at TTS/tts/configs/tacotron_config.py to have a larger max_decoder_steps value, the output does seem to successfully get longer, but I'm not sure how safe this is.
Are there any better solutions? Should I use a different model?