Sometimes redundant, duplicated text is generated. I use the default model and config (no fine-tuning). The problem does not occur on every run, which is why the code example below uses a loop. In my example, the words "is inspired by the dishes" are generated several times; see the audio: https://drive.google.com/file/d/1geLlH2im1bCLMpQcQV7QgRWU0c57eG4y/view
from TTS.api import TTS

text = "on the menu that Sam our chef here has put together, Okay this is one of our best sellers isn't it Sam, Yes it is, So this is our scampi, So I grew up in a pub and a lot of the things on the menu is inspired by the dishes from"
print(len(text))
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
# Synthesize the same text several times, because the duplication only shows up in some runs.
for i in range(10):
    tts.tts_to_file(
        text=text,
        file_path=f"test_{i}.wav",
        speaker_wav="./tests/data/ljspeech/wavs/LJ001-0001.wav",
        language="en",
        split_sentences=False,
    )
To Reproduce
Run the code from the description. Some of the generated files may contain duplicated text.
Expected behavior
Redundant text is not generated.
Logs
226
/Users/olehsamoilenko/coqui-ai-TTS/TTS/tts/layers/xtts/xtts_manager.py:6: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
self.speakers = torch.load(speaker_file_path)
/opt/anaconda3/envs/coqui/lib/python3.9/site-packages/trainer/io.py:83: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
return torch.load(f, map_location=map_location, **kwargs)
This is not a bug; it is a consequence of how the XTTS model works and cannot be avoided completely. You could try shortening your input by splitting it into sentences.
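As a rough illustration of the splitting suggestion, here is a minimal sketch that breaks a long input into shorter chunks at punctuation before synthesis. The helper name `split_into_chunks` and the `max_chars` value are assumptions made for this example and are not part of the TTS library. The example text has no full stops, so the sketch also splits at commas.

```python
import re


def split_into_chunks(text: str, max_chars: int = 120) -> list:
    """Hypothetical helper: group punctuation-delimited pieces into short chunks.

    A single piece that is longer than max_chars is kept whole, so max_chars
    is a target rather than a hard limit.
    """
    # Split on whitespace that follows '.', '!', '?' or ',' (punctuation is kept).
    pieces = [p.strip() for p in re.split(r"(?<=[.!?,])\s+", text) if p.strip()]
    chunks = []
    current = ""
    for piece in pieces:
        candidate = f"{current} {piece}".strip()
        if current and len(candidate) > max_chars:
            # Adding this piece would exceed the target, so start a new chunk.
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


sample = "Yes it is, So this is our scampi, So I grew up in a pub"
for chunk in split_into_chunks(sample, max_chars=25):
    print(chunk)
```

Each chunk could then be passed to `tts.tts_to_file` in its own call, writing one file per chunk. Alternatively, setting `split_sentences=True` in the original call lets the library do its own sentence splitting, though whether it splits an input like this one, which has commas but no full stops, is not something this sketch verifies. Shorter inputs may reduce the repetition but, as noted above, are not guaranteed to remove it.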
Describe the bug
Could this be related to the fact that the word "menu" occurs twice in my text? The text is fairly long, but it is under 250 characters, so it should be acceptable. It may also be related to the issue discussed here: https://github.com/coqui-ai/TTS/issues/3516 and the potential fix here: https://github.com/coqui-ai/TTS/issues/3516#issuecomment-2050867261. Is this a bug, or am I using the library incorrectly?
CC: @eginhard @bensonbs