Open OlegRuban-ai opened 16 hours ago
Can you fix the seed as below and compare both? For me both methods of loading result in the exact same output:
import os
import torch
from trainer.io import get_user_data_dir
from TTS.api import TTS
model_name = "tts_models/multilingual/multi-dataset/xtts_v2"
xtts1 = TTS(model_name).to("cuda")
xtts_dir = os.path.join(get_user_data_dir("tts"), "--".join(model_name.split("/")))
xtts2 = TTS(model_path=xtts_dir, config_path=os.path.join(xtts_dir, "config.json")).to("cuda")
torch.manual_seed(123)
out1 = xtts1.tts("This is a test", speaker="Ana Florence", language="en")
torch.manual_seed(123)
out2 = xtts2.tts("This is a test", speaker="Ana Florence", language="en")
assert out1 == out2
@eginhard
Thank you, but the problem is not fixed.
When we use text-to-speech with a named speaker, the results are identical. But when we use speaker_wav like this:
model_tts.tts_to_file(
    text=prompt,
    file_path=audio_path_result,
    speaker_wav=processed_file,
    language=language,
    split_sentences=split_sentences,
    # speaker="Ana Florence",
    # preset="high_quality",
)
then the results are different.
import os

from trainer.io import get_user_data_dir
from TTS.api import TTS

model_name = "tts_models/multilingual/multi-dataset/xtts_v2"
tts_1 = TTS(model_name, gpu=True)

xtts_dir = os.path.join(get_user_data_dir("tts"), "--".join(model_name.split("/")))
tts_2 = TTS(
    model_path=xtts_dir,
    config_path="/models_and_tokenizers/text2audio/config.json",
    progress_bar=False,
    gpu=True,
)
When using tts_1, the voice is similar to the original, but when using tts_2, it is not at all similar.
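To narrow down where the mismatch comes from, one could compare the speaker conditioning that both instances compute from the same reference clip. This is only a sketch: it assumes the underlying Xtts model is reachable as .synthesizer.tts_model and that speaker.wav is an existing reference file.

ref_clip = "speaker.wav"  # hypothetical reference clip, the same file passed as speaker_wav

# Pull the underlying Xtts models out of both high-level wrappers.
model_1 = tts_1.synthesizer.tts_model
model_2 = tts_2.synthesizer.tts_model

# Compute GPT conditioning latents and speaker embeddings from the same clip.
gpt_1, spk_1 = model_1.get_conditioning_latents(audio_path=[ref_clip])
gpt_2, spk_2 = model_2.get_conditioning_latents(audio_path=[ref_clip])

# If both loads really use the same weights and config, these should match closely.
print("gpt latent max diff:", (gpt_1 - gpt_2).abs().max().item())
print("speaker embedding max diff:", (spk_1 - spk_2).abs().max().item())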
But there is still a problem with the limit of 182 tokens for the Russian language. To work around it when loading from HF, I replaced the limit in tokenizer.py, but when loading from a local folder that file (tts/layers/xtts/tokenizer.py) no longer seems to be used for some reason. How can I bypass the restriction?
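A runtime override might be an alternative to editing the file; this is only a sketch and assumes the loaded XTTS tokenizer stores its per-language limits in a char_limits dict:

# Sketch of a runtime override instead of editing tokenizer.py.
# Assumption: the loaded XTTS tokenizer keeps per-language limits in `char_limits`.
xtts_model = tts_2.synthesizer.tts_model
if hasattr(xtts_model.tokenizer, "char_limits"):
    xtts_model.tokenizer.char_limits["ru"] = 400  # hypothetical higher limit
else:
    print("tokenizer has no char_limits attribute; the limit lives elsewhere")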
And is it possible to add emphasis, control the speaker's speaking speed, and add emotions? I couldn't find anything like that in the code.
And can you help me: how do I use split_sentences with this code? (A manual splitting sketch follows the snippet below.)
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

config = XttsConfig()
config.load_json("/path/to/xtts/config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="/path/to/xtts/", eval=True)
model.cuda()

outputs = model.synthesize(
    "It took me quite a long time to develop a voice and now that I have it I am not going to be silent.",
    config,
    speaker_wav="/data/TTS-public/_refclips/3.wav",
    gpt_cond_len=3,
    language="en",
)
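As far as I can tell, the low-level Xtts.synthesize call does not split sentences itself; split_sentences is an option of the high-level TTS API. A minimal workaround sketch, assuming pysbd (the splitter Coqui TTS itself depends on) is installed and that outputs["wav"] is a mono waveform at 24 kHz as in the docs example:

import numpy as np
import pysbd  # Coqui TTS itself uses pysbd for sentence splitting
import torch
import torchaudio

text = (
    "It took me quite a long time to develop a voice and now that I have it "
    "I am not going to be silent."
)

# Split the prompt into sentences and synthesize each one separately.
segmenter = pysbd.Segmenter(language="en", clean=True)
sentences = segmenter.segment(text)

wav_chunks = []
for sentence in sentences:
    out = model.synthesize(
        sentence,
        config,
        speaker_wav="/data/TTS-public/_refclips/3.wav",
        gpt_cond_len=3,
        language="en",
    )
    # Assumption: out["wav"] is a mono waveform at 24 kHz (tensor or numpy array).
    wav = out["wav"]
    wav = wav.cpu().numpy() if hasattr(wav, "cpu") else np.asarray(wav)
    wav_chunks.append(wav.squeeze())

full_wav = np.concatenate(wav_chunks)
torchaudio.save("xtts_split.wav", torch.tensor(full_wav).unsqueeze(0), 24000)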
Describe the bug
I used two options to load the model:
tts = TTS(
    model_path="/XTTS",
    config_path="/XTTS/config.json",
    progress_bar=True,
).to('cuda')

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2", progress_bar=True, gpu=True)
With the first option, the generated result is much worse than with the second (standard) loading. Why? How do I load all the configuration files correctly, and from where?
I took the model for model_path and the config from here: https://huggingface.co/coqui/XTTS-v2/tree/main
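One way to check whether the two options actually see the same files is to diff the local HF download against the auto-downloaded cache directory. This is only a sketch; the cache path construction mirrors the get_user_data_dir snippet above, and /XTTS stands for the local HF clone:

import hashlib
import os

from trainer.io import get_user_data_dir

local_dir = "/XTTS"  # local clone of https://huggingface.co/coqui/XTTS-v2
model_name = "tts_models/multilingual/multi-dataset/xtts_v2"
cache_dir = os.path.join(get_user_data_dir("tts"), "--".join(model_name.split("/")))

def md5(path, chunk_size=1 << 20):
    # Hash in chunks so the multi-GB checkpoint does not have to fit in memory.
    h = hashlib.md5()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

# Report files that differ or exist only on one side.
local_files = set(os.listdir(local_dir))
cache_files = set(os.listdir(cache_dir))
for name in sorted(local_files | cache_files):
    if name not in local_files or name not in cache_files:
        print("only on one side:", name)
        continue
    lp, cp = os.path.join(local_dir, name), os.path.join(cache_dir, name)
    if os.path.isfile(lp) and os.path.isfile(cp) and md5(lp) != md5(cp):
        print("differs:", name)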
To Reproduce
tts = TTS(
    model_path="/XTTS",
    config_path="/XTTS/config.json",
    progress_bar=True,
).to('cuda')

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2", progress_bar=True, gpu=True)
Expected behavior
No response
Logs
No response
Environment
Additional context
No response