yl4579 / StyleTTS2

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
MIT License

Possible bug: input parameters for model.predictor_encoder and model.style_encoder in train_finetune.py #243

Open starmoon-1134 opened 5 months ago

starmoon-1134 commented 5 months ago

In the train_finetune.py file, there appears to be an issue with the input parameters passed to model.predictor_encoder and model.style_encoder. The current code is:

s = model.style_encoder(gt.unsqueeze(1))
s_dur = model.predictor_encoder(gt.unsqueeze(1))

However, train_second.py uses a different implementation that accounts for the multispeaker scenario:

s_dur = model.predictor_encoder(st.unsqueeze(1) if multispeaker else gt.unsqueeze(1))
s = model.style_encoder(st.unsqueeze(1) if multispeaker else gt.unsqueeze(1))
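If the fix is simply to mirror the train_second.py branch in train_finetune.py, the selection logic could look like the sketch below. This is a minimal, dependency-free illustration, not the actual StyleTTS2 code: `DummyEncoder`, `DummyModel`, and `compute_styles` are hypothetical stand-ins for `model.style_encoder` / `model.predictor_encoder` and the surrounding training loop, and the tensor `unsqueeze(1)` calls are omitted for clarity.

```python
class DummyEncoder:
    """Hypothetical stand-in for model.style_encoder / model.predictor_encoder."""
    def __call__(self, x):
        return ("encoded", x)


class DummyModel:
    """Hypothetical stand-in for the StyleTTS2 model bundle."""
    style_encoder = DummyEncoder()
    predictor_encoder = DummyEncoder()


def compute_styles(model, gt, st, multispeaker):
    # Same branch train_second.py takes: use the reference mel (st) in the
    # multispeaker case, otherwise fall back to the ground-truth mel (gt).
    ref = st if multispeaker else gt
    s = model.style_encoder(ref)
    s_dur = model.predictor_encoder(ref)
    return s, s_dur


model = DummyModel()
# Multispeaker: both encoders see the reference mel.
print(compute_styles(model, "gt_mel", "st_mel", multispeaker=True))
# Single-speaker: both encoders see the ground-truth mel.
print(compute_styles(model, "gt_mel", "st_mel", multispeaker=False))
```

The key point is that both encoders should receive the same conditional input, so the style and duration-predictor styles stay consistent with whichever reference the multispeaker flag selects.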