In train_finetune.py, we have noticed a potential issue with the inputs passed to model.predictor_encoder and model.style_encoder. The current code is as follows:
s = model.style_encoder(gt.unsqueeze(1))
s_dur = model.predictor_encoder(gt.unsqueeze(1))
However, train_second.py uses a different implementation that accounts for the multispeaker scenario: when multispeaker is enabled, both encoders are conditioned on the reference segment st rather than the ground-truth mel gt:
s_dur = model.predictor_encoder(st.unsqueeze(1) if multispeaker else gt.unsqueeze(1))
s = model.style_encoder(st.unsqueeze(1) if multispeaker else gt.unsqueeze(1))
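If the two scripts are meant to behave the same, a minimal sketch of how train_finetune.py could adopt the train_second.py behaviour is shown below. It assumes the finetuning loop already provides gt (the ground-truth mel segment), st (a reference segment from another utterance of the same speaker), and a multispeaker flag, exactly as in train_second.py; select_style_reference is a hypothetical helper, not part of the repository.

from typing import Optional
import torch

def select_style_reference(gt: torch.Tensor,
                           st: Optional[torch.Tensor],
                           multispeaker: bool) -> torch.Tensor:
    # In the multispeaker case, condition on a segment from a different
    # utterance of the same speaker (st), so the style encoders cannot
    # simply copy the target mel; otherwise fall back to the ground
    # truth (gt), matching the single-speaker path in train_second.py.
    # The None check is a defensive assumption for batches without a
    # reference segment.
    return st if (multispeaker and st is not None) else gt

# Usage inside the training loop (names assumed from train_second.py):
# ref = select_style_reference(gt, st, multispeaker)
# s_dur = model.predictor_encoder(ref.unsqueeze(1))  # prosodic style
# s = model.style_encoder(ref.unsqueeze(1))          # acoustic style

Feeding the same ref to both encoders keeps the acoustic and prosodic style embeddings consistent with each other, as in train_second.py.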