sh-lee-prml / HierSpeechpp

The official implementation of HierSpeech++
MIT License
1.13k stars, 134 forks

fix: detach the tensors in PitchPredictor #26

Closed: Gabibing closed 5 months ago

Gabibing commented 5 months ago

The tensors in PitchPredictor are detached to ensure they do not influence the TTV result.

sh-lee-prml commented 5 months ago

The W2V representation contains little speaker information, so I intend for the prosody embedding to learn additional prosody (pitch information) through the pitch predictor.

If the gradient of the prosody embedding is detached in the pitch predictor, the prosody encoder cannot learn the speaker-specific pitch information well.

During training, we already use a ground-truth (GT) w2v representation, so adding w2v.detach() has no effect for now.
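
To make the two points above concrete, here is a minimal PyTorch sketch. It is not the HierSpeech++ code: `prosody_encoder`, `pitch_predictor`, the tensor shapes, and the L1 loss are hypothetical stand-ins chosen only for illustration. It checks which gradients reach a toy prosody encoder when the prosody embedding or the ground-truth w2v features are detached before the pitch predictor.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins (assumptions); the real modules are far larger.
prosody_encoder = nn.Linear(8, 4)       # produces a toy prosody embedding
pitch_predictor = nn.Linear(6 + 4, 1)   # consumes w2v features + prosody embedding

mel = torch.randn(2, 8)        # toy input to the prosody encoder
w2v_gt = torch.randn(2, 6)     # ground-truth w2v features: plain data, no grad
f0_target = torch.randn(2, 1)  # toy pitch target


def pitch_loss(detach_prosody: bool, detach_w2v: bool) -> torch.Tensor:
    prosody = prosody_encoder(mel)
    if detach_prosody:
        # Cuts the graph: the pitch loss can no longer reach the encoder.
        prosody = prosody.detach()
    # w2v_gt is not attached to any graph, so detaching it changes nothing.
    w2v = w2v_gt.detach() if detach_w2v else w2v_gt
    pred = pitch_predictor(torch.cat([w2v, prosody], dim=-1))
    return nn.functional.l1_loss(pred, f0_target)


def encoder_grad(detach_prosody: bool, detach_w2v: bool):
    loss = pitch_loss(detach_prosody, detach_w2v)
    # allow_unused=True returns None when the weight is not part of the graph.
    (grad,) = torch.autograd.grad(
        loss, prosody_encoder.weight, allow_unused=True
    )
    return grad


g_full = encoder_grad(detach_prosody=False, detach_w2v=False)
g_prosody_detached = encoder_grad(detach_prosody=True, detach_w2v=False)
g_w2v_detached = encoder_grad(detach_prosody=False, detach_w2v=True)

print("encoder gets pitch gradient:", g_full is not None)
print("after detaching prosody:", g_prosody_detached)
print("w2v detach changes nothing:", torch.equal(g_full, g_w2v_detached))
```

In this toy setup, detaching the prosody embedding leaves the encoder with no gradient from the pitch loss, while detaching a ground-truth input that carries no graph leaves the encoder gradient unchanged. Whether the same holds in the actual training code depends on how the w2v features are produced there.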

However, for future models, I plan to train the pitch predictor explicitly, as you suggested.

Thanks!