lucidrains / naturalspeech2-pytorch

Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in Pytorch
MIT License
1.25k stars, 100 forks

two pitch? #40

Open a897456 opened 5 months ago

a897456 commented 5 months ago

The first pitch is in sample(), as follows: https://github.com/lucidrains/naturalspeech2-pytorch/blob/659bec7f7543e7747e809e950cc2f84242fbeec7/naturalspeech2_pytorch/naturalspeech2_pytorch.py#L1478-L1479

The second pitch is in the forward() of NaturalSpeech2, as follows: https://github.com/lucidrains/naturalspeech2-pytorch/blob/659bec7f7543e7747e809e950cc2f84242fbeec7/naturalspeech2_pytorch/naturalspeech2_pytorch.py#L1543-L1556

  1. I think the first pitch comes from the prompt and the second from the training data, right?
  2. I think the prompt is a small part of the training data; for example, if a training sample is 10 s long, the prompt takes 2 s of it, right?
  3. Since the prompt and the training data have the same input format, why is the pitch calculated differently for each?

lexkoro commented 5 months ago

One is the ground-truth pitch and the other is the predicted pitch.
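
To make the distinction concrete, here is a minimal toy sketch in plain Python. It is not the repository's code: `ToyPitchModel`, `predict_pitch`, and `training_step` are hypothetical names, and the "predictor" is a single scale factor. It assumes the common setup where the pitch extracted from the target audio supervises a pitch predictor during training, while at inference only the predicted pitch is available.

```python
class ToyPitchModel:
    """Hypothetical stand-in, not the repository's NaturalSpeech2 class."""

    def __init__(self, scale=1.0):
        # Single "learnable" parameter of the toy pitch predictor.
        self.scale = scale

    def predict_pitch(self, text_features):
        # Toy predictor: pitch is a scaled copy of the text features.
        return [self.scale * f for f in text_features]

    def training_step(self, text_features, ground_truth_pitch):
        # Training: target audio exists, so its extracted pitch (ground truth)
        # supervises the predictor through a mean absolute (L1) error.
        predicted = self.predict_pitch(text_features)
        errors = [abs(p - g) for p, g in zip(predicted, ground_truth_pitch)]
        loss = sum(errors) / len(errors)
        # Assumption: the ground-truth pitch is what conditions the model here.
        return ground_truth_pitch, loss

    def sample(self, text_features):
        # Inference: no target audio exists, so the predicted pitch is used.
        return self.predict_pitch(text_features)


model = ToyPitchModel(scale=2.0)
conditioning, loss = model.training_step([1.0, 2.0], [2.0, 5.0])
sampled_pitch = model.sample([1.0, 2.0])
```

In this sketch the same input goes through two paths only because the ground truth is unavailable at sampling time, not because the input formats differ.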

a897456 commented 5 months ago

> One is the ground truth pitch and the other one is the predicted

Thank you for your reply, @lexkoro. By the way, have you completed the conditional training? And could you share how to generate the prompt and the text, as is done for the LJSpeech dataset?

lexkoro commented 5 months ago

I don't think the repository is usable yet.