two pitch? - Githubissues

a897456 commented 5 months ago

The first pitch, in sample(), is computed here: https://github.com/lucidrains/naturalspeech2-pytorch/blob/659bec7f7543e7747e809e950cc2f84242fbeec7/naturalspeech2_pytorch/naturalspeech2_pytorch.py#L1478-L1479

The second pitch, in the forward() of Naturalspeech2, is computed here: https://github.com/lucidrains/naturalspeech2-pytorch/blob/659bec7f7543e7747e809e950cc2f84242fbeec7/naturalspeech2_pytorch/naturalspeech2_pytorch.py#L1543-L1556

Personally, I think the first pitch comes from the prompt and the second pitch comes from the training data, right?
I also think the prompt is a small part of the training data, e.g. the training clip is 10 s long and the prompt takes 2 s of it, right?
Since the prompt and the training data have the same input format, why is the pitch calculated differently for each?

lexkoro commented 5 months ago

One is the ground-truth pitch and the other is the predicted pitch.
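
To make the distinction concrete, here is a rough toy sketch of the general pattern: during training the pitch target is extracted from the recorded audio, while at sampling time there is no target audio, so the pitch has to come from a predictor. This is not the repository's code. Every name in it (`extract_pitch`, `ToyPitchPredictor`, `train_step`, `sample`) is hypothetical, and the "predictor" is a single trainable constant chosen only to keep the example small.

```python
# Toy sketch only: every name below is hypothetical, not the repository's API.

def extract_pitch(audio_frames):
    """Stand-in for a real pitch tracker run on recorded audio.

    Assumption: each toy "frame" already holds its pitch in Hz.
    """
    return [float(f) for f in audio_frames]


class ToyPitchPredictor:
    """Stand-in for a learned predictor: one constant pitch per token."""

    def __init__(self):
        self.bias = 0.0  # the only trainable parameter

    def predict(self, tokens):
        return [self.bias for _ in tokens]


def train_step(predictor, tokens, audio_frames, lr=0.5):
    """Training: the target is the ground-truth pitch taken from the audio.

    Assumes one audio frame per token, purely to keep the toy simple.
    """
    target = extract_pitch(audio_frames)    # ground-truth pitch
    predicted = predictor.predict(tokens)   # predicted pitch
    n = len(target)
    # Mean squared error between predicted and ground-truth pitch.
    loss = sum((p - t) ** 2 for p, t in zip(predicted, target)) / n
    # Gradient of that loss with respect to the single bias parameter.
    grad = sum(2 * (p - t) for p, t in zip(predicted, target)) / n
    predictor.bias -= lr * grad
    return loss


def sample(predictor, tokens):
    """Sampling: no target audio exists, so the pitch is predicted."""
    return predictor.predict(tokens)
```

In this sketch, `train_step` is the only place that sees audio, and `sample` only ever uses the predictor's output, which is why the two code paths obtain the pitch in different ways.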

a897456 commented 5 months ago

> One is the ground truth pitch and the other one is the predicted

Thank you for your reply, @lexkoro. By the way, have you completed the conditional training? And could you share how to generate the prompt and the text, as is done for the LJSpeech dataset?

lexkoro commented 5 months ago

I don't think the repository is usable yet.

lucidrains / naturalspeech2-pytorch

two pitch? #40