lucidrains / naturalspeech2-pytorch

Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in Pytorch
MIT License

loss.backward()?? #38

Open a897456 opened 6 months ago

a897456 commented 6 months ago

In Usage there is: loss = diffusion(raw_audio) followed by loss.backward(). Thank you for your work, it's very nice! I'm sorry, but as a newbie I have to ask two basic questions:

  1. Where does the result of backward() go? I couldn't find any follow-up step that uses it, which leads to my second question.
  2. Is naturalspeech2 a model or a method? I don't know how to train it, or whether I need to train it at all.
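
For context on the first question, here is a minimal sketch of where backward() usually sits in a PyTorch training step. It is not taken from this repository: ToyDiffusion is a hypothetical stand-in that, like diffusion(raw_audio) in Usage, returns a scalar loss from its forward pass, and the optimizer, learning rate, and tensor shapes are assumptions chosen only for illustration.

```python
import torch
from torch import nn


class ToyDiffusion(nn.Module):
    """Hypothetical stand-in: forward() returns a scalar loss, like diffusion(raw_audio)."""

    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Linear(dim, dim)

    def forward(self, x):
        # Toy objective (an assumption): predict the noise that was added to x.
        noise = torch.randn_like(x)
        return nn.functional.mse_loss(self.net(x + noise), noise)


torch.manual_seed(0)
diffusion = ToyDiffusion()
optimizer = torch.optim.Adam(diffusion.parameters(), lr=1e-3)

raw_audio = torch.randn(4, 16)  # placeholder batch, not real audio
weights_before = diffusion.net.weight.detach().clone()
grads_populated = False

for step in range(3):
    loss = diffusion(raw_audio)  # forward pass builds the autograd graph
    loss.backward()              # writes gradients into each parameter's .grad
    grads_populated = all(p.grad is not None for p in diffusion.parameters())
    optimizer.step()             # reads .grad and updates the weights
    optimizer.zero_grad()        # clears gradients before the next step

weights_after = diffusion.net.weight.detach().clone()
```

In other words, backward() does not return anything to follow up on; it stores gradients on the parameters, and nothing is learned until an optimizer step consumes them. The Usage snippet shows only the loss and backward() call, so a loop like this (or a trainer that wraps one) is still needed to actually train.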
a897456 commented 6 months ago


Hi @lucidrains, I trained it using Part 3 of Usage. Training takes 100k steps at 1k steps per epoch, so the full run is 100 epochs and should produce 100 .flac files and 100 .pt files. So far I have listened to the 51st generated .flac file, and it sounds like white noise. What is going on?

a897456 commented 5 months ago

Hi @lucidrains, logically, by epoch 50 I should be getting audio that doesn't sound like white noise, right? So far, though, the two output files I have listened to both sound like white noise. Do you know how to fix this? Thanks.

a897456 commented 5 months ago

https://github.com/lucidrains/naturalspeech2-pytorch/blob/659bec7f7543e7747e809e950cc2f84242fbeec7/naturalspeech2_pytorch/naturalspeech2_pytorch.py#L1700-L1701

https://github.com/lucidrains/naturalspeech2-pytorch/blob/659bec7f7543e7747e809e950cc2f84242fbeec7/naturalspeech2_pytorch/naturalspeech2_pytorch.py#L1874-L1879

Hi @lucidrains, does this mean that only two batches are involved in the loss calculation at each step?