Open ethanyhzhang opened 11 months ago
Hello, may I ask you a few questions?
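Hello, thank you for the great work!
I followed the training process below and used pure audio without conditional input.
I tried a dataset with more than 110,000 audio clips and also a dataset with only 500 clips, but after training for 100,000 iterations, the output of the sampling pipeline with the trained network was meaningless.
Could you tell me if there is anything wrong? Or have you trained something meaningful?
Thanks.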
from naturalspeech2_pytorch import Trainer

trainer = Trainer(
    diffusion_model = diffusion,     # diffusion model + codec from above
    folder = '/path/to/speech',
    train_batch_size = 16,
    gradient_accumulate_every = 2,
)

trainer.train()
Will the code above train? When I put an audio file in the folder and run the code, I get an error.
import torch
from naturalspeech2_pytorch import Trainer, EncodecWrapper, Model, NaturalSpeech2, SpeechPromptEncoder
from multiprocessing import freeze_support

codec = EncodecWrapper()

def main():
    model = Model(
        dim = 128,
        depth = 6,
        dim_prompt = 512,
        cond_drop_prob = 0.25,
        condition_on_prompt = True
    )

    diffusion = NaturalSpeech2(
        model = model,
        codec = codec,
        timesteps = 50
    )

    raw_audio = torch.randn(4, 327680)
    prompt = torch.randn(4, 32768)
    text = torch.randint(0, 100, (4, 100))
    text_lens = torch.tensor([100, 50 , 80, 100])

    # forwards and backwards
    loss = diffusion(
        audio = raw_audio,
        text = text,
        text_lens = text_lens,
        prompt = prompt,
    )
    loss.backward()

    # after much training
    generated_audio = diffusion.sample(
        length = 1024,
        text = text,
        prompt = prompt,
    )

    trainer = Trainer(
        diffusion_model = diffusion,
        folder = 'C:\\naturalspeech2-pytorch\\0049_G1A2E7_JHJ',
        train_batch_size = 16,
        gradient_accumulate_every = 2,
        train_num_steps = 5,
        save_and_sample_every = 100,
    )
    trainer.train()
    trainer.save_checkpoint('C:\\naturalspeech2-pytorch\\ansunghun\\checkpoint.pt')

# the entry-point guard is needed on Windows, where worker processes re-import this file
if __name__ == '__main__':
    freeze_support()
    main()
Traceback (most recent call last):
File "c:\naturalspeech2-pytorch\test.py", line 62, in
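The traceback above is cut off, so the actual error cannot be read from it. One inexpensive thing to rule out before calling `trainer.train()` is an empty or misnamed data folder. The sketch below uses only the standard library. The helper name `find_audio_files` and the extension list are assumptions made for illustration and are not part of naturalspeech2_pytorch; the extensions the `Trainer` actually accepts should be checked against its source.

```python
from pathlib import Path

# Assumed extension list for illustration; the library may accept a different set.
AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3"}

def find_audio_files(folder, extensions=AUDIO_EXTENSIONS):
    """Return a sorted list of audio files found recursively under `folder`."""
    root = Path(folder)
    if not root.is_dir():
        raise FileNotFoundError(f"data folder does not exist: {root}")
    # Compare suffixes case-insensitively so that 'clip.WAV' is also counted.
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )

if __name__ == "__main__":
    import sys
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    files = find_audio_files(folder)
    print(f"{len(files)} audio file(s) found under {folder}")
```

If the count is zero, or far smaller than `train_batch_size`, the folder path and the file extensions are the first things worth checking.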
Hi, thanks for the great work!
I've followed the training process below and used pure audio without conditional input.
I've tried a dataset with 110,000+ audio clips and also a dataset with only 500 clips, but after training for 100,000 iterations, the output of the sampling pipeline with the trained network was meaningless.
Could you please tell me if anything is wrong? Or have you trained something meaningful?
Thanks.
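To put the numbers in this report in perspective, the following back-of-the-envelope sketch estimates how many passes over each dataset 100,000 iterations amount to. It assumes the README settings quoted in this thread (`train_batch_size = 16`, `gradient_accumulate_every = 2`) and that one iteration consumes `train_batch_size * gradient_accumulate_every` clips. Neither assumption is confirmed by the report, and the helper name `passes_over_dataset` is made up for illustration.

```python
def passes_over_dataset(num_steps, batch_size, accumulate_every, dataset_size):
    """Approximate number of full passes over the dataset after `num_steps` steps."""
    # Assumption: each step processes batch_size * accumulate_every clips.
    clips_seen = num_steps * batch_size * accumulate_every
    return clips_seen / dataset_size

if __name__ == "__main__":
    for dataset_size in (500, 110_000):
        passes = passes_over_dataset(100_000, 16, 2, dataset_size)
        print(f"{dataset_size} clips: about {passes:.0f} passes")
```

Under these assumptions the 500-clip run would have seen every clip thousands of times and the 110,000-clip run a few dozen times, which may help when comparing the two runs.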
Hi @ethanyhzhang, I have also completed half of the training unconditionally, and the audio files generated at each epoch sounded like white noise. Did you have the same problem? How did you fix it? Please share. Thanks.
FYI, the training loss dropped from 20+ at the beginning to about 0.3 at the end.
Hi @ethanyhzhang, my initial loss was 0.2, and after 10k steps it was 0.3. I also don't know how to use the .pt file that is generated every epoch and has 6 items like this: