modelscope / FunCodec

FunCodec is a research-oriented toolkit for audio quantization and downstream applications such as text-to-speech synthesis and music generation.
https://funcodec.github.io/
MIT License

How long did it take to train the LauraTTS model? #36

Open Dinxin opened 8 months ago

ZhihaoDU commented 8 months ago

The model was trained for about 1.5 days on the LibriTTS clean subset with an A800 GPU, and the batch size is 10240 tokens.

Dinxin commented 8 months ago

On 8 A100 GPUs? And is the total duration 6000 hours?

ZhihaoDU commented 8 months ago

Only one A800 GPU. I think the duration of the LibriTTS clean subset is about 244 hours.

a897456 commented 7 months ago

> The model was trained for about 1.5 days on the LibriTTS clean subset with an A800 GPU, and the batch size is 10240 tokens.

Hi @ZhihaoDU, 10240 tokens? How do you calculate it? speech_max_length / encoder_hop_length × batch_size = 40960 / 320 × 16 = 2048?
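
To make the arithmetic in the question concrete, here is a minimal sketch using the config values quoted in this thread. The counting rule (encoded tokens per utterance times utterances per batch) is the questioner's reading, not a confirmed description of FunCodec's batch sampler, and `batch_size = 16` is the value assumed in the question:

```python
# Sketch of token-based batch accounting, not FunCodec's actual batching code.
speech_max_length = 40960    # samples per padded utterance (config value quoted above)
encoder_hop_length = 320     # encoder downsampling factor in samples (quoted above)
batch_size = 16              # utterances per batch (assumed in the question)

tokens_per_utterance = speech_max_length // encoder_hop_length  # 40960 // 320 = 128
tokens_per_batch = tokens_per_utterance * batch_size            # 128 * 16 = 2048

print(tokens_per_utterance, tokens_per_batch)  # -> 128 2048
```

Under this counting, reaching the quoted 10240 tokens at 128 tokens per utterance would require 80 utterances per batch, so either the assumed batch_size differs or the 10240 figure counts tokens differently; that discrepancy is exactly what the question asks about.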

a897456 commented 7 months ago

@ZhihaoDU Please help: the dataset has 132,000 files. If batch_size = 8, dividing the two gives 16,500 iterations for one pass over the data; does that mean keeping num_iters_per_epoch = 10000 is no longer appropriate? Could you share how to choose reasonable values for batch_size, num_iters_per_epoch, and num_workers? Also, does your setting of input_size = 1 take part in this calculation?
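
To see the mismatch described above in numbers, here is a small sketch under the quoted values. The variable names mirror the config keys, the rule that one full pass over the data takes ceil(files / batch_size) iterations is the standard one, and how FunCodec's data loader actually shuffles or resumes across epochs is not confirmed here:

```python
import math

num_files = 132_000            # dataset size quoted in the question
batch_size = 8                 # utterances per batch, per the question
num_iters_per_epoch = 10_000   # value from the config discussed above

iters_for_one_pass = math.ceil(num_files / batch_size)              # 16500
fraction_seen_per_epoch = num_iters_per_epoch / iters_for_one_pass  # ~0.61

print(iters_for_one_pass, round(fraction_seen_per_epoch, 2))  # -> 16500 0.61
```

On these numbers, each 10000-iteration "epoch" covers only about 61% of the files, which is presumably why the question asks whether num_iters_per_epoch = 10000 is still appropriate at batch_size = 8.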