open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

Natural Speech2 Training Speed #120

Closed: hexastrayer closed this issue 7 months ago

hexastrayer commented 7 months ago

I'm interested in the training time for NS2. I'm currently running `accelerate launch` with a batch size of 16 across 8 Tesla V100 GPUs, but each step takes approximately 5 seconds. I noticed that the checkpoint you supplied corresponds to 500k steps, which would put the total training time at over 20 days. Is this training time normal, or is there something wrong?
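
For reference, here is the arithmetic behind that estimate as a quick Python check (500k steps at the observed 5 s/step):

```python
# Back-of-envelope estimate of total training time at the observed speed.
steps = 500_000        # steps of the released checkpoint
sec_per_step = 5       # observed time per training step
days = steps * sec_per_step / 86_400   # 86,400 seconds per day
print(f"~{days:.1f} days")  # ~28.9 days
```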

hexastrayer commented 7 months ago

After running for a while (about 200 steps), the speed improves to about 3 seconds per step, but I think that is still relatively slow. Specifically, the model's forward pass and the `self.accelerator.backward(total_loss)` call within the `_train_step` function each take approximately 1.5 seconds.
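
For anyone reproducing this measurement, here is a minimal timing sketch; `model`, `batch`, `accelerator`, and `total_loss` are stand-ins for the actual objects in `_train_step`, and the CUDA sync is needed because kernels launch asynchronously:

```python
import time
import torch

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed seconds), syncing CUDA so the
    measurement is not skewed by asynchronous kernel launches."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    out = fn(*args, **kwargs)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return out, time.perf_counter() - start

# Inside the training loop (names are placeholders, not Amphion's API):
#   outputs, fwd_s = timed(model, batch)                # forward pass
#   _, bwd_s = timed(accelerator.backward, total_loss)  # backward pass
```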

HeCheng0625 commented 7 months ago

Hi, I think 1.5s per step is a normal speed for a V100. The main factor affecting training speed is likely IO, especially if your data is stored in the cloud rather than on a disk with fast read/write speeds. One feasible solution is to preload all data into memory beforehand, as sketched below.
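
A minimal sketch of that idea, assuming each example can be produced by some user-supplied `load_item` function (a placeholder, not Amphion's actual dataset code):

```python
from torch.utils.data import Dataset

class PreloadedDataset(Dataset):
    """Loads every item into RAM once, so training steps never hit slow storage."""

    def __init__(self, paths, load_item):
        # load_item is a user-supplied function: path -> training example
        self.items = [load_item(p) for p in paths]

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        return self.items[idx]
```

Note this only helps if the whole dataset fits in RAM; otherwise, caching to a local SSD or raising the DataLoader's `num_workers` are the usual fallbacks.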

hexastrayer commented 7 months ago

Thank you for your careful answer. I noticed that the weight of the diff_ce loss is set to 0.5 (0.1 in the original paper) and the diff_loss is set to an L1 loss (L2 in the original paper). Are these the optimal hyperparameters after your experiments, or would the ones in the original paper be better?
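
For readers following along, the configuration in question amounts to something like this (a sketch with placeholder tensors, not Amphion's actual training code):

```python
import torch
import torch.nn.functional as F

pred = torch.randn(4, 128)        # placeholder predicted latents
target = torch.randn(4, 128)      # placeholder ground-truth latents
diff_ce_loss = torch.tensor(1.0)  # placeholder codec cross-entropy term

diff_loss = F.l1_loss(pred, target)  # repo default: L1; paper: F.mse_loss (L2)
lambda_diff_ce = 0.5                 # repo default: 0.5; paper: 0.1
total_loss = diff_loss + lambda_diff_ce * diff_ce_loss
```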

dongngm commented 7 months ago

@hexastrayer How did you manage to train NS2? I've seen that there is still a mismatch in the data preprocessing part, as mentioned in https://github.com/open-mmlab/Amphion/issues/43. Could you please create a PR for this? Thanks a lot.

hexastrayer commented 7 months ago

@dongngm I did not use the code in Amphion for data preprocessing or the dataset/loader. I used my own logic to provide the data needed by the `_train_step` function. It might be easier for you to rewrite `ns2_dataset.py` along those lines; see the sketch below. I made a lot of changes locally, so it's not easy to create a PR.
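
As a rough illustration of that approach, a replacement dataset only needs to return whatever `_train_step` consumes. The field names below are hypothetical placeholders; check `_train_step` in the NS2 trainer for the real keys:

```python
from torch.utils.data import Dataset

class MyNS2Dataset(Dataset):
    """Drop-in replacement for ns2_dataset.py built on custom preprocessing.

    Field names here are illustrative placeholders, not Amphion's real keys.
    """

    def __init__(self, metadata):
        self.metadata = metadata  # list of per-utterance records

    def __len__(self):
        return len(self.metadata)

    def __getitem__(self, idx):
        record = self.metadata[idx]
        return {
            "code": record["code"],          # codec latents
            "pitch": record["pitch"],        # frame-level f0
            "duration": record["duration"],  # phone durations
            "phone_id": record["phone_id"],  # phoneme ids
        }
```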

HarryHe11 commented 7 months ago

Hi @hexastrayer, if you have any further questions about NaturalSpeech2 training speed, feel free to re-open this issue. We are glad to follow up!

a897456 commented 6 months ago

Hi @hexastrayer, can you share the pre-trained model? Training really takes too long.

HeCheng0625 commented 6 months ago

@a897456 Hi, we have provided the pre-trained checkpoint at https://huggingface.co/amphion/naturalspeech2_libritts.
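
For anyone else looking for it, the checkpoint can be fetched with the `huggingface_hub` library (a minimal sketch; the layout of the downloaded files depends on the repo contents):

```python
from huggingface_hub import snapshot_download

# Downloads the released NS2 checkpoint repo and returns the local directory.
ckpt_dir = snapshot_download(repo_id="amphion/naturalspeech2_libritts")
print(ckpt_dir)
```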