heatz123 / naturalspeech

A fully working pytorch implementation of NaturalSpeech (Tan et al., 2022)
470 stars 68 forks

How much time does training cost? #6

Open yiwei0730 opened 1 year ago

yiwei0730 commented 1 year ago

Thank you for your great work! I'd like to know the training time for the two settings (1.5k steps without soft-DTW, and 15k steps with soft-DTW). Have you ever tried removing some parameters and measuring the performance? Does this model use model parallelism, or is it trained on a single GPU?

heatz123 commented 1 year ago

Hi @yiwei0730! It took me about 3 days for 100k iterations (~1k epochs), using batch size 16 on an 8× P40 GPU cluster, for the non-soft-DTW version. The soft-DTW version had similar training times, but since I had to reduce the batch size to 4, I didn't train it to the end. I haven't experimented with the model parameters yet, as I used the same hyperparameters listed in the paper. It would be great to find the optimal balance between parameter count and performance. Hope this helps. Thank you.
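As a back-of-the-envelope check of those numbers: 100k iterations at batch size 16 across 8 GPUs works out to roughly 1k epochs if the batch size is per GPU (data parallel) and the dataset is LJSpeech-sized (~13,100 clips). Both of those are assumptions, not something stated in this thread; a minimal sketch:

```python
def epochs_from_iterations(iterations, per_gpu_batch, num_gpus, dataset_size):
    """Epochs covered, assuming `per_gpu_batch` is the batch size on each
    of `num_gpus` data-parallel workers (an assumption, not confirmed above)."""
    samples_seen = iterations * per_gpu_batch * num_gpus
    return samples_seen / dataset_size

# Assumed dataset size: LJSpeech's ~13,100 clips (hypothetical here).
epochs = epochs_from_iterations(100_000, 16, 8, 13_100)
print(f"~{epochs:.0f} epochs")  # close to the ~1k epochs reported
```

If the batch size of 16 were global rather than per GPU, the same run would cover only about 122 epochs, so the per-GPU reading fits the report better.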

fmac2000 commented 1 year ago

@heatz123 Hi, I have access to 8× 40 GB A100s on GCP in a single zone if you don't have the hardware; I can request access to 8× 80 GB A100s, but I can't guarantee anything. Also, considering the quality of the first 1600 steps that you released, maybe it would serve the community better to release multiple voices at a lower step count rather than one fully trained voice?

Let me know if you're interested :) Great work, you are a star!

heatz123 commented 1 year ago

Hi @fmac2000!

Yes, if the GPU cluster is available, it would be a huge help toward releasing the pretrained weights.

As for training with multiple voices, that would certainly be helpful to the community. Thank you for your suggestion. However, I currently have other work to do, so I won't have enough time to find and preprocess other datasets. I'll consider training with different voices and languages, perhaps after this month.

If it's okay, could you please contact me at [heatz123@snu.ac.kr](mailto:heatz123@snu.ac.kr) or provide me with your email so we can discuss the details further?

Thank you again for your support!

teopapad92 commented 1 year ago

Hey guys,

First of all @heatz123, thanks for sharing, nice work!

I am also interested in replicating these results and training the model for the full 15k epochs.

Did you ever go through with it? Did it work? It would be nice to have some feedback before I commit to such an effort.

ktjayamanna commented 7 months ago

@heatz123 How much data did you have in your training set?