jasonppy / VoiceCraft

Zero-Shot Speech Editing and Text-to-Speech in the Wild

Is finetuning the model with very limited data possible? #91

Open qingju-flwls opened 7 months ago

qingju-flwls commented 7 months ago

Thanks for the new finetuning script. I have compared the finetuning and training scripts and found they are basically the same, except for a few hyperparameter differences in, e.g., learning rate and optimisers.

VoiceCraft training requires a large dataset, thousands of hours of data, to train from scratch. I'm just wondering: how much data is recommended for the finetuning process to obtain decent results?

Since the model is so big, I am curious whether it makes sense to finetune on very limited data, e.g. half an hour of one person's recordings. Has any testing been done with partial finetuning of a few layers rather than full finetuning?

Thank you.

jasonppy commented 7 months ago

Thanks!

I'm not sure about this; partial finetuning or LoRA sounds good to me. But I think one needs to actually run experiments to get an answer.
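For reference, partial finetuning in PyTorch amounts to freezing most parameters and passing only the unfrozen ones to the optimizer. The sketch below uses a toy transformer as a stand-in; the module names and the choice of "last two layers" are illustrative assumptions, not VoiceCraft's actual architecture or a recommended recipe.

```python
# Hypothetical sketch of partial finetuning: freeze everything, then
# unfreeze only the last few layers. The toy model below stands in for
# the real network; layer names here are NOT VoiceCraft's.
import torch
import torch.nn as nn

model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=6,
)

# Freeze all parameters first
for p in model.parameters():
    p.requires_grad = False

# Unfreeze only the last two layers (an arbitrary choice for illustration)
for layer in model.layers[-2:]:
    for p in layer.parameters():
        p.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable}/{total} params")

# The optimizer only sees the unfrozen parameters
opt = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-5
)
```

LoRA would go further, keeping all original weights frozen and training small low-rank adapter matrices instead (e.g. via the `peft` library), which cuts trainable parameters even more and is a common choice when adapting a large model on half an hour of one speaker's data.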

qingju-flwls commented 7 months ago

Right, thanks for your reply. How much data would you recommend for full finetuning of VoiceCraft? For instance, if I want to adapt the model to, let's say, Chinese-accented English, roughly how many hours of data do you think are needed? Thank you.

jasonppy commented 7 months ago

I'm really not sure haha. Keep me updated!