Open qingju-flwls opened 7 months ago
Thanks!
I'm not sure about this; partial finetuning or LoRA sounds good to me. But I think one would need to actually run experiments to get an answer.
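For what it's worth, here is a minimal sketch of the LoRA idea mentioned above, in plain NumPy rather than VoiceCraft's actual code (the dimensions, rank, and scaling are illustrative assumptions): the pretrained weight `W` stays frozen, and only a low-rank pair `A`, `B` is trained, giving an effective weight `W + (alpha / r) * B @ A`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes, not VoiceCraft's real dimensions.
d_out, d_in, r, alpha = 16, 16, 4, 8

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero init

def lora_forward(x):
    # Base output plus low-rank update; since B is zero at init,
    # the adapted layer starts out identical to the pretrained one.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
assert np.allclose(lora_forward(x), W @ x)  # identity at initialization
```

The appeal for low-data settings like the one discussed here is that only `A` and `B` (rank × dims parameters) are updated, so far fewer examples are needed than for full finetuning.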
Right. Thanks for your reply. Roughly how much data would you recommend for full finetuning of VoiceCraft? For instance, if I want to adapt the model to, say, Chinese-accented English, how many hours of data do you think would be needed? Thank you.
I'm really not sure haha. keep me updated
Thanks for the new finetuning script. I have compared the finetuning and training scripts and found they are basically the same except for a few hyperparameter differences, e.g. learning rate and optimiser.
VoiceCraft requires a large dataset, thousands of hours of data, to train from scratch. Just wondering, how much data is recommended for the finetuning process to obtain decent results?
Since the model is so big, I am curious: does it make sense to finetune on very limited data, e.g. half an hour of one person's recordings? Have any tests been done with partial finetuning of a few layers rather than full finetuning?
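In case it helps frame the question: partial finetuning usually means freezing most parameters and training only the last few blocks. A minimal PyTorch sketch of that pattern, using a toy stack of linear layers as a stand-in for the real model (the layer structure and names are placeholders, not VoiceCraft's):

```python
import torch.nn as nn

def freeze_all_but_last(model: nn.Module, layers: nn.ModuleList, n_trainable: int):
    """Freeze every parameter, then unfreeze the last n_trainable layers."""
    for p in model.parameters():
        p.requires_grad = False
    for layer in layers[-n_trainable:]:
        for p in layer.parameters():
            p.requires_grad = True

# Toy stand-in: 6 linear "blocks"; a real model would use its transformer layers.
layers = nn.ModuleList([nn.Linear(8, 8) for _ in range(6)])
model = nn.Sequential(*layers)
freeze_all_but_last(model, layers, n_trainable=2)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable}/{total}")  # only the last 2 blocks remain trainable
```

The optimizer would then be built over only the `requires_grad` parameters, which shrinks both the memory footprint and the effective capacity being fit to the small dataset.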
Thank you.