From other people's feedback, you don't need to retrain encodec on your target language. Just finetuning the LLM is enough.
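For anyone landing here later, here is a minimal sketch of tokenizing new-language audio with Facebook's pretrained `encodec` package; the pretrained codec works regardless of language, so only the LM on top needs language-specific training. The input file name and bandwidth below are illustrative assumptions, not VoiceCraft's actual settings:

```python
# Minimal sketch: extracting discrete codes with Facebook's pretrained EnCodec.
# Assumes the `encodec` and `torchaudio` packages; the bandwidth and file name
# below are illustrative, not the settings VoiceCraft itself uses.
import torch
import torchaudio
from encodec import EncodecModel
from encodec.utils import convert_audio

model = EncodecModel.encodec_model_24khz()
model.set_target_bandwidth(6.0)  # 6 kbps -> 8 codebooks for the 24 kHz model

wav, sr = torchaudio.load("sample_vietnamese.wav")  # hypothetical input file
wav = convert_audio(wav, sr, model.sample_rate, model.channels)

with torch.no_grad():
    encoded_frames = model.encode(wav.unsqueeze(0))
# Codes have shape (batch, n_codebooks, n_frames) after concatenating frames.
codes = torch.cat([frame[0] for frame in encoded_frames], dim=-1)
print(codes.shape)
```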
thanks @jasonppy for the info. if i have a large dataset (~5k hours of audio) in my target language (Vietnamese), should i go with fine-tuning the LLM or training a new one from scratch?
You could try both
thank you. closing the issue.
hi @thivux, curious about your results: is finetuning the pretrained model on the new language enough, or is training a new model from scratch preferable? thanks
@justinatbahasa hi, from my experience, finetuning the pretrained checkpoint with a smaller amount of data (400 hours vietnamese + 100 hours english) gave even better results than training from scratch (5k hours vietnamese)
cool! thanks
Hi @thivux @justinatbahasa! Do you have any clue why training from scratch gives worse results than finetuning? Did you follow the same training scheme as the original paper when you trained from scratch (4 GPUs, LR=0.01), and did you avoid the "is nan, therefore skip this batch" issues?
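(For context, the "is nan, therefore skip this batch" message comes from a guard of roughly this shape in the training loop; the sketch below is a generic illustration of the pattern with made-up names, not code copied from the repo:)

```python
# Generic sketch of a NaN-skip guard in a PyTorch training loop.
# The toy model, optimizer, and data are illustrative stand-ins.
import torch
from torch import nn

model = nn.Linear(10, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
batches = [torch.randn(4, 10) for _ in range(3)]

for x in batches:
    optimizer.zero_grad()
    loss = model(x).pow(2).mean()
    if torch.isnan(loss):
        # Skip the optimizer step entirely when the loss has diverged.
        print("loss is nan, therefore skip this batch")
        continue
    loss.backward()
    optimizer.step()
```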
i am experimenting with voicecraft on another language and i don't want to retrain a custom encodec model for it. has anyone tried training voicecraft with the 8-codebook encodec checkpoint from facebook? how do the results compare to using 4 codebooks?
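In case it helps: with Facebook's pretrained 24 kHz EnCodec, the codebook count is set through the target bandwidth (each residual codebook costs 0.75 kbps at the model's 75 Hz frame rate), so the same checkpoint can act as a 4- or 8-codebook tokenizer. A quick sketch with dummy audio (my own illustration, not from the VoiceCraft repo):

```python
# Sketch: picking 4 vs 8 codebooks from Facebook's pretrained 24 kHz EnCodec.
# Each residual codebook adds 0.75 kbps at the model's 75 Hz frame rate.
import torch
from encodec import EncodecModel

model = EncodecModel.encodec_model_24khz()
wav = torch.randn(1, model.channels, model.sample_rate)  # 1 s of dummy audio

for bandwidth in (3.0, 6.0):  # kbps
    model.set_target_bandwidth(bandwidth)
    with torch.no_grad():
        frames = model.encode(wav)
    codes = torch.cat([f[0] for f in frames], dim=-1)
    print(f"{bandwidth} kbps -> {codes.shape[1]} codebooks")
```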