theodorblackbird / lina-speech

Official implementation of the TTS model Lina-Speech

Finetune single-speaker instruction #10

Open kunibald413 opened 3 weeks ago

kunibald413 commented 3 weeks ago

Is there a way to finetune on a single speaker with this repo? If so, could you share the steps?

A recent commit to the README said this:

Code and checkpoints incoming...

If you have time, could you share what can be expected, and in what timeframe?

Thank you for your time and effort!

mush42 commented 3 weeks ago

+1

theodorblackbird commented 2 weeks ago

@kunibald413 @mush42 Just added the inference code + checkpoints.

For single-speaker adaptation you might give initial-state tuning a shot. See the notebook for an example on the Expresso dataset, and the paper.

For fine-tuning/pre-training, I will upload my training code after cleaning it up in the coming days. I'll ping you when it's ready. Important note: the WavTokenizer checkpoint I use does not generalize as well as EnCodec, for instance. In my case it tends to perform poorly on French.
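
In the meantime, the gist of initial-state tuning is to freeze the whole backbone and optimize only the learnable initial state(s) on the target speaker's data. The sketch below is only illustrative: the checkpoint path, the `initial_state` attribute, the dataloader, and the `.loss` interface are assumptions, not the actual lina-speech API; the Expresso notebook has the real entry points.

```python
# Minimal sketch of initial-state tuning for single-speaker adaptation.
# All names below (checkpoint path, `initial_state`, `single_speaker_loader`,
# `.loss`) are placeholders for illustration — see the Expresso notebook
# for the actual lina-speech objects and training code.
import torch

model = torch.load("lina_speech_ckpt.pt")           # hypothetical checkpoint
for p in model.parameters():
    p.requires_grad_(False)                          # freeze the whole backbone

init_states = [model.initial_state]                  # assumed attribute name
for s in init_states:
    s.requires_grad_(True)                           # tune only the initial state(s)

opt = torch.optim.AdamW(init_states, lr=1e-3)
for epoch in range(10):
    # `single_speaker_loader` stands in for a DataLoader over the target
    # speaker's (text tokens, audio codes) pairs.
    for text_tokens, audio_codes in single_speaker_loader:
        loss = model(text_tokens, audio_codes).loss  # assumed loss interface
        opt.zero_grad()
        loss.backward()
        opt.step()
```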

ScottishFold007 commented 2 weeks ago

Hello, what specifically do you mean by "not good"? How does the audio quality of EnCodec compare to WavTokenizer?

theodorblackbird commented 2 weeks ago

EnCodec is a general-purpose codec for speech, music, environmental sounds, etc. WavTokenizer is a low-bitrate speech codec with a clear bias towards English (at least). In my experiments, WavTokenizer does a much better job most of the time when training a generative model on English speech, with some artifacts on certain voices or particular aspects such as laughter, and poor results on French, for instance.
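
If you want to gauge how a codec behaves on your own language before training anything, a quick encode/decode round-trip on a few recordings is enough to hear the artifacts. The sketch below uses EnCodec's published API; the file path is a placeholder, and the WavTokenizer side would use that project's own encode/decode entry points instead.

```python
# Round-trip check: encode + decode a sample, then listen and compare.
# "sample.wav" is a placeholder path — swap in your own recordings.
import torch
import torchaudio
from encodec import EncodecModel
from encodec.utils import convert_audio

model = EncodecModel.encodec_model_24khz()
model.set_target_bandwidth(6.0)  # kbps; lower bandwidth = fewer codebooks

wav, sr = torchaudio.load("sample.wav")
wav = convert_audio(wav, sr, model.sample_rate, model.channels).unsqueeze(0)

with torch.no_grad():
    frames = model.encode(wav)    # list of (codes, scale) per chunk
    recon = model.decode(frames)  # reconstructed waveform

torchaudio.save("sample_encodec.wav", recon.squeeze(0), model.sample_rate)
```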

mush42 commented 2 weeks ago

@theodorblackbird @ScottishFold007

I finetuned WavTokenizer on Indic languages and it gave great results. Finetuning for a few epochs improves quality, IMHO.

ScottishFold007 commented 2 weeks ago

Hello, besides fine-tuning WavTokenizer for Indic languages, have you also done incremental fine-tuning or pre-training of lina-speech?

mush42 commented 2 weeks ago

@ScottishFold007 NO. Not yet!