kunibald413 opened 1 week ago
+1
@kunibald413 @mush42 Just added the inference code + checkpoints.
For single-speaker adaptation you might give initial-state tuning a shot. See the notebook for an example on the Expresso dataset, and the paper.
For fine-tuning/pre-training, I will upload my training code after cleaning it up in the coming days. I'll ping you when it's ready. Important note: the WavTokenizer checkpoint I use does not generalize as well as EnCodec, for instance. In my case it tends to behave poorly on French.
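For anyone wondering what initial-state tuning boils down to in practice, here is a minimal PyTorch sketch (a toy token LM standing in for the pretrained model, not the repo's actual API): freeze every weight and optimize only a learnable initial recurrent state on the target speaker's data.

```python
# Toy illustration of initial-state tuning; NOT the lina-speech API.
import torch
import torch.nn as nn

class ToyCodecLM(nn.Module):
    """Stand-in for a pretrained token LM; only `initial_state` gets tuned."""
    def __init__(self, vocab=1024, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab)
        # The speaker-adaptive part: one learnable initial hidden state.
        self.initial_state = nn.Parameter(torch.zeros(1, 1, dim))

    def forward(self, tokens):
        h0 = self.initial_state.expand(-1, tokens.size(0), -1).contiguous()
        out, _ = self.rnn(self.embed(tokens), h0)
        return self.head(out)

model = ToyCodecLM()
# Freeze everything except the initial state.
for name, p in model.named_parameters():
    p.requires_grad = name == "initial_state"

opt = torch.optim.AdamW([model.initial_state], lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Dummy single-speaker sequences (in practice: codec tokens of the target voice).
tokens = torch.randint(0, 1024, (4, 128))
for step in range(100):
    logits = model(tokens[:, :-1])
    loss = loss_fn(logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The notebook does the real thing on Expresso; the point here is just that only the initial state receives gradients, so adaptation is cheap and the base model stays untouched.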
Hello, what specifically do you mean by "not good"? How does the audio quality of EnCodec compare to WavTokenizer?
EnCodec is a general codec for speech, music, environmental sounds, etc. WavTokenizer is a low-bitrate speech codec with a clear bias towards English (at least). In my experiments, WavTokenizer does a much better job most of the time for training a generative model on English speech, with some artifacts on certain voices or particular aspects such as laughter, and poor results on French, for instance.
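To compare on your own clips, the `encodec` pip package covers the EnCodec side; something like the sketch below (file names are placeholders, and the WavTokenizer round-trip would use that repo's own loading code, which I won't guess at here):

```python
import torch
import torchaudio
from encodec import EncodecModel
from encodec.utils import convert_audio

# Pretrained 24 kHz EnCodec model at 6 kbps (adjust the bandwidth as needed).
model = EncodecModel.encodec_model_24khz()
model.set_target_bandwidth(6.0)

wav, sr = torchaudio.load("sample_fr.wav")  # placeholder: any French test clip
wav = convert_audio(wav, sr, model.sample_rate, model.channels)

with torch.no_grad():
    frames = model.encode(wav.unsqueeze(0))  # list of (codes, scale) per chunk
    recon = model.decode(frames)[0]          # reconstructed waveform [C, T]

torchaudio.save("sample_fr_encodec.wav", recon, model.sample_rate)
```

Listening to the two round-trips side by side is the quickest way to judge the bias on non-English speech.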
@theodorblackbird @ScottishFold007
I fine-tuned WavTokenizer for Indic languages and it gave great results. Fine-tuning for a few epochs improves quality, IMHO.
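A rough data-prep sketch for anyone trying the same: resample the new-language audio to the format the checkpoint expects before fine-tuning (24 kHz mono is assumed here; check the config of the checkpoint you start from, and the directory names are placeholders):

```python
# Rough data-prep sketch (assumption: the checkpoint expects 24 kHz mono audio;
# verify against the config of the WavTokenizer checkpoint you fine-tune from).
import os
import torchaudio
import torchaudio.functional as F

TARGET_SR = 24_000  # assumed codec sample rate

def prepare(in_dir: str, out_dir: str) -> None:
    os.makedirs(out_dir, exist_ok=True)
    for name in os.listdir(in_dir):
        if not name.endswith(".wav"):
            continue
        wav, sr = torchaudio.load(os.path.join(in_dir, name))
        wav = wav.mean(dim=0, keepdim=True)        # downmix to mono
        if sr != TARGET_SR:
            wav = F.resample(wav, sr, TARGET_SR)   # resample to the codec rate
        torchaudio.save(os.path.join(out_dir, name), wav, TARGET_SR)

prepare("indic_raw", "indic_24k")  # placeholder directory names
```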
Hello, besides fine-tuning WavTokenizer for Indic languages, have you also done incremental fine-tuning or pre-training on lina-speech?
@ScottishFold007 NO. Not yet!
Is there a way to fine-tune on a single speaker with this repo? If so, could you share the steps?
A recent commit to the README said this:
If you have time, could you share what can be expected and in what timeframe?
Thank you for your time and effort!