serp-ai / bark-with-voice-clone

🔊 Text-prompted Generative Audio Model - With the ability to clone voices
https://serp.ai/tools/bark-text-to-speech-ai-voice-clone-app

Training process and loss understanding #81

Open SanketDhuri opened 1 year ago

SanketDhuri commented 1 year ago
  1. We have a dataset of 5000-6000 audio clips, each at least 2 seconds long, but after training on it for 15 epochs the loss is still not satisfactory. What could the issue be?

     For example, the last observed losses are: semantic (train loss = 0.00309, val loss = 1.28167), coarse (train loss = 0.057, val loss = 3.2796), fine (train loss = 0.1, val loss = 1.18).

  2. Why does the model sometimes add or skip words on its own, and sometimes generate a voice with a shivering tone?
  3. Why does it have problems with punctuation such as `!` or `.`?
  4. We are training on a mixed dataset with different speakers who all speak in the same style. Can that cause problems during training?
  5. After fine-tuning, pretrained prompts such as `[laughs]` are lost. What might be the reason? We prepare our datasets with the Whisper model, so should we also add those prompts to the dataset manually?
  6. How can we add new emotion prompts such as sad, excited, or unhappy?
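Regarding the first question: a near-zero semantic train loss (0.003) next to a validation loss above 1.2 usually indicates overfitting on a small dataset over 15 epochs rather than a broken training setup. A common mitigation is early stopping on the validation loss. Below is a minimal, hedged sketch (the helper name, `patience`, and `min_delta` values are illustrative assumptions, not Bark-specific settings):

```python
def should_stop(val_losses, patience=3, min_delta=0.01):
    """Return True when the validation loss has not improved by at least
    `min_delta` for `patience` consecutive epochs (simple early stopping)."""
    if len(val_losses) <= patience:
        return False
    # Best validation loss seen before the most recent `patience` epochs.
    best_before = min(val_losses[:-patience])
    recent = val_losses[-patience:]
    # Stop only if every recent epoch failed to beat the old best by min_delta.
    return all(v > best_before - min_delta for v in recent)
```

Checking this after each epoch (and keeping the checkpoint with the lowest validation loss) would typically halt training well before the train/val gap grows as large as the numbers reported above.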

boringtaskai commented 6 months ago

Hi, how do you use the weights after training? I had an issue like this:

raise ValueError(f"missing keys: {missing_keys}")
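That `ValueError` is raised when the keys in the loaded state dict do not line up with the model's parameter names, which often happens when a training script saves the weights nested under a `"model"` entry or wrapped with a `module.` / `_orig_mod.` prefix (from `DataParallel`/DDP or `torch.compile`). A hedged sketch of a key-normalizing helper, written with plain dicts so it runs standalone; the function name and the prefix list are assumptions, but the same logic applies to a real PyTorch `state_dict`:

```python
def normalize_state_dict(ckpt, prefixes=("module.", "_orig_mod.")):
    """Unwrap a nested checkpoint and strip common wrapper prefixes.

    `ckpt` may be either the raw state dict or a dict like
    {"model": state_dict, ...}, as many training scripts save it.
    """
    state = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt
    cleaned = {}
    for key, value in state.items():
        for p in prefixes:
            if key.startswith(p):
                key = key[len(p):]
        cleaned[key] = value
    return cleaned

# Hypothetical usage with PyTorch (file name is illustrative):
# ckpt = torch.load("fine_tuned.pt", map_location="cpu")
# model.load_state_dict(normalize_state_dict(ckpt), strict=True)
```

If keys are still missing after normalizing, comparing `model.state_dict().keys()` against the checkpoint's keys shows exactly which names differ.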