suno-ai / bark

🔊 Text-Prompted Generative Audio Model
MIT License
33.76k stars 4k forks

Technical guide and explanation to Bark #477

Open kennethleungty opened 8 months ago

kennethleungty commented 8 months ago

Hi all, I just wrote a detailed article on the technical aspects of Bark. Feel free to check it out here: https://betterprogramming.pub/text-to-audio-generation-with-bark-clearly-explained-4ee300a3713a?sk=e2b2f75f5fc93c656bef031c60bf99bf

JonathanFly commented 8 months ago

Nice writeup. I would just note that this passage:

"Bark has various voices for speech generation based on language, gender, and background sounds. The complete voice collection with more than 100 presets can be found in the speaker library."

does not describe the complete collection of voices. Even without add-ons for voice cloning, you can simply not specify a voice and Bark will generate a new random voice on the spot. That voice can be saved and used again: the voice is simply the audio sample itself, in raw tokens. So Bark is not limited to 100 voices; the number of possible voices is effectively unlimited. And because Bark tries to match the text prompt to a voice, you can use Bark like a voice lab just by text prompting: create voices, then refine or tweak them with additional prompts and resaves.
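
For anyone who wants to try this, here is a minimal sketch of the generate, save, and reuse loop described above. It assumes Bark's Python API as I understand it at the time of writing (`generate_audio` with `history_prompt` and `output_full`, and `save_as_prompt` in `bark.api`). The `voices/` folder layout and the helper names `voice_file`, `create_voice`, and `speak_with_voice` are hypothetical, made up for this illustration.

```python
from pathlib import Path


def voice_file(name: str, voice_dir: str = "voices") -> Path:
    """Return the .npz path for a saved voice (hypothetical folder layout)."""
    return Path(voice_dir) / f"{name}.npz"


def create_voice(text: str, name: str) -> Path:
    """Generate speech with no preset and save the result as a reusable voice."""
    # Imported lazily so the path helper above works without Bark installed.
    from bark import generate_audio
    from bark.api import save_as_prompt

    path = voice_file(name)
    path.parent.mkdir(parents=True, exist_ok=True)

    # history_prompt=None: no speaker is specified, so Bark picks a new random voice.
    # output_full=True is assumed to return (token dict, audio array).
    full_generation, _audio = generate_audio(
        text, history_prompt=None, output_full=True
    )

    # The saved "voice" is just the token arrays of this one sample.
    save_as_prompt(str(path), full_generation)
    return path


def speak_with_voice(text: str, name: str):
    """Generate new audio conditioned on a previously saved voice file."""
    from bark import generate_audio

    # Assumption: a path ending in .npz is accepted as history_prompt.
    return generate_audio(text, history_prompt=str(voice_file(name)))
```

Calling `create_voice("Hello, I am a calm, older narrator.", "narrator")` and then `speak_with_voice("A new sentence.", "narrator")` would be one round of the "voice lab" loop. Repeating it with a different text prompt and resaving is how the tweaking would work under these assumptions.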

kennethleungty commented 8 months ago

Good point, I have made the edits. Thanks for highlighting this.

platform-kit commented 8 months ago

@kennethleungty can you do a writeup on serpai's voice cloning addon? I would love to understand on a technical level how to train new voices at a quality high enough to match Bark's voice library.

boringtaskai commented 6 months ago

Yes, me too. I would also like to know how to train the model on a new language.