suno-ai / bark

🔊 Text-Prompted Generative Audio Model

:baby: Guide "How to create custom models from scratch for dummies" :memo: #300

Closed · adriens closed this issue 1 year ago

adriens commented 1 year ago

I would like to learn how to create custom models (for example my own voice) but I'm lacking some documentation to achieve this. :pray:

YiQiu1984 commented 1 year ago

I also want to know

danielklk commented 1 year ago

I managed to do it using GPT as an assistant:

```bash
# environment setup (Google Colab cells)
!sudo apt-get install openjdk-8-jdk
!echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
!curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -
!sudo apt-get update && sudo apt-get install bazel

!sudo apt-get install python3.7 python3.7-dev python3.7-tk
!pip3 install virtualenv==16.7.8

!sudo apt-get install gcc-7 g++-7
!sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-7 60 --slave /usr/bin/g++ g++ /usr/bin/g++-7
!sudo update-alternatives --config gcc

# install bark itself
!git clone https://github.com/suno-ai/bark.git
!apt-get install python
!apt-get install python-pip
!pip install git+https://github.com/suno-ai/bark.git
```

```python
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0"

from IPython.display import Audio
import nltk  # we'll use this to split the script into sentences
import numpy as np
from scipy.io.wavfile import write as write_wav  # to save the result to disk

from bark.generation import (
    generate_text_semantic,
    preload_models,
)
from bark.api import semantic_to_waveform
from bark import generate_audio, SAMPLE_RATE

preload_models()

script = """
Here comes your script to be spoken.
""".replace("\n", " ").strip()

# !pip install nltk   (if it isn't already installed)
nltk.download('punkt')
sentences = nltk.sent_tokenize(script)

GEN_TEMP = 0.6
SPEAKER = "v2/pt_speaker_3"
silence = np.zeros(int(0.25 * SAMPLE_RATE))  # quarter second of silence between sentences

pieces = []
for sentence in sentences:
    semantic_tokens = generate_text_semantic(
        sentence,
        history_prompt=SPEAKER,
        temp=GEN_TEMP,
        min_eos_p=0.05,  # this controls how likely the generation is to end
    )
    audio_array = semantic_to_waveform(semantic_tokens, history_prompt=SPEAKER)
    pieces += [audio_array, silence.copy()]
```

Go to sleep and come back 8 hours later for each paragraph you want, then play and save the result:

```python
# listen to the full generation
Audio(np.concatenate(pieces), rate=SAMPLE_RATE)

# save it to disk (the original snippet saved only `audio_array`, i.e. the last sentence)
write_wav("bark_generation.wav", SAMPLE_RATE, np.concatenate(pieces))

# or listen to just the last sentence
Audio(audio_array, rate=SAMPLE_RATE)
```
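For a short prompt you can skip the sentence loop entirely and use the one-shot API from the README; a minimal sketch (the speaker name is just one of the bundled voices, pick whichever you like):

```python
from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav

preload_models()

# a single call generates the whole prompt with the chosen bundled voice
audio_array = generate_audio("Hello, this is a short test.", history_prompt="v2/pt_speaker_3")
write_wav("bark_oneshot.wav", SAMPLE_RATE, audio_array)
```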

gkucsko commented 1 year ago

If you mean just voice cloning with an existing model, we don't support that for now, but there are some forks where people have gone in that direction. As for training/fine-tuning, there is a bunch of chatter happening on Discord with people working on that.
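For context, what Bark calls a "voice" is just a history prompt: an `.npz` file containing a few arrays of tokens. A minimal sketch of inspecting one of the bundled prompts (the path below assumes a local clone of the repo; adjust it if bark was pip-installed):

```python
import numpy as np

# load one of the speaker prompts shipped with the repo
prompt = np.load("bark/assets/prompts/v2/pt_speaker_3.npz")

# each bundled voice holds three token arrays used as the history prompt
for key in prompt.files:
    print(key, prompt[key].shape, prompt[key].dtype)
# expected keys: semantic_prompt, coarse_prompt, fine_prompt
```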

adriens commented 1 year ago

@danielklk, what I meant was to learn how to clone or create a new voice :smile_cat:

adriens commented 1 year ago

> If you mean just voice cloning with an existing model, we don't support that for now, but there are some forks where people have gone in that direction. As for training/fine-tuning, there is a bunch of chatter happening on Discord with people working on that.

Yes @gkucsko, that's what I meant.

So if I understand correctly, we cannot really clone a real-life voice, but rather modify and fine-tune existing voices. I've read some chats on Discord but could not find a central place for guidelines. I found some really nice-sounding voices made by the community :open_mouth:

mandeep511 commented 7 months ago

> > If you mean just voice cloning with an existing model, we don't support that for now, but there are some forks where people have gone in that direction. As for training/fine-tuning, there is a bunch of chatter happening on Discord with people working on that.
>
> Yes @gkucsko, that's what I meant.
>
> So if I understand correctly, we cannot really clone a real-life voice, but rather modify and fine-tune existing voices. I've read some chats on Discord but could not find a central place for guidelines. I found some really nice-sounding voices made by the community 😮

Could you share them please?