suno-ai / bark

🔊 Text-Prompted Generative Audio Model
MIT License
35.96k stars, 4.24k forks

Is there any documentation? :D #16

Closed yipy0005 closed 1 year ago

gkucsko commented 1 year ago

working on putting something more comprehensive together. anything in particular i can help with in the meantime?

robbyz512 commented 1 year ago

> working on putting something more comprehensive together. anything in particular i can help with in the meantime?

UserWarning: No audio backend is available.
No GPU being used. Careful, Inference might be extremely slow!

Some documentation on how to add an audio backend or select the GPU would help (it defaults to the CPU).

avaer commented 1 year ago

It would be nice to know how to generate the semantic histories for other voices (in npz format):

https://github.com/suno-ai/bark/blob/5dc6a4dca2755da4fde37123a4845dc1895798f3/bark/generation.py#L352
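A minimal sketch of what such a file contains, assuming the layout of the voice prompts bundled with Bark: three integer arrays named `semantic_prompt` (1-D), `coarse_prompt` (2 codebook rows by T frames) and `fine_prompt` (8 codebook rows by T frames). The vocabulary sizes and the helper names `check_history_prompt` / `save_history_prompt` are assumptions for illustration, not Bark's API. Meaningful token values still have to come from Bark's own models; this only shows the container format.

```python
import numpy as np

# Assumed layout of a Bark history prompt (.npz), based on the bundled voices:
#   semantic_prompt: 1-D int array of semantic tokens
#   coarse_prompt:   2-D int array, 2 codebook rows x T frames
#   fine_prompt:     2-D int array, 8 codebook rows x T frames
N_COARSE_CODEBOOKS = 2        # assumption
N_FINE_CODEBOOKS = 8          # assumption
SEMANTIC_VOCAB_SIZE = 10_000  # assumption
CODEBOOK_SIZE = 1024          # assumption


def check_history_prompt(prompt: dict) -> None:
    """Raise ValueError if `prompt` does not match the assumed layout."""
    semantic = prompt["semantic_prompt"]
    coarse = prompt["coarse_prompt"]
    fine = prompt["fine_prompt"]
    if semantic.ndim != 1 or semantic.size == 0:
        raise ValueError("semantic_prompt must be a non-empty 1-D array")
    if not (semantic.min() >= 0 and semantic.max() < SEMANTIC_VOCAB_SIZE):
        raise ValueError("semantic token out of range")
    if coarse.ndim != 2 or coarse.shape[0] != N_COARSE_CODEBOOKS:
        raise ValueError("coarse_prompt must have shape (2, T)")
    if fine.ndim != 2 or fine.shape[0] != N_FINE_CODEBOOKS:
        raise ValueError("fine_prompt must have shape (8, T)")
    for name, arr in (("coarse_prompt", coarse), ("fine_prompt", fine)):
        if arr.size and not (arr.min() >= 0 and arr.max() < CODEBOOK_SIZE):
            raise ValueError(f"{name} token out of range")


def save_history_prompt(path: str, semantic, coarse, fine) -> None:
    """Validate the three arrays, then write them to `path` (hypothetical helper).

    np.savez appends ".npz" if `path` does not already end with it.
    """
    prompt = {
        "semantic_prompt": np.asarray(semantic, dtype=np.int64),
        "coarse_prompt": np.asarray(coarse, dtype=np.int64),
        "fine_prompt": np.asarray(fine, dtype=np.int64),
    }
    check_history_prompt(prompt)
    np.savez(path, **prompt)
```

If your installed Bark version has a `save_as_prompt` helper in `bark/api.py`, prefer it; this sketch only mirrors the assumed file layout.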

arturh85 commented 1 year ago

UserWarning: No audio backend is available.

This warning can be fixed by installing ffmpeg and running:

pip install pysoundfile

and adding the following to the beginning of your python file:

import torchaudio
torchaudio.set_audio_backend("soundfile")
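
To confirm the backend fix took effect, one option is to check that the `soundfile` module is importable and see which backends torchaudio reports. A hedged sketch: the helper name is hypothetical, and because torchaudio's backend API has changed across releases (newer versions may deprecate `set_audio_backend`), the `list_audio_backends` call is guarded with `hasattr`.

```python
import importlib.util


def audio_backend_report() -> dict:
    """Report whether soundfile is importable and which backends torchaudio lists."""
    report = {
        "soundfile_installed": importlib.util.find_spec("soundfile") is not None,
        "torchaudio_backends": None,  # stays None if torchaudio is missing
    }
    if importlib.util.find_spec("torchaudio") is not None:
        import torchaudio
        # Guarded: not every torchaudio release exposes this function.
        if hasattr(torchaudio, "list_audio_backends"):
            report["torchaudio_backends"] = torchaudio.list_audio_backends()
    return report


if __name__ == "__main__":
    print(audio_backend_report())
```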

dellis23 commented 1 year ago

> UserWarning: No audio backend is available.
> No GPU being used. Careful, Inference might be extremely slow!
>
> Some documentation on how to add audio backend or select gpu. (it defaults to cpu)

+1 to this. I'm getting the same warning. I wasted a lot of money on a GPU I didn't need, and now I need to justify its use.

dellis23 commented 1 year ago

Uninstalling torch and reinstalling it with CUDA support, using the command from https://pytorch.org/get-started/locally/, seems to have worked.
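The "No GPU being used" warning comes down to whether PyTorch can see a CUDA device, which a CPU-only torch wheel never can. A minimal sketch of that check (the `pick_device` helper is hypothetical and not part of Bark), falling back to the CPU when torch is missing or has no CUDA support:

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" if PyTorch can see a CUDA GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # torch is not installed at all
    import torch
    # False on CPU-only wheels, with missing drivers, or on unsupported GPUs.
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(pick_device())
```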

dahifi commented 1 year ago

I used the autodoc library to generate documentation using GPT:

https://github.com/dahifi/autodoc-ker/blob/a6bc02a4ca90ef8805cf487f437f81e2035f9119/indexes/suno-ai/bark/.autodoc/docs/markdown

padmalcom commented 1 year ago

After reinstalling torch with CUDA support, I also had to reinstall chardet.

dahifi commented 1 year ago

> I used the autodoc library to generate documentation using gpt:
>
> https://github.com/dahifi/autodoc-ker/blob/a6bc02a4ca90ef8805cf487f437f81e2035f9119/indexes/suno-ai/bark/.autodoc/docs/markdown

I could PR this into the main branch, but it would need maintaining.

Tanzengeist commented 1 year ago

Any suggestions for getting the GPU running? It isn't kicking in. It's an NVIDIA GeForce GTX 1050 Ti with 4 GB GDDR5 on my Dell XPS 15 9570 laptop.

I just installed the latest NVIDIA drivers, but I still get:

No GPU being used. Careful, inference might be extremely slow!
No GPU being used. Careful, inference might be extremely slow!
No GPU being used. Careful, inference might be extremely slow!

It is working, albeit on the CPU, and I'm getting good sound files.

import os
os.environ['SUNO_USE_SMALL_MODELS'] = 'True'
import torchaudio
torchaudio.set_audio_backend("soundfile")

from bark import SAMPLE_RATE, generate_audio, preload_models
from IPython.display import Audio
from scipy.io.wavfile import write as write_wav

# download and load all models
preload_models(
    text_use_small=True,
    coarse_use_small=True,
    fine_use_gpu=True,
    fine_use_small=True,
)

# generate audio from text
text_prompt = """
     Hello, my name is Suno. And, uh — and I like pizza. [laughs]
     But I also have other interests such as playing tic tac toe.
"""
audio_array = generate_audio(text_prompt)

write_wav("Bark_audio4.wav", SAMPLE_RATE, audio_array)

# play text in notebook
Audio(audio_array, rate=SAMPLE_RATE)

arturh85 commented 1 year ago

@Tanzengeist It doesn't have anything to do with the NVIDIA drivers; you need to install the CUDA runtime and a compatible version of PyTorch. You can generate the install command here: https://pytorch.org/get-started/locally/

Tanzengeist commented 1 year ago

Thanks. I ran this command: pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117

Then I checked to see if it worked:

import torch
x = torch.rand(5, 3)
print(x)
x = torch.cuda.is_available()
print(x)

The console:

tensor([[0.1728, 0.3632, 0.0381],
        [0.3811, 0.2539, 0.6598],
        [0.1317, 0.9184, 0.4265],
        [0.0814, 0.6951, 0.2648],
        [0.7667, 0.4997, 0.8019]])
False

False: it isn't picking up the GPU, yet it still produces a random tensor (how is that possible?).

I thought maybe I needed something older for the 1050, so I went to https://discuss.pytorch.org/t/help-installing-pytorch-with-gtx-1050-ti/168328 and ran:

pip install torch==1.7.0+cu92 torchvision==0.8.0+cu92 torchaudio==0.7.0 -f https://download.pytorch.org/whl/torch_stable.html

Then I added 'import torch' to my code. But there is no explicit call to torch in my code, so unless Bark calls torch itself... I don't know.

Any other ideas would be appreciated. As you can tell, I'm a newbie, but I'm very motivated and excited about Bark.

arturh85 commented 1 year ago

@Tanzengeist Did you also install the CUDA 11.7 runtime (https://developer.nvidia.com/cuda-11-7-0-download-archive)? The random tensor comes from your own test code (x = torch.rand(5, 3); print(x)), so it tells you nothing about the GPU.

Tanzengeist commented 1 year ago

Thanks. So it's just using the CPU to generate the vectors.

I installed the CUDA 11.7 runtime. Before this I was using the latest Dell-provided NVIDIA driver; apparently 11.7 is older, because I got a warning.

After installing CUDA 11.7, the program that checks torch.cuda.is_available() still prints False. Can you think of anything I might have done earlier that could be blocking access to the GPU?

Do you know if the 1050 Ti is supported by CUDA 11.8?
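When `torch.cuda.is_available()` stays False, a useful first step is to check which torch build the environment is actually importing: `torch.version.cuda` is `None` for a CPU-only build, for example when an older CPU wheel shadows a CUDA one. A hedged diagnostic sketch, not part of Bark; the `cuda_report` name is hypothetical.

```python
import importlib.util


def cuda_report() -> dict:
    """Collect the facts that usually explain torch.cuda.is_available() == False."""
    report = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not report["torch_installed"]:
        return report
    import torch
    report["torch_version"] = torch.__version__
    # None means this wheel was built without CUDA (a CPU-only install).
    report["built_with_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
        report["compute_capability"] = torch.cuda.get_device_capability(0)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` prints `None`, reinstalling with the command from the PyTorch install selector is the usual next step; if it shows a version but `cuda_available` is still False, the driver or GPU support is the more likely culprit.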

felipelalli commented 1 year ago

We need the API urgently! LOL

ksylvan commented 1 year ago

> working on putting something more comprehensive together. anything in particular i can help with in the meantime?

@gkucsko For one thing, a more comprehensive list of sound effects like [sigh] and [laughter] would be great. It seems to be a common question.

felipelalli commented 1 year ago

@gkucsko Is it possible to simulate simultaneous conversation or interruptions, like in an interview when someone interrupts another person, or when two people try to speak at the same time?

gkucsko commented 1 year ago

definitely one of our pie-in-the-sky goals at Suno :) there is nothing fundamental preventing it

gkucsko commented 1 year ago

> @gkucsko A more comprehensive list of the sound effects like [sigh] and [laughter] for one thing, would be great. Seems to be a common question.

probably best to have this in Discord, since there is no exhaustive list. technically anything could work, since there is a smooth embedding
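Since there is no fixed list and any cue text passes through a smooth embedding, the practical approach is to try several spellings of a cue and compare the results by ear. A minimal sketch of generating such prompt variants; `cue_variants` and the `{cue}` placeholder convention are hypothetical, and the commented-out loop assumes the `generate_audio` / `SAMPLE_RATE` usage shown earlier in this thread.

```python
def cue_variants(template: str, cues: list[str]) -> list[str]:
    """Return one prompt per cue, with "{cue}" in `template` replaced by "[cue]".

    str.replace is used instead of str.format so that other braces in the
    template are left untouched.
    """
    return [template.replace("{cue}", f"[{cue}]") for cue in cues]


if __name__ == "__main__":
    prompts = cue_variants(
        "Well {cue} that went better than expected.",
        ["laughs", "laughter", "sigh", "sighs"],
    )
    for p in prompts:
        print(p)
    # With Bark installed, each variant could then be rendered and compared:
    # from bark import SAMPLE_RATE, generate_audio
    # from scipy.io.wavfile import write as write_wav
    # for i, p in enumerate(prompts):
    #     write_wav(f"cue_{i}.wav", SAMPLE_RATE, generate_audio(p))
```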

gkucsko commented 1 year ago

closing for inactivity, feel free to reopen if needed

ksylvan commented 1 year ago

> closing for inactivity, feel free to reopen if needed

Not re-opening this, but I'm wondering whether any better documentation has appeared since this issue was opened.

gkucsko commented 1 year ago

haha, yeah, there is now a tutorials folder with a bunch of notebooks that should give much better insight into how to use Bark. there is also now a Discord channel where the community has helped each other out greatly!