suno-ai / bark

🔊 Text-Prompted Generative Audio Model
MIT License

Usage Instructions do not work #467

Open Eddcapone opened 1 year ago

Eddcapone commented 1 year ago

First I installed bark:

git clone https://github.com/suno-ai/bark
cd bark && pip install . 

Then inside of the bark folder I installed transformers:

pip install git+https://github.com/huggingface/transformers.git

Then I created this Python script named test.py inside the main folder (bark) as instructed:

from transformers import AutoProcessor, BarkModel

processor = AutoProcessor.from_pretrained("suno/bark")
model = BarkModel.from_pretrained("suno/bark")

voice_preset = "v2/en_speaker_6"

inputs = processor("Hello, my dog is cute", voice_preset=voice_preset)

audio_array = model.generate(**inputs)
audio_array = audio_array.cpu().numpy().squeeze()

Then I ran it, but a console window opened for a split second and nothing happened.

This is what I get when I run the script with python:

$ python /a/AI/Text-To-Speech/suno/bark/test.py
Traceback (most recent call last):
  File "A:\AI\Text-To-Speech\suno\bark\test.py", line 3, in <module>
    processor = AutoProcessor.from_pretrained("suno/bark")
  File "C:\Users\Edd\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\auto\processing_auto.py", line 258, in from_pretrained
    config = AutoConfig.from_pretrained(
  File "C:\Users\Edd\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\auto\configuration_auto.py", line 1032, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "C:\Users\Edd\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\configuration_utils.py", line 620, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "C:\Users\Edd\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\configuration_utils.py", line 675, in _get_config_dict
    resolved_config_file = cached_file(
  File "C:\Users\Edd\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\utils\hub.py", line 400, in cached_file
    raise EnvironmentError(
OSError: suno/bark does not appear to have a file named config.json. Checkout 'https://huggingface.co/suno/bark/None' for available files.
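
Note for readers: one possible reading of this error, offered as an assumption rather than a confirmed diagnosis, is a name collision. transformers appears to treat the string passed to from_pretrained as a local directory when a directory of that name exists relative to the current working directory, and as a Hub repository id otherwise. The script here lives in a folder ending in suno\bark, so launching Python from the parent of that suno folder would make "suno/bark" resolve to the local checkout, which has no config.json. The sketch below recreates that layout in a temporary folder; the helper name shadows_hub_repo is hypothetical and not part of any library.

```python
import os
import tempfile


def shadows_hub_repo(repo_id: str, cwd: str) -> bool:
    """Return True if repo_id, read as a path relative to cwd, is an existing directory.

    Hypothetical helper for illustration; it is not part of transformers.
    """
    return os.path.isdir(os.path.join(cwd, *repo_id.split("/")))


# Recreate the layout from the traceback (<root>/suno/bark) in a temporary folder.
with tempfile.TemporaryDirectory() as root:
    checkout = os.path.join(root, "suno", "bark")
    os.makedirs(checkout)

    # Launched from <root>: "suno/bark" also names the local checkout.
    from_parent = shadows_hub_repo("suno/bark", cwd=root)
    # Launched from inside the checkout: no local "suno/bark" exists there.
    from_checkout = shadows_hub_repo("suno/bark", cwd=checkout)

print("collides when run from the parent folder:", from_parent)
print("collides when run from the bark folder:", from_checkout)
```

If this is the cause, running the script from a directory that contains no suno/bark subfolder avoids the collision.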

Eddcapone commented 1 year ago

I've figured out that I have to navigate to the folder where my test.py script lives and then call python test.py; it behaves completely differently than when it is called from another directory.

It installs some packages but then it just outputs:

Loading the tokenizer from the special_tokens_map.json and the added_tokens.json will be removed in transformers 5, it is kept for forward compatibility, but it is recommended to update your tokenizer_config.json by uploading it again. You will see the new added_tokens_decoder attribute that will store the relevant information.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:10000 for open-end generation.

Then I extended the script as instructed and installed ipython with pip install ipython. But when I run it, I still get no audio output.

from transformers import AutoProcessor, BarkModel
from IPython.display import Audio

processor = AutoProcessor.from_pretrained("suno/bark")
model = BarkModel.from_pretrained("suno/bark")

voice_preset = "v2/en_speaker_6"

inputs = processor("Hello, my dog is cute", voice_preset=voice_preset)

audio_array = model.generate(**inputs)
audio_array = audio_array.cpu().numpy().squeeze()

sample_rate = model.generation_config.sample_rate
Audio(audio_array, rate=sample_rate)
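
Note for readers: IPython.display.Audio only constructs a display object, and it takes a notebook front end such as Jupyter to render that object as a player. When the script is run with plain python, the last line creates the object and discards it, so no sound is expected. Writing a file is the script-friendly route. The sketch below is a stand-in that uses only the standard library: a synthetic 440 Hz tone replaces Bark's output, the helper name write_pcm16_wav is hypothetical, and the 24000 Hz rate is an assumed value (in the script above it would come from model.generation_config.sample_rate). It also prints the absolute output path, because a relative filename is resolved against the current working directory rather than the script's folder.

```python
import math
import os
import struct
import tempfile
import wave


def write_pcm16_wav(path, samples, sample_rate):
    """Write floats in [-1.0, 1.0] to a mono 16-bit PCM WAV file (hypothetical helper)."""
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip to the valid range
        frames += struct.pack("<h", int(round(s * 32767)))  # little-endian int16
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16 bit
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))


# Assumed sample rate; a real script would read it from the model's generation config.
sample_rate = 24000
# One second of a 440 Hz sine tone standing in for the generated audio.
tone = [0.3 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]

out_path = os.path.join(tempfile.gettempdir(), "tone_demo.wav")
write_pcm16_wav(out_path, tone, sample_rate)
print("wrote", os.path.abspath(out_path))
```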

I also tried this:

from transformers import AutoProcessor, BarkModel
from IPython.display import Audio
import scipy

processor = AutoProcessor.from_pretrained("suno/bark")
model = BarkModel.from_pretrained("suno/bark")

voice_preset = "v2/en_speaker_6"

inputs = processor("Hello, my dog is cute", voice_preset=voice_preset)

audio_array = model.generate(**inputs)
audio_array = audio_array.cpu().numpy().squeeze()

#sample_rate = model.generation_config.sample_rate
#Audio(audio_array, rate=sample_rate)

sample_rate = model.generation_config.sample_rate
scipy.io.wavfile.write("bark_out.wav", rate=sample_rate, data=audio_array)

But no output file is generated.

Please fix the instructions

Eddcapone commented 1 year ago

OK! So I figured it out. I had to move the test.py script into the folder "bark", which they apparently refer to as the "main" folder.

Then this script works:

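import scipy.io.wavfile  # import the submodule explicitly so scipy.io.wavfile is always available
from transformers import AutoProcessor, BarkModel

# Download (on first run) and load the processor and model from the HuggingFace Hub
processor = AutoProcessor.from_pretrained("suno/bark")
model = BarkModel.from_pretrained("suno/bark")

voice_preset = "v2/en_speaker_6"

# Tokenize the text and attach the speaker preset
inputs = processor("Hello, my dog is cute", voice_preset=voice_preset)

# Generate the waveform and convert it to a 1-D NumPy array
audio_array = model.generate(**inputs)
audio_array = audio_array.cpu().numpy().squeeze()

# Write the audio to a WAV file in the current working directory
sample_rate = model.generation_config.sample_rate
scipy.io.wavfile.write("bark_out.wav", rate=sample_rate, data=audio_array)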

It takes some time and there is no feedback, but after a while it eventually generates bark_out.wav in the same folder as the script.

These are the worst usage instructions I have ever come across. smh...

CodeRippleDatabase commented 1 year ago

Did you manually download the models, and if so, which folder did you add them to?

Eddcapone commented 1 year ago

@CodeRippleDatabase What models? I just did everything like described above.

JonathanFly commented 1 year ago

This is a bit confusing, but those are two separate Bark implementations. You can use either:

  1. HuggingFace Bark - from transformers import AutoProcessor, BarkModel
  2. Original Suno Bark - git clone https://github.com/suno-ai/bark

However, to make things even more confusing, Suno Bark requires HuggingFace transformers, which means you basically install both versions. But if you aren't using the Suno code, you can skip the Suno part and just use the HuggingFace version.
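
The two routes listed above can be sketched side by side. This is a minimal sketch, not a definitive recipe: the function names and the output path argument are hypothetical, the Suno calls (preload_models, generate_audio, SAMPLE_RATE) follow the usage shown in the Bark README, and the HuggingFace calls are the ones already used earlier in this thread. Imports sit inside each function so that only the route you actually call needs its package installed.

```python
def hf_bark_to_wav(text: str, out_path: str, voice_preset: str = "v2/en_speaker_6") -> None:
    """Route 1: the HuggingFace transformers implementation."""
    import scipy.io.wavfile
    from transformers import AutoProcessor, BarkModel

    processor = AutoProcessor.from_pretrained("suno/bark")
    model = BarkModel.from_pretrained("suno/bark")

    inputs = processor(text, voice_preset=voice_preset)
    audio_array = model.generate(**inputs).cpu().numpy().squeeze()

    sample_rate = model.generation_config.sample_rate
    scipy.io.wavfile.write(out_path, rate=sample_rate, data=audio_array)


def suno_bark_to_wav(text: str, out_path: str, history_prompt: str = "v2/en_speaker_6") -> None:
    """Route 2: the original Suno package (installed from the cloned repository)."""
    from bark import SAMPLE_RATE, generate_audio, preload_models
    from scipy.io.wavfile import write as write_wav

    preload_models()  # downloads and caches the model weights on first use
    audio_array = generate_audio(text, history_prompt=history_prompt)
    write_wav(out_path, SAMPLE_RATE, audio_array)
```

Calling either one, for example hf_bark_to_wav("Hello, my dog is cute", "bark_out.wav"), downloads model weights on first use, and generation can take a while.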

vitalkanxx commented 1 year ago

Good job!