Open RageshAntony opened 1 year ago
This is easy. Open your generation.py file. You will see that it expects the file to be named with a language prefix, an underscore, "speaker", another underscore, and then a number n. You need to change the loop from range(10) (speakers 0 to 9) to range(11) (speakers 0 to 10), then name your new npz file en_speaker_10.npz (or 11 if you have more, and so on, adjusting the range to match). Your issue is that you aren't telling your script where the npz file is.
Here is my code change. The only change needed is where it says range(11); then name your file to match. Or, if you want to call it something else, code that here.
Starting at line 74, in my version of generation.py at least:
ALLOWED_PROMPTS = {"announcer"}
for _, lang in SUPPORTED_LANGS:
    for prefix in ("", f"v2{os.path.sep}"):
        for n in range(11):
            ALLOWED_PROMPTS.add(f"{prefix}{lang}_speaker_{n}")
Hey @RageshAntony!
Sorry, I don't have an answer for you (I just started exploring this repo today). I wanted to know: what's the code behind https://huggingface.co/spaces/fffiloni/clone-voice-for-bark?
Best, Rahul
also, i tried that npz generator, I have not been able to produce a working npz with it. different errors every time... it generates the npz but they don't work...
A little update: I followed the API in that link to this endpoint https://fffiloni-clone-voice-for-bark.hf.space/ and it did generate a working npz, though it took a lot longer. I think the other one isn't passing the audio file. But it still didn't really work: it was very garbled at the beginning, and then still sounded like voice 6, only deeper.
How did you use it? I renamed as "en_speaker_10.npz" and loaded as "v2/en_speaker_10", but still get "history prompt not found" error
Same issue with me.
Ahh, the issue is in lines 71 to 75 of generation.py. The ALLOWED_PROMPTS set restricts the speaker names, so ours is not included in it and ValueError("history prompt not found") is raised.
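A minimal sketch of the check described above, assuming the allow-list is built the way the loop quoted in this thread shows. SUPPORTED_LANGS here is a shortened, hypothetical stand-in for Bark's list, is_allowed is an illustrative helper rather than a Bark function, and the "/" normalisation step is an assumption about how the name is compared.

```python
import os

# Hypothetical, shortened stand-in for Bark's SUPPORTED_LANGS (name, code) pairs.
SUPPORTED_LANGS = [("English", "en"), ("Italian", "it")]


def build_allowed_prompts(n_speakers=10):
    """Rebuild the allow-list the same way the loop quoted in this thread does."""
    allowed = {"announcer"}
    for _, lang in SUPPORTED_LANGS:
        for prefix in ("", f"v2{os.path.sep}"):
            for n in range(n_speakers):
                allowed.add(f"{prefix}{lang}_speaker_{n}")
    return allowed


def is_allowed(name, allowed):
    """Illustrative helper: turn '/' into the OS separator, then test membership."""
    return os.path.join(*name.split("/")) in allowed


default_allowed = build_allowed_prompts()     # speakers 0..9
extended_allowed = build_allowed_prompts(11)  # speakers 0..10

for candidate in ("v2/en_speaker_9", "v2/en_speaker_10", "v2/ragesh"):
    print(candidate, is_allowed(candidate, default_allowed),
          is_allowed(candidate, extended_allowed))
```

Under these assumptions a custom name such as v2/ragesh is rejected no matter how far the range is extended, because only names of the form lang_speaker_n are ever added to the set.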
The voice cloning is not working. :(
@RahulBhalley I deleted that IF block. But now I get an assertion error.
Maybe some issue with Generated NPZ or bark not supporting it
You two didn't listen to a word I said. You need to edit generation.py to allow the range to go to 11, ffs; otherwise rename your npz file as number 9 or 8 if you don't know how to edit a range. But you shouldn't be editing this py file if you don't know how to read a range.
This was the best result I could get https://github.com/suno-ai/bark/assets/17352697/53e6844d-4931-48ae-94b4-5303a5d5327a
It played a lot of music and was not the voice it was supposed to be, even when prompted to be male, so... I dunno.
However, that does mean the npz generator is working. I suspect it is just very picky, i.e., you need to speak more clearly, remove background noise, and maybe say a specific phrase instead of a generic random one.
if you want to follow my progress on this I've been actively working on this for a bit now :P
OK, I got a better result this time. But it still makes a lot of music noises, and it was my own voice I used to make the npz file. I prompted bark with [man] in the text and it still sounds like voice 6, female... So I dunno... https://github.com/suno-ai/bark/assets/17352697/bf6ab70e-4f65-4bcf-a027-f468b3729514
@cybershrapnel I already did that
added till 15
ALLOWED_PROMPTS = {"announcer"}
for _, lang in SUPPORTED_LANGS:
    for prefix in ("", f"v2{os.path.sep}"):
        for n in range(15):
            ALLOWED_PROMPTS.add(f"{prefix}{lang}_speaker_{n}")
print(ALLOWED_PROMPTS)
The output includes v2/en_speaker_10 as well. en_speaker_10.npz.zip
But I still got "History prompt not found".
Only then did I remove the IF block that throws the error.
Then I got an assertion error.
I attached the NPZ file for reference.
cc: @RahulBhalley
That's wrong.
You don't need to edit your generation.py file if you don't understand it. Set it back the way it was, rename your file to en_speaker_9.npz, do not change generation.py, and call it in your script like this:
SAMPLE_RATE = 22050
HISTORY_PROMPT = "en_speaker_9"
SPEAKER = HISTORY_PROMPT
Or, if you really want it in the v2 folder:
SAMPLE_RATE = 22050
HISTORY_PROMPT = "v2/en_speaker_9"
SPEAKER = HISTORY_PROMPT
The v2 part is not important. That's just the directory.
If you don’t care about using other speakers already present, simply use any of those names.
My issue was that voice didn’t sound like it should have.
same, the npz is not correct or the models don't support it, not sure which
I think the correct npz file is being loaded. And the model must support every voice if it’s trained on a humongous dataset like VALL-E.
But I’m doubtful about the way the latent features are extracted from the voice. Maybe that part has some issue. It’s not able to fetch the timbre information from my voice.
Furthermore, sometimes the same speaker sounds different. The team should provide some argument to control the randomness of every inference, like HuggingFace gives for Stable Diffusion (that generator argument). Bark won’t be useful if the speaking style and voice always vary across inferences (i.e. if it’s unpredictable every time).
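One workaround people commonly try for this kind of run-to-run variation is to seed the random number generators before each generation call. Whether that makes Bark's output fully repeatable is an assumption that has not been verified in this thread; seed_everything below is a hypothetical helper, not part of Bark, and it only touches NumPy and PyTorch if they happen to be installed.

```python
import random


def seed_everything(seed):
    """Hypothetical helper: seed every RNG we can find (NumPy and PyTorch are optional)."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass  # PyTorch not installed; nothing to seed


# Demonstrate the idea with the standard-library RNG only.
seed_everything(1234)
first = [random.random() for _ in range(3)]
seed_everything(1234)
second = [random.random() for _ in range(3)]
print(first == second)
```

The idea would be to call seed_everything(...) right before generate_audio(...), using the same seed whenever the same delivery is wanted.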
agreed, but Ive heard examples with bark using other voices. so... Does anyone have any example npz files we can play with?
@cybershrapnel
Did the same. But still getting Assertion error
but Ive heard examples with bark using other voices.
Could you please give me some links? I straightaway started generating speech instead of looking at other’s generated speeches.
Why do you keep using v2 in the path? I'm not trying to be a jerk, but you clearly don't understand very basic coding concepts. Stop trying to call it v2, or if you are going to call it v2, put the file in the v2 folder. You are having nothing but a path issue, which we can't help you with. Paths are a very basic coding principle. You need to learn your basics on paths before you go any further. You are only having an issue with the file path. That's it. Nothing else is going on except that you are pathing your file wrong.
@cybershrapnel Well. I have 7 years of coding experience. Let me tell you in detail what I did
v2 npzs are different, I think, which would explain your error. That generator clearly makes v1 npzs.
@cybershrapnel Maybe. Let me check it.
@cybershrapnel I replaced the "en_speaker_9" inside the prompts folder.
I still get an assertion error.
Let me check with the clone_voice.ipynb notebook.
You need to reinstall. You clearly messed something up if that's not working, because it still sounds like you're having a path issue. I tried it under both the normal folder and v2, and it worked.
Also, keep in mind there have been serious changes to this repo lately, and I think they broke it. I rolled back to an older version because of the memory issues the new version introduces on cards with 8 GB or less. I think it's due to the increased inference speed, but I'm not sure. When I use the current version, I get a lot of garbled audio. The old version is almost perfect but very slow.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
100%|██████████| 92/92 [00:00<00:00, 112.01it/s]
Traceback (most recent call last):
  File "C:\Python310\lib\site-packages\bark\api.py", line 113, in generate_audio
    out = semantic_to_waveform(
  File "C:\Python310\lib\site-packages\bark\api.py", line 54, in semantic_to_waveform
    coarse_tokens = generate_coarse(
  File "C:\Python310\lib\site-packages\bark\generation.py", line 571, in generate_coarse
    round(x_coarse_history.shape[-1] / len(x_semantic_history), 1)
AssertionError
Process finished with exit code 1
I followed this topic to clone my voice, then I hit this error. Does anyone know how to fix it?
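The line shown in the traceback compares the length of the coarse history with the length of the semantic history. A rough pre-check along those lines can be sketched as follows. The two rate constants are assumptions recalled from Bark's generation.py and should be checked against your installed copy, history_ratio_ok is a hypothetical helper, and the real assertion appears to test other conditions too, so passing this check does not guarantee the file loads.

```python
# Assumed values, recalled from Bark's generation.py; verify against your copy.
SEMANTIC_RATE_HZ = 49.9
COARSE_RATE_HZ = 75


def history_ratio_ok(n_semantic, n_coarse):
    """Return True if coarse frames per semantic token round to the expected rate ratio.

    n_semantic: length of the 1-D semantic history (e.g. len(semantic_prompt))
    n_coarse:   last dimension of the coarse history (e.g. coarse_prompt.shape[-1])
    """
    if n_semantic <= 0:
        return False
    expected = round(COARSE_RATE_HZ / SEMANTIC_RATE_HZ, 1)
    return round(n_coarse / n_semantic, 1) == expected


print(history_ratio_ok(499, 750))  # lengths in roughly the 75 : 49.9 proportion
print(history_ratio_ok(100, 300))  # lengths that are clearly out of proportion
```

If a generated npz fails a check like this, the mismatch comes from the tool that produced the file rather than from the path used to load it.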
Regarding loading the npz, you must pass the full path with the .npz extension, like:
from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav
preload_models()
prompt = "Hello, my name is Suno. And, uh — and I like pizza. [laughs] But I also have other interests such as playing tic tac toe."
history_prompt = "/path/to/history_prompt.npz"
audio_array = generate_audio(prompt, history_prompt=history_prompt)
write_wav("/path/to/audio.wav", SAMPLE_RATE, audio_array)
About the other error, your npz is not correct and you should seek help where you created that npz.
P.S.: Looking at the link you provided, they use a technique I tried before, but it doesn't work; as far as I know there is no reliable method to really clone your voice. I tried the HuBERT-based method mentioned below and it works fine.
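On the full-path advice above: the behaviour reported in this thread is consistent with the prompt string being handled in two different ways, depending on whether it ends in .npz. The sketch below is a paraphrase of that assumed dispatch, not Bark's actual code; resolve_history_prompt, prompts_dir, and allowed_names are illustrative names.

```python
import os


def resolve_history_prompt(history_prompt, prompts_dir, allowed_names):
    """Return the .npz path a prompt string would map to under the assumed rules."""
    if history_prompt.endswith(".npz"):
        # A full path is used as-is and never touches the allow-list.
        return history_prompt
    # A bare name is normalised and must be one of the bundled speaker names.
    name = os.path.join(*history_prompt.split("/"))
    if name not in allowed_names:
        raise ValueError("history prompt not found")
    return os.path.join(prompts_dir, f"{name}.npz")


allowed = {"en_speaker_9", os.path.join("v2", "en_speaker_9")}
prompts = os.path.join("assets", "prompts")  # illustrative location

print(resolve_history_prompt("v2/en_speaker_9", prompts, allowed))
print(resolve_history_prompt("/path/to/ragesh.npz", prompts, allowed))
try:
    resolve_history_prompt("v2/ragesh", prompts, allowed)
except ValueError as err:
    print("error:", err)
```

Under this reading, a custom file either has to be passed with its complete path and extension, or has to reuse one of the existing speaker names.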
You're probably using the old cloning tech which produced invalid .npz files very often. Use the new hubert based methods. Most popular Bark UIs have it built in (including mine) and the original repo is https://github.com/gitmylo/bark-voice-cloning-HuBERT-quantizer
Crafting a voice with text prompts is generally done with a random voice, then saving the bark output as a new .npz file. If you're cloning, the text prompt isn't going to shape the voice much. It does shape it somewhat, so you can save the sample again and make a new version. For example, here's a variant of v2/en_speaker_3 I modified to speak faster. But that's a lot more fiddly. v2_en_speaker_3_double_expresso.zip
https://github.com/suno-ai/bark/assets/163408/da4f0571-0954-4481-b121-87db20f5fbd8
As an example of voice crafting try using a random voice (no history_prompt, no .npz file) with this prompt:
Listen to my soothing, relaxing voice. Breathe calmly in, and out. Slowly close your eyes. Continue to breathe at this slow pace. Feel the air expand your lungs with each in breath.
You'll get a very high percentage of slow calm female voices.
Hi guys, I am using bark with the Hugging Face transformers API, and when I tried to pass history_prompt I got the error "TypeError: string indices must be integers, not 'str'". It seems that the transformers API expects a dictionary-like type whose values must be torch.tensor. Any clues on how to solve the problem?
@JeavanCode just pass the path of the npz file, but it usually complains about custom npz files that work fine with the original bark. Looks like if the npz is not exactly in the format they need it does nothing to crop the data like the original bark does.
from scipy.io import wavfile
from transformers import AutoProcessor, BarkModel
processor = AutoProcessor.from_pretrained("suno/bark-small")
model = BarkModel.from_pretrained("suno/bark-small")
voice_preset = "/path/to/history_prompt.npz"
inputs = processor("Hello, my dog is cute, I need him in my life", voice_preset=voice_preset)
audio_array = model.generate(**inputs, semantic_max_new_tokens=100)
audio_array = audio_array.cpu().numpy().squeeze()
sample_rate = model.generation_config.sample_rate
wavfile.write("/path/to/audio.wav", sample_rate, audio_array)
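Since the complaint here is about custom npz files not being in exactly the expected format, a quick way to see what a file contains is to list its array names. An .npz is a zip archive with one .npy member per array, so this can be done with the standard library alone. The three key names below are an assumption based on Bark's bundled speaker prompts; compare against one of the shipped files to confirm. npz_array_names and missing_keys are hypothetical helpers.

```python
import os
import tempfile
import zipfile

# Key names assumed from Bark's bundled speaker prompts; confirm against a shipped file.
EXPECTED_KEYS = {"semantic_prompt", "coarse_prompt", "fine_prompt"}


def npz_array_names(path):
    """List the array names in an .npz, which stores each array as <name>.npy."""
    with zipfile.ZipFile(path) as archive:
        return {name[:-4] for name in archive.namelist() if name.endswith(".npy")}


def missing_keys(path):
    """Return the expected keys that the file does not contain."""
    return EXPECTED_KEYS - npz_array_names(path)


# Demo on a throwaway archive; the members are empty placeholders, not real arrays.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.npz")
    with zipfile.ZipFile(demo_path, "w") as archive:
        archive.writestr("semantic_prompt.npy", b"")
        archive.writestr("coarse_prompt.npy", b"")
    demo_missing = missing_keys(demo_path)

print(demo_missing)
```

This only checks names, not shapes or dtypes, so a file that passes can still be rejected for other reasons.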
Thanks! I didn't know I needed to pass voice_preset to the processor instead of passing history_prompt to model.generate. BTW, how did you figure it out? Is there documentation or a handbook or something? I always get confused when calling APIs like BarkModel.from_pretrained("suno/bark-small"); I don't understand how to trace back code like this.
Documentation https://huggingface.co/docs/transformers/model_doc/bark and source code https://github.com/huggingface/transformers
Hi, I created an npz file with an Italian cloned voice, but it's not good with the Italian language. Do I need to create a new HuBERT base model and then train on the audio?
Yes, to clone Italian you need a hubert model specific for Italian.
How can I get a guide to train a HuBERT base model?
https://github.com/gitmylo/bark-voice-cloning-HuBERT-quantizer#how-do-i-train-it-myself
I already did that, but the base model is in English. I mean, how can I create a base HuBERT model in Italian for training other speakers?
Read the "How do I train it myself?" section. It explains how to create a new model in any language you want.
I repeat: I did it, but it's not good with Italian, because the base pth is in English.
@Maverick1983 Looks like you are having trouble finding it so I will copy and paste it here
https://github.com/gitmylo/bark-voice-cloning-HuBERT-quantizer#how-do-i-train-it-myself
Simply run the training commands.
A simple way to create semantic data and wavs for training, is with my script: bark-data-gen. But remember that the creation of the wavs will take around the same time if not longer than the creation of the semantics. This can take a while to generate because of that.
For example, if you have a dataset with zips containing audio files, one zip for semantics and one for the wav files, inside of a folder called "Literature":
You should run process.py --path Literature --mode prepare for extracting all the data to one directory
You should run process.py --path Literature --mode prepare2 for creating HuBERT semantic vectors, ready for training
You should run process.py --path Literature --mode train for training
And when your model has trained enough, you can run process.py --path Literature --mode test to test the latest model.
To create the dataset, use this repository as an example, but CHANGE THE BOOKS TO ITALIAN BOOKS so it works with ITALIAN:
https://github.com/gitmylo/bark-data-gen
After you do all these things you will have a PTH file for ITALIAN.
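The four stages quoted above can be laid out as a small driver script. This is only a sketch that builds the command lines in order; the script name, the --path and --mode flags, and the "Literature" folder come from the quoted instructions, while build_commands is a hypothetical helper and the commands are printed rather than executed.

```python
import sys

# Stage names, in the order given in the quoted instructions.
MODES = ["prepare", "prepare2", "train", "test"]


def build_commands(dataset_dir, script="process.py"):
    """Return one argv list per stage for the given dataset folder."""
    return [
        [sys.executable, script, "--path", dataset_dir, "--mode", mode]
        for mode in MODES
    ]


commands = build_commands("Literature")
for argv in commands:
    # To actually run a stage from the quantizer checkout, one could use
    # subprocess.run(argv, check=True) instead of printing.
    print(" ".join(argv[1:]))
```

Running the stages one at a time, and checking the output of each before moving on, makes it easier to tell whether a poor Italian result comes from the dataset or from the training itself.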
I already did it... but it does not speak good Italian.
I created a NPZ file via this site https://huggingface.co/spaces/fffiloni/clone-voice-for-bark
Then I put it in the /assets/prompts/v2/ as ragesh.npz
Then I loaded it like this
audio_array = generate_audio(text_prompt, history_prompt="v2/ragesh")
But I get
ValueError: history prompt not found
Then I tried audio_array = generate_audio(text_prompt, history_prompt="/path/to/../v2/ragesh") and still got the same error
Then I tried like audio_array = generate_audio(text_prompt, history_prompt="/path/to/../v2/ragesh.npz")
Then I got
Please help me to load NPZ file of my voice