from bark import SAMPLE_RATE, generate_audio, preload_models
import sounddevice
from transformers import BarkModel, BarkProcessor
import torch
import numpy as np
from optimum.bettertransformer import BetterTransformer
from scipy.io.wavfile import write as write_wav
import re
device = "cuda:0" if torch.cuda.is_available() else "cpu"
SPEAKER = "v2/en_speaker_6"
error message:
F:\Program Files\anaconda3\envs\ollamaRAG\Lib\site-packages\transformers\models\encodec\modeling_encodec.py:124: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requiresgrad(True), rather than torch.tensor(sourceTensor).
self.register_buffer("padding_total", torch.tensor(kernel_size - stride, dtype=torch.int64), persistent=False)
The class optimum.bettertransformers.transformation.BetterTransformer is deprecated and will be removed in a future release.
The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:None for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:None for open-end generation.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:None for open-end generation.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:None for open-end generation.
I used this code to generate some audio. But I found the voice_preset didn't work. The audio consists of difference voices. In the call of Barkprocessor, it was fine. But in the generate history_prompt was all 10000,
from bark import SAMPLE_RATE, generate_audio, preload_models import sounddevice from transformers import BarkModel, BarkProcessor import torch import numpy as np from optimum.bettertransformer import BetterTransformer from scipy.io.wavfile import write as write_wav import re
device = "cuda:0" if torch.cuda.is_available() else "cpu" SPEAKER = "v2/en_speaker_6"
def barkspeed(text_prompt): processor = BarkProcessor.from_pretrained("suno/bark-small") model = BarkModel.from_pretrained("suno/bark-small", torch_dtype=torch.float16).to(device) model = BetterTransformer.transform(model, keep_original_model=False) model.enable_cpu_offload() sentences = re.split(r'[.?!]', text_prompt) pieces = [] for sentence in sentences: inp = processor(sentence.strip(), voice_preset=SPEAKER).to(device) audio = model.generate(*inp, do_sample=True, fine_temperature=0.4, coarse_temperature=0.5) audio = ((audio/torch.max(torch.abs(audio))).numpy(force=True).squeeze()pow(2, 15)).astype(np.int16) pieces.append(audio) write_wav("bark_generation.wav", SAMPLE_RATE, np.concatenate(pieces)) sounddevice.play(np.concatenate(pieces), samplerate=24000) sounddevice.wait()
error message: F:\Program Files\anaconda3\envs\ollamaRAG\Lib\site-packages\transformers\models\encodec\modeling_encodec.py:124: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requiresgrad(True), rather than torch.tensor(sourceTensor). self.register_buffer("padding_total", torch.tensor(kernel_size - stride, dtype=torch.int64), persistent=False) The class
optimum.bettertransformers.transformation.BetterTransformer
is deprecated and will be removed in a future release. The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details. The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input'sattention_mask
to obtain reliable results. Settingpad_token_id
toeos_token_id
:None for open-end generation. The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input'sattention_mask
to obtain reliable results. The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input'sattention_mask
to obtain reliable results. Settingpad_token_id
toeos_token_id
:None for open-end generation. The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input'sattention_mask
to obtain reliable results. Settingpad_token_id
toeos_token_id
:None for open-end generation. The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input'sattention_mask
to obtain reliable results. Settingpad_token_id
toeos_token_id
:None for open-end generation.I used this code to generate some audio. But I found the voice_preset didn't work. The audio consists of difference voices. In the call of Barkprocessor, it was fine. But in the generate history_prompt was all 10000,