huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Returning history prompt from BarkModel.generate() #28890

Open sourabharsh opened 9 months ago

sourabharsh commented 9 months ago

Feature request

Hi, I noticed that the original implementation of Bark (https://github.com/suno-ai/bark) added a feature that returns the history_prompt for the audio currently being generated, via the output_full parameter:

history_prompt, out_arr = generate_audio(text_prompt, output_full=True)

Here history_prompt is a dict with semantic_prompt, coarse_prompt, and fine_prompt as its keys.

The generate method of the Hugging Face version of Bark (BarkModel) doesn't support this parameter. I tried modifying the code to build a dict of these prompts inside generate, but the resulting prompts are not a valid history_prompt that can be reused on the next call, because the ndarrays do not match what is expected.

The ndarray shapes of the semantic, coarse, and fine prompts also differ between the original implementation and the Hugging Face implementation.

Can you please help me fix this?

Motivation

I want to generate continuous long-form audio for an audiobook, for a better listening experience. I believe returning the history_prompt would let Suno/Bark choose the tone of each sentence based on the previous one, which cannot be achieved when generating sentence by sentence with a single fixed history_prompt.

Your contribution

I need to go through the code and understand why the shapes of the prompts differ. Once I understand that, I can contribute a PR.

ArthurZucker commented 9 months ago

FYI @ylacombe

ylacombe commented 8 months ago

Hey @sourabharsh, I'd be happy to help you understand how to do that. Have you checked the difference between the input history_prompt that you get with the processor and the output history_prompt that you get with your code?
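
One way to make that comparison concrete is to print the shape and dtype of each array in both dicts side by side. The sketch below is only an illustration: summarize_prompt and diff_prompts are hypothetical helpers, not part of transformers or Bark, and the arrays are made-up placeholders rather than real model outputs. In practice, reference would be the history_prompt obtained from the processor and candidate the dict built inside generate.

```python
import numpy as np

# Keys named in the issue for a Bark history prompt.
PROMPT_KEYS = ("semantic_prompt", "coarse_prompt", "fine_prompt")


def summarize_prompt(prompt):
    """Return {key: (shape, dtype)} for each expected key, or None if missing."""
    summary = {}
    for key in PROMPT_KEYS:
        if key not in prompt:
            summary[key] = None
            continue
        arr = np.asarray(prompt[key])
        summary[key] = (arr.shape, str(arr.dtype))
    return summary


def diff_prompts(reference, candidate):
    """Return only the keys whose shape or dtype differ between the two dicts."""
    ref = summarize_prompt(reference)
    cand = summarize_prompt(candidate)
    return {k: (ref[k], cand[k]) for k in PROMPT_KEYS if ref[k] != cand[k]}


# Placeholder arrays: the shapes and dtypes here are invented for the demo
# and are NOT claimed to be what either implementation actually produces.
reference = {
    "semantic_prompt": np.zeros(10, dtype=np.int64),
    "coarse_prompt": np.zeros((2, 15), dtype=np.int64),
    "fine_prompt": np.zeros((8, 15), dtype=np.int64),
}
candidate = {
    "semantic_prompt": np.zeros((1, 10), dtype=np.int64),  # extra leading axis
    "coarse_prompt": np.zeros((2, 15), dtype=np.int64),
    "fine_prompt": np.zeros((8, 15), dtype=np.int32),      # different dtype
}

mismatches = diff_prompts(reference, candidate)
for key, (expected, got) in mismatches.items():
    print(f"{key}: expected {expected}, got {got}")
```

Any key reported here points at an array that needs reshaping (for example, dropping a batch axis or transposing) or casting before the dict can be passed back in as a history_prompt.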