Open sourabharsh opened 9 months ago
FYI @ylacombe
Hey @sourabharsh, I'd be happy to help you understand how to do that. Have you checked the difference between the input history_prompt that you get with the processor and the output history_prompt that you get with your code?
Feature request
Hi, I have noticed that the original implementation of Bark (https://github.com/suno-ai/bark) supports returning the history_prompt for the audio currently being generated via the output_full parameter:

history_prompt, out_arr = generate_audio(text_prompt, output_full=True)

where history_prompt is a dict with semantic_prompt, coarse_prompt, and fine_prompt as its keys.
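For reference, this is a minimal sketch of the dict structure the original generate_audio(..., output_full=True) call returns. The key names and codebook counts (2 coarse, 8 fine) follow the upstream voice-prompt format; the sequence lengths here are illustrative placeholders, not real model dimensions.

```python
import numpy as np

# Sketch of the history_prompt dict returned by suno-ai/bark's
# generate_audio(text_prompt, output_full=True). Sequence lengths
# below are placeholders, not real model dimensions.
history_prompt = {
    "semantic_prompt": np.zeros(256, dtype=np.int64),     # 1-D semantic token ids
    "coarse_prompt": np.zeros((2, 512), dtype=np.int64),  # 2 coarse codebooks x T
    "fine_prompt": np.zeros((8, 512), dtype=np.int64),    # 8 fine codebooks x T
}

# A prompt is reusable as the next history_prompt only if every entry
# is an ndarray with these ranks.
expected_ranks = {"semantic_prompt": 1, "coarse_prompt": 2, "fine_prompt": 2}
assert all(history_prompt[k].ndim == r for k, r in expected_ranks.items())
```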
However, the generate method of the Hugging Face port of Bark (BarkModel) doesn't support this parameter. I tried to modify the code by building a dict of these prompts inside the generate method, but the resulting prompts are not valid as a history_prompt for the next call because the ndarrays don't match. In particular, the shapes of the semantic, coarse, and fine prompts differ between the original implementation and the Hugging Face implementation.
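One plausible source of the mismatch is that the Hugging Face model works with batched, batch-first arrays while the original code expects unbatched ones. The sketch below is a guess at a normalization step under that assumption; the input shapes and the helper name are hypothetical, not verified against BarkModel's internals.

```python
import numpy as np

def to_bark_history_prompt(semantic, coarse, fine):
    """Normalize batch-first arrays to the unbatched layout the original
    suno-ai/bark code expects. The assumed input shapes (batch, T) and
    (batch, codebooks, T) are a guess, not taken from BarkModel."""
    return {
        "semantic_prompt": np.asarray(semantic)[0],  # (1, T)    -> (T,)
        "coarse_prompt": np.asarray(coarse)[0],      # (1, 2, T) -> (2, T)
        "fine_prompt": np.asarray(fine)[0],          # (1, 8, T) -> (8, T)
    }

# Illustrative batched inputs with placeholder lengths.
prompt = to_bark_history_prompt(
    np.zeros((1, 256), dtype=np.int64),
    np.zeros((1, 2, 512), dtype=np.int64),
    np.zeros((1, 8, 512), dtype=np.int64),
)
assert prompt["semantic_prompt"].shape == (256,)
```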
Could you please help me fix this?
Motivation
I want to generate continuous long-form audio for an audiobook for a better listening experience. I believe feeding the previous output back as the history_prompt will let Suno/Bark choose a tone consistent with the last sentence, which cannot be achieved by generating sentence by sentence from a single fixed history_prompt.
Your contribution
I need to go through the code and understand why the prompt shapes differ. Once that's clear, I can contribute a PR.