Open BirgitPohl opened 2 weeks ago
@BirgitPohl hey! I tried running the provided script: it finishes in a few seconds on a GPU and also appears stuck on CPU. I don't think it's stuck forever; it's simply taking longer on CPU, because generating audio requires more tokens than generating text, which is expected :)
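To put rough numbers on "audio requires more tokens than text", here is a back-of-the-envelope sketch. The three rates are assumptions based on Bark's commonly cited defaults and may not match your checkpoint; `estimate_autoregressive_tokens` is a hypothetical helper, not part of transformers. It only counts the semantic and coarse stages, assuming those are the ones decoded token by token.

```python
# Assumed defaults; verify against the generation config of your checkpoint.
SEMANTIC_RATE_HZ = 49.9   # semantic tokens per second of audio (assumption)
COARSE_RATE_HZ = 75.0     # coarse frames per second of audio (assumption)
N_COARSE_CODEBOOKS = 2    # coarse codebooks per frame (assumption)


def estimate_autoregressive_tokens(audio_seconds: float) -> dict:
    """Estimate how many tokens must be sampled one by one for a clip."""
    semantic = round(audio_seconds * SEMANTIC_RATE_HZ)
    coarse = round(audio_seconds * COARSE_RATE_HZ * N_COARSE_CODEBOOKS)
    return {"semantic": semantic, "coarse": coarse, "total": semantic + coarse}


# Ten seconds of speech already needs on the order of thousands of decode steps.
print(estimate_autoregressive_tokens(10.0))
```

Each of those steps is a full forward pass, which is why a short sentence can be quick on a GPU and noticeably slow on CPU.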
That is good, it worked for you. :)
How long did it take on CPU until you got an output? How much time would you allow for a CPU run?
Also, did you take into account that I mentioned it wasn't a problem in my first attempts and only became one later? Back then I got a result after a couple of seconds. That is what I expect on a CPU: give it 5 or 10 seconds and I'm fine with that.
Again, I did not touch the generate() method at all when I refactored, and since I ran into this issue I have minimized the code to what is in the entry post. The generate() method now gets an even shorter string.
I do see some computation happening: I added console output in the sample() method and watched the input_ids variable grow endlessly. But even after 30 minutes I still hadn't heard any audio. Would the "small" Bark model really take that much time?
I have absolutely no clue why that is, or how I can get back the behavior I saw in my first attempts.
I wonder if I can try something with the input_ids that I can pass to the generate() method.
@BirgitPohl autoregressive generation is computationally expensive -- depending on the model and your CPU, taking a few minutes is not strange at all. Tagging @Vaibhavs10 here, who might be familiar with BARK/audio strategies for inference on CPU.
A note: it is expected that you see different run times (and results) across different runs. BARK relies on sampling, i.e. its runs are not deterministic and may result in a different number of tokens. Have a look at this guide for the basics of auto-regressive generation -- the principles for LLM or audio generation are the same.
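As a minimal illustration of why sampling makes run length vary, here is a toy loop in plain Python. It is not the transformers implementation: `toy_sample`, the `EOS` id, and the 2% stop probability are all made up for the sketch. The "model" is replaced by a seeded random draw, so the only thing it demonstrates is that the stopping point depends on what gets sampled.

```python
import random

EOS = 0  # hypothetical end-of-sequence token id for this toy example


def toy_sample(seed: int, eos_probability: float = 0.02,
               max_new_tokens: int = 10_000) -> list[int]:
    """Sample tokens until EOS is drawn or the cap is reached."""
    rng = random.Random(seed)
    tokens: list[int] = []
    while len(tokens) < max_new_tokens:
        # Stand-in for "run the model, then sample from its distribution".
        token = EOS if rng.random() < eos_probability else rng.randint(1, 999)
        tokens.append(token)
        if token == EOS:
            break
    return tokens


# Same settings, different seeds: the sequence length differs from run to run.
lengths = {seed: len(toy_sample(seed)) for seed in range(5)}
print(lengths)
```

Fixing the seed makes a run repeatable, which helps when comparing timings between two versions of a script.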
@BirgitPohl since you are on a Mac, you can use the mps device. Simply use the following code and you should, in theory, be good to go:
    import torch

    class TextToSpeechService:
        # Default to Apple's Metal backend when available, otherwise fall back to CPU.
        def __init__(self, device: str = "mps" if torch.backends.mps.is_available() else "cpu"):
            (....)
I do not have a Mac, so I can't debug this. Here are some extra links on how to use your Mac chip:
Let me know how this goes, and happy coding ✨
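A slightly more defensive variant of the device selection above, as a sketch: `pick_device` is a hypothetical helper name, and the CUDA check is an addition not mentioned in this thread. It also guards against PyTorch being absent or too old to expose the mps backend.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu", whichever is available first."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch installed; nothing else to choose from
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent on older PyTorch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(pick_device())
```

The returned string can be passed as the `device` argument of the `TextToSpeechService` constructor shown above.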
System Info
python 3.12.2
transformers version: 4.37.2, but also 4.41.2 (I tried switching versions to see the difference)
I'm using a CPU: MacBook Pro, M2 chip, 24 GB memory
I used BarkModel to generate text-to-speech output and noticed it runs forever. I'm speaking of more than 30 minutes for a text such as 'sample text'. While debugging, I found that it keeps iterating in a while True loop in the sample() method.
When I first tried it out, I didn't have a problem. The problem appeared after I refactored something in main, without touching anything in tts or the generate() method. I later removed what I had changed in main to see what was happening, and it would still run for over 30 minutes for one little audio output.
I'd like some guidelines on how to reach the break quickly, meaning within a couple of seconds, at least on a good Mac CPU.
Who can help?
Tagging @gante for generation and @stevhliu for documentation. I checked the documentation and found guidelines on optimization, but none that tell me how to reach a break in the while True loop quickly.
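For reference, the general pattern for making such a loop terminate is to pair the end-of-sequence check with a hard cap on tokens and, optionally, on wall-clock time. The sketch below is a toy in plain Python, not the code in GenerationMixin: `generate_bounded`, `step`, and `max_seconds` are hypothetical names. In transformers the analogous knob is a generation parameter such as `max_new_tokens`; how Bark forwards it to its sub-models is something to check in the Bark documentation.

```python
import time
from typing import Callable, Optional


def generate_bounded(step: Callable[[list[int]], int], eos_token_id: int,
                     max_new_tokens: int,
                     max_seconds: Optional[float] = None) -> tuple[list[int], str]:
    """Run a token loop that stops on EOS, a token cap, or a time budget."""
    tokens: list[int] = []
    start = time.monotonic()
    while True:
        if len(tokens) >= max_new_tokens:
            return tokens, "max_new_tokens"
        if max_seconds is not None and time.monotonic() - start >= max_seconds:
            return tokens, "max_seconds"
        token = step(tokens)  # stand-in for one forward pass plus sampling
        tokens.append(token)
        if token == eos_token_id:
            return tokens, "eos"


# A step function that never emits EOS, mimicking a run that will not stop on its own.
tokens, reason = generate_bounded(lambda prev: 1, eos_token_id=0, max_new_tokens=50)
print(len(tokens), reason)
```

Returning the stop reason alongside the tokens makes it easy to tell a natural finish from a forced cutoff when debugging.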
Reproduction
Expected behavior
Execution of sample() of GenerationMixin shouldn't take over 30 minutes on CPU devices, to the point where I have to decide to give up. It should rather take a couple of seconds, even if it is supposed to be slower on CPU.