j-csc / mlx_bark

Port of Suno's Bark TTS transformer in Apple's MLX Framework

The code halts while generating coarse tokens... #4

Open · MisakiTaro0714 opened this issue 5 months ago

MisakiTaro0714 commented 5 months ago

Dear Repo owner,

Thank you for sharing your amazing work.

To generate audio in my local environment, I installed all the dependencies and ran the model. However, for some reason, the coarse token generation halts in the middle of execution (23/38). Your help with this issue is highly appreciated.

[Screenshot 2024-02-05 at 1:50:28 PM]

j-csc commented 5 months ago

Try asitop and let me know what it says? Could it be a memory issue?
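
asitop runs as a separate CLI (`sudo asitop`). If that's awkward, a quick in-process check along these lines would show whether memory pressure lines up with the stall; note that psutil is an assumption on my part, not a dependency of this repo:

```python
# Hedged sketch: print system memory usage while generation runs.
# Requires `pip install psutil` (not an mlx_bark dependency).
import psutil

vm = psutil.virtual_memory()
print(f"used {vm.used / 1e9:.1f} GB of {vm.total / 1e9:.1f} GB ({vm.percent}%)")
```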

MisakiTaro0714 commented 5 months ago
[Screenshot 2024-02-05 at 2:42:26 PM]

This is the asitop output; I do not think the issue is related to memory :(

macklinhrw commented 5 months ago

I have the same issue, running on 128 GB of memory. After 23 tokens it freezes. With shorter prompts it seems to work.

samikama commented 5 months ago

Same here.

 62%|██████████████▊          | 23/37 [05:59<03:38, 15.63s/it]
Traceback (most recent call last):
  File "/Users/sami/C-Gen/Experiments/mlx_bark/mlx_bark/model.py", line 618, in <module>
    generate(args.path, args.text, args.model)
  File "/Users/sami/C-Gen/Experiments/mlx_bark/mlx_bark/model.py", line 597, in generate
    coarse_tokens = generate_coarse(
                    ^^^^^^^^^^^^^^^^
  File "/Users/sami/C-Gen/Experiments/mlx_bark/mlx_bark/model.py", line 490, in generate_coarse
    logits, kv_cache = model(
                       ^^^^^^
  File "/Users/sami/C-Gen/Experiments/mlx_bark/mlx_bark/model.py", line 289, in __call__
    x, kv = block(x, past_kv=past_layer_kv, use_cache=use_cache)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sami/C-Gen/Experiments/mlx_bark/mlx_bark/model.py", line 212, in __call__
    self.ln_1(x), past_kv=past_kv, use_cache=use_cache
    ^^^^^^^^^^^^
  File "/Users/sami/C-Gen/Experiments/mlx_bark/mlx_bark/model.py", line 106, in __call__
    x = (x - mean) * mx.rsqrt(var + self.eps)
                     ^^^^^^^^^^^^^^^^^^^^^^^^

This is the stack trace on interrupt.
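
For what it's worth, the interrupt landing inside the LayerNorm math is consistent with MLX's lazy evaluation: the stack doesn't say the rsqrt itself is stuck, just that evaluation was in flight there when I interrupted. If the generation loop never forces evaluation, the lazy graph can grow across steps. A minimal sketch of forcing per-step evaluation, assuming a loop shape like generate_coarse's (the names here are illustrative, not the actual code):

```python
import mlx.core as mx

# Illustrative sampling loop; the real one is in model.py's
# generate_coarse and may differ.
for _ in range(n_steps):
    logits, kv_cache = model(x, past_kv=kv_cache, use_cache=True)
    # Force MLX to materialize this step's results instead of
    # accumulating a lazy graph across iterations.
    mx.eval(logits)
    x = sample_next_token(logits)  # hypothetical helper
```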

j-csc commented 5 months ago

Hmm, so basically a longer prompt jams the coarse model. I'll try to get to it once encodec is wrapped up.

j-csc commented 4 months ago

Would one of you mind sharing the prompt (or an equally long one) that you entered into the model? I tried reproducing with a longer prompt that pushes close to the max (13 seconds), and it doesn't seem to cause the coarse token death...

Note that there is a token limit for audio generation. See how bark does it for longer prompts: https://github.com/suno-ai/bark/blob/main/notebooks/long_form_generation.ipynb
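
For reference, a minimal sketch of that notebook's approach: split the text into sentences, synthesize each one under the token limit, and concatenate the audio. `generate_sentence` here stands in for whatever per-prompt entry point you're using (illustrative, not this repo's actual API):

```python
import nltk        # sentence splitting, as in the Bark notebook
import numpy as np

nltk.download("punkt")

def generate_long(text, generate_sentence):
    # Synthesize each sentence separately (each stays under the
    # token limit), then concatenate the audio; assumes each call
    # returns a 1-D float array.
    pieces = [generate_sentence(s) for s in nltk.sent_tokenize(text)]
    return np.concatenate(pieces)
```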

Also, just double checking: did you download the npz files from Hugging Face and place them into the weights/ folder?
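
A quick sanity check along these lines (the weights/ path is from this repo's setup; nothing else is assumed):

```python
from pathlib import Path

# List the converted weight files expected in weights/; an empty
# result means the npz downloads are missing or misplaced.
npz_files = sorted(Path("weights").glob("*.npz"))
print([p.name for p in npz_files] or "no .npz files found in weights/")
```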