serp-ai / bark-with-voice-clone

🔊 Text-prompted Generative Audio Model - With the ability to clone voices
https://serp.ai/tools/bark-text-to-speech-ai-voice-clone-app

AssertionError no matter if it's 4s or 40s or 40 minutes #18

Open polidox2 opened 1 year ago

polidox2 commented 1 year ago

Hello, I tried with 3 different audio lengths; even at 4 seconds it gives this error.

AssertionError                            Traceback (most recent call last)
Cell In[29], line 10
      1 # generation with more control
      2 x_semantic = generate_text_semantic(
      3     text_prompt,
      4     history_prompt=voice_name,
   (...)
      7     top_p=0.95,
      8 )
---> 10 x_coarse_gen = generate_coarse(
     11     x_semantic,
     12     history_prompt=voice_name,
     13     temp=0.7,
     14     top_k=50,
     15     top_p=0.95,
     16 )
     17 x_fine_gen = generate_fine(
     18     x_coarse_gen,
     19     history_prompt=voice_name,
     20     temp=0.5,
     21 )
     22 audio_array = codec_decode(x_semantic)

File ~\bark-with-voice-clone\bark\generation.py:521, in generate_coarse(x_semantic, history_prompt, temp, top_k, top_p, use_gpu, silent, max_coarse_history, sliding_window_len, model, use_kv_caching)
    519 x_semantic_history = x_history["semantic_prompt"]
    520 x_coarse_history = x_history["coarse_prompt"]
--> 521 assert (
    522     isinstance(x_semantic_history, np.ndarray)
    523     and len(x_semantic_history.shape) == 1
    524     and len(x_semantic_history) > 0
    525     and x_semantic_history.min() >= 0
    526     and x_semantic_history.max() <= SEMANTIC_VOCAB_SIZE - 1
    527     and isinstance(x_coarse_history, np.ndarray)
    528     and len(x_coarse_history.shape) == 2
    529     and x_coarse_history.shape[0] == N_COARSE_CODEBOOKS
    530     and x_coarse_history.shape[-1] >= 0
    531     and x_coarse_history.min() >= 0
    532     and x_coarse_history.max() <= CODEBOOK_SIZE - 1
    533     and (
    534         round(x_coarse_history.shape[-1] / len(x_semantic_history), 1)
    535         == round(semantic_to_coarse_ratio / N_COARSE_CODEBOOKS, 1)
    536     )
    537 )
    538 x_coarse_history = _flatten_codebooks(x_coarse_history) + SEMANTIC_VOCAB_SIZE
    539 # trim histories correctly

AssertionError: 

Please help if you can, thank you so much.
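
For context, the assertion at bark/generation.py:521 is validating the arrays inside the history prompt (the cloned voice .npz). Below is a minimal diagnostic sketch that runs the same checks one at a time, assuming the prompt was saved with semantic_prompt and coarse_prompt keys as in the cloning notebook; the constants are copied from the Bark source and the file path is a hypothetical example, so verify both against your local setup.

```python
import numpy as np

# Constants copied from bark/generation.py (assumed; verify against your local copy)
SEMANTIC_VOCAB_SIZE = 10_000
CODEBOOK_SIZE = 1024
N_COARSE_CODEBOOKS = 2
COARSE_RATE_HZ = 75
SEMANTIC_RATE_HZ = 49.9
semantic_to_coarse_ratio = COARSE_RATE_HZ / SEMANTIC_RATE_HZ * N_COARSE_CODEBOOKS

# Hypothetical path to the cloned voice prompt produced by the cloning notebook
x_history = np.load("bark/assets/prompts/output.npz")
sem = x_history["semantic_prompt"]
coarse = x_history["coarse_prompt"]

print("semantic_prompt:", sem.shape, sem.dtype, sem.min(), sem.max())
print("coarse_prompt:  ", coarse.shape, coarse.dtype, coarse.min(), coarse.max())

# The same conditions as the failing assert, evaluated individually so the
# offending one is visible instead of a bare AssertionError.
checks = {
    "semantic is non-empty 1-D": sem.ndim == 1 and len(sem) > 0,
    "semantic tokens in range": sem.min() >= 0 and sem.max() <= SEMANTIC_VOCAB_SIZE - 1,
    "coarse is 2-D": coarse.ndim == 2,
    "coarse has N_COARSE_CODEBOOKS rows": coarse.shape[0] == N_COARSE_CODEBOOKS,
    "coarse tokens in range": coarse.min() >= 0 and coarse.max() <= CODEBOOK_SIZE - 1,
    "coarse/semantic length ratio matches": round(coarse.shape[-1] / len(sem), 1)
    == round(semantic_to_coarse_ratio / N_COARSE_CODEBOOKS, 1),
}
for name, ok in checks.items():
    print("OK  " if ok else "FAIL", name)
```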

zhoudaifa007 commented 1 year ago

I have the same problem (screenshot attached).

NickAnastasoff commented 1 year ago

I think I figured it out. If you go to line 567 you should see this:

    and (
        round(x_coarse_history.shape[-1] / len(x_semantic_history), 1)
        == round(semantic_to_coarse_ratio / N_COARSE_CODEBOOKS, 1)
    )

I changed the 1's to 0's:

    and (
        round(x_coarse_history.shape[-1] / len(x_semantic_history), 0)
        == round(semantic_to_coarse_ratio / N_COARSE_CODEBOOKS, 0)
    )

The error happens because the two ratios don't match at one decimal place, so I relaxed the comparison to whole numbers.
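
For reference, here is what the two roundings actually compare, assuming Bark's default rates of COARSE_RATE_HZ = 75 and SEMANTIC_RATE_HZ = 49.9 (check your local generation.py); the 1.3 below is just a hypothetical value for a mismatched prompt.

```python
COARSE_RATE_HZ = 75          # assumed Bark defaults; verify against generation.py
SEMANTIC_RATE_HZ = 49.9
N_COARSE_CODEBOOKS = 2

semantic_to_coarse_ratio = COARSE_RATE_HZ / SEMANTIC_RATE_HZ * N_COARSE_CODEBOOKS
expected = semantic_to_coarse_ratio / N_COARSE_CODEBOOKS  # ~1.503 coarse frames per semantic token

print(round(expected, 1))  # 1.5 -> what the original assert expects the prompt's ratio to equal
print(round(expected, 0))  # 2.0 -> what the relaxed assert expects

# A prompt whose coarse/semantic length ratio is far off (hypothetical value 1.3)
# fails either way, which is why loosening the rounding does not always help.
actual = 1.3
print(round(actual, 1) == round(expected, 1))  # False
print(round(actual, 0) == round(expected, 0))  # False
```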

Ishani71199 commented 1 year ago

I changed the value from 1 to 0 and am still getting the same error.

    565     and x_coarse_history.max() <= CODEBOOK_SIZE - 1
    566     and (
--> 567         round(x_coarse_history.shape[-1] / len(x_semantic_history), 0)
    568         == round(semantic_to_coarse_ratio / N_COARSE_CODEBOOKS, 0)
    569     )

AssertionError: