serp-ai / bark-with-voice-clone

🔊 Text-prompted Generative Audio Model - With the ability to clone voices
https://serp.ai/tools/bark-text-to-speech-ai-voice-clone-app

The purpose of TEXT_ENCODING_OFFSET #104

Open quochung-04 opened 1 month ago

quochung-04 commented 1 month ago

Just for clarification: I see that in the semantic model, encoded_text has TEXT_ENCODING_OFFSET added to it, where TEXT_ENCODING_OFFSET = 10_048.

Does anyone understand why this offset is added? Won't it cause encoded_text to deviate from the original token IDs? I can see there is an lm_head with an output size of 10048, but that still leaves me confused.
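For reference, the line I am asking about looks roughly like this (paraphrased from generation.py; tokenizer is the BERT text tokenizer that Bark loads, and text is the prompt string):

```python
# inside generate_text_semantic: tokenize the prompt, then shift every
# text token ID up by the offset before feeding it to the semantic model
encoded_text = np.array(_tokenize(tokenizer, text)) + TEXT_ENCODING_OFFSET
```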

Thanks in advance.

phantomwork commented 4 weeks ago

Hi, @quochung-04

You are really detail-oriented. :) I think it's because the semantic model sees text tokens and semantic tokens in the same input sequence. BarkSemanticModel's lm_head has an output size of 10048, so semantic token IDs occupy the range 0-10047 (these are also the tokens that BarkCoarseModel later takes as its input). Since both token types are fed through the same embedding simultaneously, the text tokens are shifted by 10048 to move them above the semantic range, so the two vocabularies never collide. The encoded text does deviate from the original tokenizer IDs, but the model was trained with the shifted IDs, so that is exactly what it expects.
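Here is a minimal sketch of the idea in NumPy. The token values are made up for illustration; only TEXT_ENCODING_OFFSET and the disjoint-range trick come from the repo:

```python
import numpy as np

TEXT_ENCODING_OFFSET = 10_048  # semantic token IDs occupy 0..10_047

# Hypothetical IDs straight from the BERT text tokenizer
text_tokens = np.array([101, 7592, 2088, 102])

# Hypothetical semantic tokens (always in the 0..9_999 range)
semantic_history = np.array([42, 7, 9813])

# Shifting the text tokens moves them above 10_047, so both streams
# can share one embedding table without any ID collisions
encoded_text = text_tokens + TEXT_ENCODING_OFFSET

model_input = np.concatenate([encoded_text, semantic_history])
print(model_input)  # [10149 17640 12136 10150    42     7  9813]
```

After the shift you can always tell which stream an ID came from: anything at or above 10_048 is text, anything below is semantic.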