Open quochung-04 opened 1 month ago
Hi, @quochung-04
You are really detail-oriented. :) I think it's because BarkCoarseModel takes the output of BarkSemanticModel as its input, and BarkSemanticModel's LM head has size 10,048. Since the text tokens and the semantic tokens are fed into the model simultaneously (in one sequence), the text tokens are offset by 10,048 so that their IDs don't collide with the semantic token IDs.
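To make the collision argument concrete, here is a minimal sketch. The constant names mirror Bark's code, but the token values are made-up example IDs, and `SEMANTIC_VOCAB_SIZE` is my assumption about why 10,048 is a safe offset:

```python
# Hypothetical illustration of why offsetting text tokens keeps the two
# ID ranges disjoint when text and semantic tokens share one sequence.
TEXT_ENCODING_OFFSET = 10_048   # constant from Bark's generation code
SEMANTIC_VOCAB_SIZE = 10_000    # assumed: semantic IDs occupy 0..9_999

text_tokens = [17, 532, 9_001]      # raw text-tokenizer IDs (example values)
semantic_tokens = [3, 42, 9_999]    # semantic IDs, all below 10_000

# Shift every text token above the largest possible semantic ID:
encoded_text = [t + TEXT_ENCODING_OFFSET for t in text_tokens]

# The two ranges no longer overlap, so one embedding table can tell
# "text token" apart from "semantic token" purely by the ID value.
combined = encoded_text + semantic_tokens
assert min(encoded_text) >= TEXT_ENCODING_OFFSET
assert max(semantic_tokens) < TEXT_ENCODING_OFFSET
```

So the model can always map an offset ID back to the original text token by subtracting 10,048; nothing is lost, the range is just relocated.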
Just to clarify: I see that in the semantic model, encoded_text has TEXT_ENCODING_OFFSET added to it, where TEXT_ENCODING_OFFSET = 10_048.
Does anyone understand why this offset is added? Won't it cause encoded_text to deviate from the original token IDs? I can see there is an lm_head with an output size of 10,048, but this still confuses me.
Thanks in advance.