I enabled the verbose mode in scoring/vibe_score.py and got this:
Last user message: *Smiles slyly, lowering her voice* Ah, the infamous "BB Meatball"... Now there's a houseguest who knows how to make an entrance! *Winks conspiratorially* The game was never the same after she took the stage, and not just because of her cooking skills.<|eot_id|>
Generated text: *Smiles slyly, lowering her voice* Ah, the infamous "BB Meatball"... Now there's a houseguest who knows how to make an entrance! *Winks conspiratorially* The game was never the same after she took the stage, and not just because of her cooking skills.
Vibe score: 0.9265049695968628
As you can see, the messages are identical, but the score is ~0.92. This is because the scoring.dataset.StreamedSyntheticDataset.__getitem__ method returns the character_response with the EOS_TOKEN appended, while the decoded_messages variable at scoring/vibe_score.py:53 doesn't have it. I suggest adding the EOS_TOKEN to the decoded message before calculating its length.
To do that, the calculate_vibe_match_score function could accept input_tokenizer and use it to append the EOS token to each decoded message.
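Here is a minimal sketch of the idea, not the actual code from scoring/vibe_score.py. It assumes input_tokenizer exposes an eos_token attribute, as Hugging Face tokenizers do. The names append_eos and StubTokenizer are hypothetical and only there to make the example self-contained, and the messages are shortened.

```python
from dataclasses import dataclass


@dataclass
class StubTokenizer:
    """Hypothetical stand-in for input_tokenizer; only eos_token is used here."""
    eos_token: str = "<|eot_id|>"


def append_eos(decoded_message: str, input_tokenizer) -> str:
    """Return decoded_message ending with the tokenizer's EOS token exactly once."""
    eos = input_tokenizer.eos_token
    if decoded_message.endswith(eos):
        return decoded_message
    return decoded_message + eos


# The dataset side carries the EOS token, the decoded generation does not.
reference_message = "Now there's a houseguest who knows how to make an entrance!<|eot_id|>"
generated_text = "Now there's a houseguest who knows how to make an entrance!"

tokenizer = StubTokenizer()
normalized = append_eos(generated_text, tokenizer)

# After normalization both strings have the same length, so a length-based
# comparison no longer penalizes an otherwise identical message.
lengths_match = len(normalized) == len(reference_message)
```

The endswith check keeps the helper idempotent, so a message that already ends with the EOS token is not extended a second time.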