impel-intelligence / dippy-bittensor-subnet

MIT License

Add the EOS_TOKEN to generated text in scoring/vibe_score.py #116

Open itorgov opened 1 week ago

itorgov commented 1 week ago

I enabled verbose mode in scoring/vibe_score.py and got this:

Last user message:  *Smiles slyly, lowering her voice* Ah, the infamous "BB Meatball"... Now there's a houseguest who knows how to make an entrance! *Winks conspiratorially* The game was never the same after she took the stage, and not just because of her cooking skills.<|eot_id|>
Generated text:  *Smiles slyly, lowering her voice* Ah, the infamous "BB Meatball"... Now there's a houseguest who knows how to make an entrance! *Winks conspiratorially* The game was never the same after she took the stage, and not just because of her cooking skills.
Vibe score: 0.9265049695968628

As you can see, the two messages are identical apart from the trailing <|eot_id|>, yet the score is only ~0.92. This happens because the scoring.dataset.StreamedSyntheticDataset.__getitem__ method returns the character_response with the EOS_TOKEN appended, while the entries of the decoded_messages variable at scoring/vibe_score.py:53 do not include it. I suggest appending the EOS_TOKEN before calculating the length of the decoded message.

We could pass input_tokenizer to the calculate_vibe_match_score function as an extra argument and then change this:

decoded_len = len(decoded)

to this:

decoded_len = len(f"{decoded}{input_tokenizer.eos_token}")
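
To illustrate why the missing token matters, here is a minimal, self-contained sketch. It does not reproduce the real code in scoring/vibe_score.py: `length_score` is a hypothetical length-based score invented for this example, and `input_tokenizer` is a stand-in object that only carries the `eos_token` attribute a Hugging Face tokenizer would expose. The shortened message text is also just sample data.

```python
import math
from types import SimpleNamespace

# Stand-in for the real tokenizer (assumption): only eos_token is needed here.
input_tokenizer = SimpleNamespace(eos_token="<|eot_id|>")


def length_score(reference: str, decoded: str) -> float:
    """Hypothetical score: 1.0 when lengths match, decaying as they diverge."""
    if not reference:
        return 0.0
    diff = abs(len(decoded) - len(reference))
    return math.exp(-diff / len(reference))


# The model output, decoded without special tokens.
decoded = "The game was never the same after she took the stage."
# The dataset side keeps the EOS token at the end of the reference message.
reference = decoded + input_tokenizer.eos_token

# Current behaviour: lengths differ by len(eos_token), so the score drops.
score_before = length_score(reference, decoded)
# Proposed behaviour: append the EOS token before measuring the length.
score_after = length_score(reference, f"{decoded}{input_tokenizer.eos_token}")

print(f"before: {score_before:.4f}, after: {score_after:.4f}")
```

With identical text on both sides, the score only reaches 1.0 once both strings are measured with the EOS token included (or, equivalently, with it stripped from both).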

dataai1205 commented 2 days ago

Please check this PR: https://github.com/impel-intelligence/dippy-bittensor-subnet/pull/123