Closed: ivsanro1 closed this 6 months ago
I don't mind the function returning probabilities, but it really needs to be optional. As it is, this would break all other software currently using the streaming generator.
Thanks @turboderp for the feedback, that's a good point. It is now optional and disabled by default.
Please tell me if you'd like me to implement it in some other way that fits exllamav2's style better.
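For reference, a minimal sketch of the opt-in pattern being described (class and return values here are illustrative stand-ins, not the actual exllamav2 code):

```python
class StreamingGenerator:
    """Illustrative stand-in for the real streaming generator."""

    def __init__(self):
        # Disabled by default, so existing callers keep the old 3-tuple
        self.return_probabilities = False

    def stream(self):
        # Placeholder values standing in for the real decoding step
        chunk, eos, tokens = "", False, [0]
        if self.return_probabilities:
            probs = [1.0]  # would hold the sampled tokens' probabilities
            return chunk, eos, tokens, probs
        return chunk, eos, tokens
```

With the flag off, existing code that unpacks three values keeps working unchanged; only callers that opt in see the extra element.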
Sorry for taking a while to get to this. I had to change the logic a little bit and return a tensor of probabilities instead, because the streaming function doesn't always return exactly one token:
So the number of probs returned should always equal the number of tokens returned. You would use it something like this:
```python
generator.return_probabilities = True
...
id_to_piece = tokenizer.get_id_to_piece_list()  # Get the cleaned tokenizer vocab
while True:
    chunk, eos, tokens, probs = generator.stream()
    generated_tokens += 1  # One token is always generated, even if it's held in the generator
    for i in range(tokens.shape[-1]):
        token = tokens[:, i].item()  # Token
        prob = probs[:, i].item()  # Probability (at the final sampling stage)
        piece = id_to_piece[token]  # Token piece
        print(f"{prob:8.5f} - {repr(piece)}")
    if eos or generated_tokens == max_new_tokens: break
```
And the output would be e.g.:

```
0.33173 - ' lived'
1.00000 - ' a'
0.55938 - ' young'
0.64523 - ' girl'
1.00000 - ' named'
0.06762 - ' E'
0.79568 - 'lean'
1.00000 - 'or'
0.96364 - '.'
0.71853 - ' E'
1.00000 - 'lean'
1.00000 - 'or'
0.80822 - ' was'
0.28309 - ' an'
0.47551 - ' intelligent'
0.97350 - ' and'
1.00000 - ' curious'
0.96953 - ' child'
0.82568 - ' who'
0.98809 - ' loved'
0.88151 - ' nothing'
1.00000 - ' more'
1.00000 - ' than'
```
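Since exactly one probability is returned per token, the streamed probabilities can also be aggregated, e.g. into a sequence log-likelihood or perplexity. A quick sketch using the first few values from the sample output above:

```python
import math

# First few per-token probabilities from the sample output
probs = [0.33173, 1.00000, 0.55938, 0.64523]

log_likelihood = sum(math.log(p) for p in probs)  # sum of log-probs
perplexity = math.exp(-log_likelihood / len(probs))  # exp of mean negative log-prob

print(f"log-likelihood = {log_likelihood:.4f}, perplexity = {perplexity:.4f}")
```

This kind of aggregate is handy for comparing how confident the model is across different prompts or sampling settings.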
This PR adds the probability of the tokens to the return tuple of `generator.stream()`.
TL;DR: I thought it could be useful to add this to exllamav2 in general, since it has several possible uses and we already compute the probabilities anyway.
Background
I actually need this to:
Another possible use case is to print the tokens with a color/background that depends on the token probability. This is especially useful for analysis and debugging (e.g. like in the OpenAI playground).
Something like this:
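As a rough sketch of that idea (not the actual screenshot): low-probability tokens could get a red background and high-probability ones green, using ANSI escape codes. The thresholds here are arbitrary choices for illustration:

```python
def colorize(piece, prob):
    # Map probability to an ANSI background color:
    # red (low) -> yellow (medium) -> green (high)
    if prob >= 0.8:
        code = 42  # green background
    elif prob >= 0.4:
        code = 43  # yellow background
    else:
        code = 41  # red background
    return f"\x1b[{code}m{piece}\x1b[0m"

# Example with (piece, probability) pairs from the sample output
pairs = [(" lived", 0.33173), (" a", 1.00000), (" young", 0.55938)]
print("".join(colorize(piece, prob) for piece, prob in pairs))
```

In the streaming loop above, this would just replace the plain `print` of each piece.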