Closed: Atharva-Phatak closed this issue 2 years ago.
Hi,
This can be done easily by setting the output_scores flag of the generate method to True. output_scores will be a tuple of length max_length - input_ids.shape[-1], with each element a tensor of shape (batch_size, config.vocab_size).
How do I convert output_scores to log probabilities?
The logits are just the raw scores; you can get log probabilities by applying a log_softmax (which is a softmax followed by a logarithm) on the last dimension, i.e.
import torch

batch_size, vocab_size = 2, 50265  # example sizes
logits = torch.randn((batch_size, vocab_size))
log_probs = torch.nn.functional.log_softmax(logits, dim=-1)
Thanks. That helps.
@Atharva-Phatak @NielsRogge Where are these logits returned to?
out = model.generate(input_ids, attention_mask=attention_mask, max_length=max_target_length, output_scores=True)
Here out still only contains predictions.
I believe you have to also specify return_dict_in_generate=True to get a ModelOutput.
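For reference, a minimal sketch of the full call (the t5-small checkpoint and input text here are just stand-ins, not from this thread):

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")  # placeholder checkpoint
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

inputs = tokenizer("translate English to German: How are you?", return_tensors="pt")
out = model.generate(
    **inputs,
    max_length=20,
    output_scores=True,
    return_dict_in_generate=True,
)
# out.sequences holds the generated token ids; out.scores is a tuple with one
# (batch_size, vocab_size) tensor per generated step.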
Thanks!
@Atharva-Phatak Did you publish your RL training experiments? Sounds interesting!
How can I get the logits before the softmax?
Is it possible to get the scores before the LogitsProcessor is applied? I.e. get the original probability in the following example instead of 0:
This Whisper model forces the generation to start with 50362 at the first position by default, i.e. "forced_decoder_ids": [[1, 50362]]. This means all other tokens are masked out:

>>> outputs = model.generate(**inputs, return_dict_in_generate=True, output_scores=True)
>>> print(
...     all(outputs.scores[0][0, i] == float("-inf") for i in range(processor.tokenizer.vocab_size) if i != 50362)
... )
True
>>> print(outputs.scores[0][0, 50362])
tensor(0.)
Hi @alvitawa,
A PR has just been merged into main which enables this: #28667. If you install from source and set output_logits=True in your generation config, you'll be able to get this.
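As a rough sketch of the difference between the two flags (assuming a recent transformers install and whatever model and inputs you already have loaded):

outputs = model.generate(
    **inputs,
    return_dict_in_generate=True,
    output_scores=True,   # scores after the logits processors (e.g. forced tokens push everything else to -inf)
    output_logits=True,   # raw logits, before any processing
)
raw_first_step = outputs.logits[0]        # (batch_size, vocab_size)
processed_first_step = outputs.scores[0]  # (batch_size, vocab_size)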
What's the difference between output_scores and output_logits? Also, I found that the scores are a tuple; what does this tuple represent? And how do I convert these scores to a caption, specifically for the BLIP model?
Hi @ummehabiba378717805 - regarding the difference between scores and logits, I'd suggest looking at the documentation for questions like this.
Other implementation questions are best placed in our forums. We try to reserve the github issues for feature requests and bug reports.
Is there any way to access the logits of the prompt tokens, in addition to the logits of generated tokens?
@vijetadeshpande The returned logits and scores include the prompt as the first n values. You can find n by checking its tokenized length:
prompt_len = len(tokenizer(prompt)["input_ids"])
Hi @amyeroberts: I am using Phi-3 for generating text. The prompt is 512 tokens and max_new_tokens=64. When I check the length of outputs.sequences, it is greater than the prompt length, so it does contain the entire sequence. The logits, on the other hand, give len(outputs.logits) <= max_new_tokens. Maybe I am doing something incorrectly? Is there any parameter I need to set in generation_config so that I can get logits for the whole sequence?
My generation config is as follows:
generation_config = GenerationConfig(
    max_new_tokens=64,
    num_return_sequences=1,
    # sampling
    do_sample=True,
    top_k=100,
    # distribution adjustment
    temperature=0.001,
    repetition_penalty=1,
    # token ids
    pad_token_id=tokenizer.pad_token_id,
    # others
    use_cache=True,
    output_logits=True,
    output_scores=True,
    output_hidden_states=True,
    return_dict_in_generate=True,
)
Hi @vijetadeshpande, thanks for clarifying. Indeed, I'd find this surprising too! cc @gante who knows more about the inner workings here. In the meantime, you should be able to get the logits for the prompt from a simple forward pass to the model: model(**inputs).logits (although I realise doing another forward pass isn't ideal).
Yep. Right now I am doing just that, another forward pass for getting the logits over the input/prompt tokens. @gante: Let us know if there's any better solution to this.
Thanks, @amyeroberts , @gante !
@amyeroberts @vijetadeshpande 👋
(some context before the actual answer) As we write in the docs, we only have scores for newly generated tokens. The concept of scores is different from logits: they are logits manipulated for token selection purposes. We don't manipulate the logits regarding the prompt in any way, and thus we technically don't have the scores of the prompt.
Logits are an afterthought added recently, and we've decided to keep a 1:1 correspondence with the scores, for simplicity. There is also a way to obtain them without spending extra compute:
1. Run a forward pass over the prompt (input_ids[:, :-1]), and keep its logits and past_key_values.
2. Pass past_key_values to generate, which skips the prefill stage.

I hope this helps 🤗
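A rough sketch of those two steps (the model and tokenizer are assumed to exist already, the prompt is a placeholder, and whether generate accepts a pre-filled past_key_values this way depends on your transformers version):

import torch

prompt = "An example prompt"  # placeholder
input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]

# 1. Forward pass over the prompt except its last token; keep the logits and the cache.
with torch.no_grad():
    prefill = model(input_ids[:, :-1], use_cache=True)
prompt_logits = prefill.logits            # (batch_size, prompt_len - 1, vocab_size)

# 2. Hand the cache back to generate so it skips the prefill stage.
out = model.generate(
    input_ids,
    past_key_values=prefill.past_key_values,
    max_new_tokens=32,
    output_logits=True,
    return_dict_in_generate=True,
)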
Hi @gante @amyeroberts, thanks for your contributions! I want to know if there's any way to compute the log probs of any sequence given the prompt. Currently, compute_transition_scores gives the log probs of only the generated tokens. I'm curious if it's possible to get the log probs of any token given the prompt.
I'd like to give more details of what I'm trying to achieve. I have two models, A and B, and a prompt p. B generates a word (made of 5 tokens) when prompted with p, whereas A doesn't generate those 5 tokens when prompted with the same prompt p. I want to compute the log probability of generating those 5 tokens for both A and B. Is there any neat trick to do this? :)
Let me know if anything is unclear.
@gante @amyeroberts I am trying to do Phi-3 inference and I'm seeing a difference in the returned token sequences from the model when I set return_dict_in_generate=True. Any thoughts on why this can happen?
Created a discussion post here: https://discuss.huggingface.co/t/difference-in-return-sequence-for-phi3-model/90823
If I set it to True:
[
{'role': 'user', 'content': 'Hello How are you?'},
{
'role': 'assistant',
'content': " Hello! I'm doing well. How about you? How can I help you today? Hello! I'm an AI, so I don't have feelings, but I'm here and ready to assist you.
What can I do for you today? Greetings! As an AI, I don't have personal experiences, but I'm fully operational and ready to provide you with any information or
assistance you need. What's on your mind?"
}
]
else:
[{'role': 'user', 'content': 'Hello How are you?'}, {'role': 'assistant', 'content': " Hello! I'm doing well. How about you? How can I help you today?"}]
@jaydeepborkar Hey, did you figure out this question? I'm also struggling with how to compute the log probs of a sequence given to the model.
@bruceisme @jaydeepborkar
Following our issues guidelines, we reserve GitHub issues for bugs in the repository and/or feature requests. For any other matters, we'd like to invite you to use our forum or our discord 🤗
You can obtain the logprobs of a text sequence according to the model by running a forward pass with that sequence, applying a log_softmax on the last axis of the logits output (shape = [batch_size, sequence_length, vocab_size]), and then selecting the correct token. Note that the logits output holds the logits for the next token, so the first token will never have a logprob.
E.g. for input tokens [0, 5, 6]: the logprob for 0 doesn't exist, the logprob for 5 is log_probs[0, 0, 5], and the logprob for 6 is log_probs[0, 1, 6] (where log_probs is the log-softmaxed logits output).
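A short sketch of that recipe (the model and tokenizer are assumed to be loaded; the text is a placeholder):

import torch
import torch.nn.functional as F

text = "some sequence to score"  # placeholder
input_ids = tokenizer(text, return_tensors="pt")["input_ids"]   # (1, seq_len)
with torch.no_grad():
    logits = model(input_ids).logits                            # (1, seq_len, vocab_size)
log_probs = F.log_softmax(logits, dim=-1)

# Logits at position i predict token i + 1, so shift by one position when gathering.
targets = input_ids[:, 1:]
token_log_probs = log_probs[:, :-1, :].gather(-1, targets.unsqueeze(-1)).squeeze(-1)
# token_log_probs[0, j] is the logprob of input token j + 1 given the preceding tokens.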
@gante (and everyone else), thanks for your input in this thread!
Can you please explain when the scores could be -inf while the logits are real values?
In my case (running inference with LLaMa-3.1), if I do
outputs = model.generate(input_ids,
                         max_new_tokens=256,
                         return_dict_in_generate=True,
                         output_scores=True)
I get the following for outputs.scores:
tensor([[-inf, -inf, -inf, ..., -inf, -inf, -inf]], device='cuda:0'),
tensor([[-inf, -inf, -inf, ..., -inf, -inf, -inf]], device='cuda:0'),
tensor([[-inf, -inf, -inf, ..., -inf, -inf, -inf]], device='cuda:0'),
tensor([[-inf, -inf, -inf, ..., -inf, -inf, -inf]], device='cuda:0'),
tensor([[-inf, -inf, -inf, ..., -inf, -inf, -inf]], device='cuda:0'),
tensor([[-inf, -inf, -inf, ..., -inf, -inf, -inf]], device='cuda:0'),
tensor([[-inf, -inf, -inf, ..., -inf, -inf, -inf]], device='cuda:0'),
tensor([[-inf, -inf, -inf, ..., -inf, -inf, -inf]], device='cuda:0'),
tensor([[-inf, -inf, -inf, ..., -inf, -inf, -inf]], device='cuda:0'),
of length 256, and outputs.logits is None (as expected, as per the docs).
But if I do,
outputs = model.generate(input_ids,
                         max_new_tokens=256,
                         return_dict_in_generate=True,
                         output_logits=True)
I get,
(tensor([[10.9375, 6.0312, 3.0000, ..., 0.6836, 0.6836, 0.6836]],
device='cuda:0'),
tensor([[-2.0938, -4.4375, -3.5625, ..., -5.2812, -5.2812, -5.2812]],
device='cuda:0'),
tensor([[ 2.5469, 1.0234, -0.6250, ..., -1.1328, -1.1328, -1.1328]],
device='cuda:0'),
tensor([[ 1.9688, 0.8164, 0.3477, ..., -5.1875, -5.1875, -5.1875]],
device='cuda:0'),
tensor([[6.9375, 3.3281, 2.5469, ..., 0.0439, 0.0439, 0.0437]],
device='cuda:0'),
...
of length 256.
My questions:
1. Why are the scores -inf while the logits seem valid (when the input and all other params remain constant)?
2. I need to compute log_probs. Is compute_transition_scores the correct method for that?
3. Can I compute log_probs via the logits above?
Thanks, it will be really helpful to find answers to these questions, or even some parts of it :)
Same question.
@varungupta31 @yhy-2000 Llama is a special model because its creators have specified arguments in generation_config.json. For instance, because it has do_sample=True by default, top-k is used, which sets most of the scores to -inf before sampling.
If you pass do_sample=False or disable top-p and top-k, you'll see the numbers you were expecting :)
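For example, something like this should give finite scores (a sketch; the generation length here is arbitrary):

outputs = model.generate(input_ids,
                         max_new_tokens=256,
                         do_sample=False,                # greedy decoding: no top-k/top-p masking
                         return_dict_in_generate=True,
                         output_scores=True,
                         output_logits=True)
# outputs.scores should now contain finite values instead of being mostly -inf.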
do_sample=False makes sense, thank you very much!!!
Hello,
I am using RL to train Seq2Seq models and I need the logits from the generate method, since in RL we need to sample from the current policy. Does anyone know how I can adapt the generate method to get the logits?
Specifically, I am using BART-based models. If you could give a code snippet or something, that would be a lot of help.
Please let me know.
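A rough sketch pulling together the answers above (the BART checkpoint and input text are placeholders, not from this thread): sample from the current policy with generate, then recover the per-token log probs of the sampled tokens with compute_transition_scores.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")  # placeholder checkpoint
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")

inputs = tokenizer("Some source text for the policy to condition on.", return_tensors="pt")
out = model.generate(
    **inputs,
    do_sample=True,                 # sample from the current policy
    max_new_tokens=32,
    output_scores=True,
    return_dict_in_generate=True,
)

# Per-step log probs of the tokens that were actually sampled,
# shape (batch_size, generated_length).
transition_scores = model.compute_transition_scores(
    out.sequences, out.scores, normalize_logits=True
)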