huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

How to get logits from generate() method? #14498

Closed Atharva-Phatak closed 2 years ago

Atharva-Phatak commented 2 years ago

Hello,

I am using RL to train Seq2Seq models and I need the logits from the generate method, since in RL we need to sample from the current policy. Does anyone know how I can adapt the generate method to get the logits?

Specifically, I am using BART-based models. If you could share a code snippet or something similar, that would be a lot of help.

Please let me know.

NielsRogge commented 2 years ago

Hi,

This can be done easily by setting the output_scores flag of the generate method to True.

Atharva-Phatak commented 2 years ago

output_scores will be of shape (max_length - input_ids.shape[-1],), with each tensor of shape (bs, config.vocab_size). How do I convert output_scores to log probabilities?

NielsRogge commented 2 years ago

The logits are just the raw scores, you can get log probabilities by applying a log_softmax (which is a softmax followed by a logarithm) on the last dimension, i.e.

import torch

batch_size, vocab_size = 4, 50265  # example sizes; use your model's config.vocab_size
logits = torch.randn(batch_size, vocab_size)
log_probs = torch.nn.functional.log_softmax(logits, dim=-1)

Atharva-Phatak commented 2 years ago

Thanks. That helps.

bryanzhou008 commented 2 years ago

@Atharva-Phatak @NielsRogge Where are these logits returned to?

out = model.generate(input_ids, attention_mask=attention_mask, max_length=max_target_length, output_scores=True)

Here out still only contains predictions.

aljungberg commented 2 years ago

I believe you have to also specify return_dict_in_generate=True to get a ModelOutput.
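
For example, a minimal sketch combining both flags (the model, inputs, and max_target_length are placeholders carried over from the snippet above):

# Sketch assuming a seq2seq model such as BART; names are illustrative.
out = model.generate(
    input_ids,
    attention_mask=attention_mask,
    max_length=max_target_length,
    output_scores=True,
    return_dict_in_generate=True,
)
print(out.sequences.shape)   # generated token ids
print(len(out.scores))       # one (batch_size, vocab_size) tensor per generated step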

bryanzhou008 commented 2 years ago

Thanks!

mekaneeky commented 1 year ago

@Atharva-Phatak Did you publish your RL training experiments? Sounds interesting !

Eric-is-good commented 11 months ago

How can I get the logits before the softmax?

alvitawa commented 8 months ago

Is it possible to get the scores before the logits processor? I.e., get the original score in the following example instead of 0:

This Whisper model forces the generation to start with 50362 at the first position by default, i.e.

"forced_decoder_ids": [[1, 50362]]. This means all other tokens are masked out.

>>> outputs = model.generate(**inputs, return_dict_in_generate=True, output_scores=True)
>>> print(
...     all(outputs.scores[0][0, i] == float("-inf") for i in range(processor.tokenizer.vocab_size) if i != 50362)
... )
True
>>> print(outputs.scores[0][0, 50362])
tensor(0.)

amyeroberts commented 8 months ago

Hi @alvitawa,

A PR has just been merged into main which enables this: #28667

If you install from source and set output_logits=True in your generation config, you'll be able to get this.
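
For illustration, a minimal sketch of how that might look (passing the flag directly to generate should be equivalent to setting it in the generation config; model and inputs are placeholders):

# Sketch assuming a transformers version that includes the PR above.
outputs = model.generate(
    **inputs,
    return_dict_in_generate=True,
    output_scores=True,   # scores after the logits processors
    output_logits=True,   # raw logits before any processing
)
print(outputs.logits[0].shape)  # (batch_size, vocab_size) for the first generated token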

umme17 commented 6 months ago

What's the difference between output_scores and output_logits? Also, I found that scores is a tuple; what does this tuple represent? And how do I convert these scores into a caption, specifically for the BLIP model?

amyeroberts commented 6 months ago

Hi @ummehabiba378717805 - regarding the difference between scores and logits, I'd suggest looking at the documentation for questions like this.

Other implementation questions are best placed in our forums. We try to reserve the github issues for feature requests and bug reports.

vijetadeshpande commented 5 months ago

Is there any way to access the logits of the prompt tokens, in addition to the logits of generated tokens?

amyeroberts commented 5 months ago

@vijetadeshpande The returned logits and scores include the prompt as the first n values. You can find n by checking its tokenized length: prompt_len = len(tokenizer(prompt)["input_ids"])

vijetadeshpande commented 5 months ago

> @vijetadeshpande The returned logits and scores include the prompt as the first n values. You can find n by checking its tokenized length: prompt_len = len(tokenizer(prompt)["input_ids"])

Hi @amyeroberts: I am using Phi-3 to generate text. The prompt is 512 tokens and max_new_tokens=64. When I check the length of outputs.sequences, it is > prompt_length, so it does contain the entire sequence. But for the logits, outputs.logits.__len__() <= max_new_tokens. Maybe I am doing something incorrectly? Is there an input parameter I need to set in generation_config so that I can get logits for the whole sequence?

My generation config is as follows:

generation_config = GenerationConfig(
        max_new_tokens=64,
        num_return_sequences=1,

        # sampling
        do_sample=True,
        top_k=100,

        # distribution adjustment
        temperature=0.001,
        repetition_penalty=1, 

        # token ids
        pad_token_id=tokenizer.pad_token_id,

        # others
        use_cache=True,
        output_logits=True,
        output_scores=True,
        output_hidden_states=True,
        return_dict_in_generate=True,
    )

amyeroberts commented 5 months ago

Hi @vijetadeshpande, thanks for clarifying. Indeed - I'd find this surprising too! cc @gante who knows more about the inner workings here. In the meantime, you should be able to get the logits for the prompt from a simple forward pass to the model: model(**inputs).logits, although I realise doing another forward pass isn't ideal.
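
A minimal sketch of that workaround (assuming inputs is the tokenized prompt used for generate and torch is imported):

# Sketch: one extra forward pass to recover the prompt logits.
with torch.no_grad():
    prompt_logits = model(**inputs).logits   # shape: [batch_size, prompt_len, vocab_size]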

vijetadeshpande commented 5 months ago

> Hi @vijetadeshpande, thanks for clarifying. Indeed - I'd find this surprising too! cc @gante who knows more about the inner workings here. In the meantime, you should be able to get the logits for the prompt from a simple forward pass to the model: model(**inputs).logits, although I realise doing another forward pass isn't ideal.

Yep. Right now I am doing just that, another forward pass for getting the logits over the input/prompt tokens. @gante: Let us know if there's any better solution to this.

Thanks, @amyeroberts , @gante !

gante commented 5 months ago

@amyeroberts @vijetadeshpande 👋

(some context before the actual answer) As we write in the docs, we only have scores for newly generated tokens. The concept of scores is different from logits: they are logits manipulated for token selection purposes. We don't manipulate the logits regarding the prompt in any way, and thus we technically don't have the scores of the prompt.

Logits are an afterthought added recently, and we've decided to keep a 1:1 correspondence with the scores, for simplicity. There is also a way to obtain them without spending extra compute:

  1. run a forward pass with the prompt up to the penultimate token (i.e. input_ids[:, :-1]), keep logits and past_key_values
  2. pass past_key_values to generate, which skips the prefill stage
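
Putting those two steps together, a minimal sketch (assuming a recent transformers version where generate accepts past_key_values; checkpoint and prompt are placeholders):

# Sketch only: exact cache handling may differ across transformers versions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("An example prompt", return_tensors="pt").input_ids

# 1. forward pass over the prompt up to the penultimate token
with torch.no_grad():
    prefill = model(input_ids[:, :-1], use_cache=True)
prompt_logits = prefill.logits              # logits for the prompt tokens
past_key_values = prefill.past_key_values   # cache to reuse in generate

# 2. generate, reusing the cache so the prefill is not recomputed
outputs = model.generate(
    input_ids,
    past_key_values=past_key_values,
    max_new_tokens=20,
    return_dict_in_generate=True,
    output_logits=True,
)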

I hope this helps 🤗

jaydeepborkar commented 5 months ago

Hi @gante @amyeroberts, thanks for your contributions! I want to know if there's any way we can compute log probs of any sequence given the prompt. Currently, compute_transition_scores gives the log probs of only generated tokens. I'm curious if it's possible to get log probs of any token given the prompt.

I'd like to give more details of what I'm trying to achieve. I have two models A and B. And a prompt p. B generates a word (that is made of 5 tokens) when prompted with p whereas A doesn't generate those 5 tokens when prompted with the same prompt p. I want to compute the log probability of generating those 5 tokens for both A and B. Is there any neat trick to do this? :)

Let me know if anything is unclear.

345ishaan commented 4 months ago

@gante @amyeroberts I am trying to run Phi-3 inference and I'm seeing a difference in the returned token sequences when I set return_dict_in_generate=True. Any thoughts on why this can happen?

Created a discussion post here: https://discuss.huggingface.co/t/difference-in-return-sequence-for-phi3-model/90823

If I set it to True:

[
    {'role': 'user', 'content': 'Hello How are you?'},
    {
        'role': 'assistant',
        'content': " Hello! I'm doing well. How about you? How can I help you today? Hello! I'm an AI, so I don't have feelings, but I'm here and ready to assist you. What can I do for you today? Greetings! As an AI, I don't have personal experiences, but I'm fully operational and ready to provide you with any information or assistance you need. What's on your mind?"
    }
]

else:

[{'role': 'user', 'content': 'Hello How are you?'}, {'role': 'assistant', 'content': " Hello! I'm doing well. How about you? How can I help you today?"}]

bruceisme commented 3 months ago

> Hi @gante @amyeroberts, thanks for your contributions! I want to know if there's any way we can compute log probs of any sequence given the prompt. Currently, compute_transition_scores gives the log probs of only generated tokens. I'm curious if it's possible to get log probs of any token given the prompt.
>
> I'd like to give more details of what I'm trying to achieve. I have two models A and B. And a prompt p. B generates a word (that is made of 5 tokens) when prompted with p whereas A doesn't generate those 5 tokens when prompted with the same prompt p. I want to compute the log probability of generating those 5 tokens for both A and B. Is there any neat trick to do this? :)
>
> Let me know if anything is unclear.

Hey, did you figure this out? I'm also stuck on how to compute the log probs of a sequence given to the model.

gante commented 3 months ago

@bruceisme @jaydeepborkar

Following our issues guidelines, we reserve GitHub issues for bugs in the repository and/or feature requests. For any other matters, we'd like to invite you to use our forum or our discord 🤗


You can obtain the logprobs of a text sequence according to the model by running a forward pass with that sequence, applying a log_softmax on the last axis of the logits output (shape = [batch_size, sequence_length, vocab_size]), and then selecting the correct token. Note that the logits at each position are the logits for the next token, so the first token will never have a logprob.

e.g. input tokens = [0, 5, 6] -> logprobs for 0 doesn't exist, logprobs for 5 is logits[0, 0, 5], logprobs for 6 is logits[0, 1, 6]
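
A minimal sketch of that procedure (the checkpoint and prompt are placeholders):

# Sketch: score a full sequence under the model via a single forward pass.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("The quick brown fox jumps", return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits                # [batch, seq_len, vocab]
log_probs = torch.log_softmax(logits, dim=-1)

# logits at position i predict token i+1, so shift by one
targets = input_ids[:, 1:]                          # tokens to score
token_log_probs = log_probs[:, :-1, :].gather(-1, targets.unsqueeze(-1)).squeeze(-1)
sequence_log_prob = token_log_probs.sum(dim=-1)     # log p(tokens 2..n | token 1)

To score only a continuation (e.g. the 5 generated tokens in the A/B comparison above), sum token_log_probs over just those positions instead of the whole sequence.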

varungupta31 commented 2 months ago

@gante (and everyone else), thanks for your input in this thread !

Can you please explain when the scores could be -inf while the logits are real values?

In my case (running inference with Llama-3.1),

if I do,

outputs = model.generate(input_ids,
                         max_new_tokens=256,
                         return_dict_in_generate=True,
                         output_scores=True)

I get

outputs.scores

(tensor([[-inf, -inf, -inf,  ..., -inf, -inf, -inf]], device='cuda:0'),
 tensor([[-inf, -inf, -inf,  ..., -inf, -inf, -inf]], device='cuda:0'),
 tensor([[-inf, -inf, -inf,  ..., -inf, -inf, -inf]], device='cuda:0'),
 tensor([[-inf, -inf, -inf,  ..., -inf, -inf, -inf]], device='cuda:0'),
 tensor([[-inf, -inf, -inf,  ..., -inf, -inf, -inf]], device='cuda:0'),
 tensor([[-inf, -inf, -inf,  ..., -inf, -inf, -inf]], device='cuda:0'),
 tensor([[-inf, -inf, -inf,  ..., -inf, -inf, -inf]], device='cuda:0'),
 tensor([[-inf, -inf, -inf,  ..., -inf, -inf, -inf]], device='cuda:0'),
 tensor([[-inf, -inf, -inf,  ..., -inf, -inf, -inf]], device='cuda:0'),

of length 256, and outputs.logits is None (as expected, per the docs).

But if I do,

outputs = model.generate(input_ids,
                         max_new_tokens=256,
                         return_dict_in_generate=True,
                         output_logits=True)

I get,

(tensor([[10.9375,  6.0312,  3.0000,  ...,  0.6836,  0.6836,  0.6836]],
        device='cuda:0'),
 tensor([[-2.0938, -4.4375, -3.5625,  ..., -5.2812, -5.2812, -5.2812]],
        device='cuda:0'),
 tensor([[ 2.5469,  1.0234, -0.6250,  ..., -1.1328, -1.1328, -1.1328]],
        device='cuda:0'),
 tensor([[ 1.9688,  0.8164,  0.3477,  ..., -5.1875, -5.1875, -5.1875]],
        device='cuda:0'),
 tensor([[6.9375, 3.3281, 2.5469,  ..., 0.0439, 0.0439, 0.0437]],
        device='cuda:0'),
...

of length 256.

My Questions:

Thanks, it would be really helpful to get answers to these questions, or even to some of them :)

yhy-2000 commented 2 weeks ago

> output_logits=True)

Same question.

gante commented 2 weeks ago

@varungupta31 @yhy-2000 Llama is a special model, because its creators have specified generation arguments in generation_config.json. For instance, because it has do_sample=True by default, top-k is used, which sets most of the scores to -inf before sampling.

If you pass do_sample=False or disable top-p and top-k, you'll see the numbers you were expecting :)
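
For example, a greedy-decoding variant of the earlier call (a sketch, with input_ids as before):

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=False,               # greedy decoding: no top-k/top-p filtering of the scores
    return_dict_in_generate=True,
    output_scores=True,
    output_logits=True,
)
# outputs.scores should now contain finite values rather than mostly -inf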

yhy-2000 commented 2 weeks ago

> do_sample=False

Makes sense, thank you very much!!!