huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Passing inputs_embeds into GenerationMixin.generate() #6535

Closed ymfa closed 1 year ago

ymfa commented 4 years ago

🚀 Feature request

Currently GenerationMixin.generate() only accepts input_ids but not inputs_embeds. Therefore this method is not usable when custom input embeddings are required. In contrast, many models do accept inputs_embeds as input. Additionally, for models that have both an encoder and a decoder, it is not possible to run encoder.forward() and decoder.generate() separately, because generate() does not accept encoder_outputs as input.

Motivation

Having the flexibility to input inputs_embeds or encoder_outputs is essential for many tasks. For example, the input can be the concatenation of a sequence of word embeddings and an image embedding or style embedding (of the same embedding size). I want to use generate() with a T5 model fine-tuned for such a task, where the input sequence contains both word and non-word embeddings.
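
For illustration, the requested usage would look roughly like this sketch (hypothetical t5-small checkpoint, with a random tensor standing in for the projected image or style embedding):

import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

model = T5ForConditionalGeneration.from_pretrained("t5-small")
tokenizer = T5Tokenizer.from_pretrained("t5-small")

input_ids = tokenizer("describe the image:", return_tensors="pt").input_ids
word_embeds = model.get_input_embeddings()(input_ids)    # (1, seq_len, d_model)
image_embed = torch.randn(1, 1, model.config.d_model)    # stand-in for a projected image/style embedding

# Concatenate non-word and word embeddings along the sequence dimension
inputs_embeds = torch.cat([image_embed, word_embeds], dim=1)

# The requested API: generate directly from the mixed embedding sequence
outputs = model.generate(inputs_embeds=inputs_embeds)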

patrickvonplaten commented 4 years ago

Hey @ymfa,

thanks for the feature request :-) I'll put it on the To-Do list. Not sure how soon we will work on this though. If you have a good idea of how to design this new feature, feel free to open a PR :-)

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

umbertopietroni commented 3 years ago

Hi @patrickvonplaten, I'm interested in this feature as well, as I'm using GPT-2 with custom input embeddings. Is there currently a way to pass inputs_embeds to the generate function instead of input_ids?

ymfa commented 3 years ago

I've started working on a small PR that provides the flexibility of passing encoder outputs into GenerationMixin.generate(). I chose encoder_outputs over inputs_embeds because they are more fundamental, thus the fix would be more generally useful. However, it might not satisfy @umbertopietroni's need as GPT-2 is not an encoder-decoder model.
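
For reference, the separated encode/generate flow that such a PR would enable looks roughly like the sketch below (a sketch only, assuming a t5-small checkpoint and a transformers version that accepts encoder_outputs in generate()):

from transformers import T5ForConditionalGeneration, T5Tokenizer

model = T5ForConditionalGeneration.from_pretrained("t5-small")
tokenizer = T5Tokenizer.from_pretrained("t5-small")

enc = tokenizer("translate English to German: Hello world", return_tensors="pt")

# Run the encoder once, then hand its outputs to generate() instead of input_ids
encoder_outputs = model.get_encoder()(**enc)
outputs = model.generate(encoder_outputs=encoder_outputs, attention_mask=enc.attention_mask)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))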

gamebird96 commented 3 years ago

Is there any update on a solution, or any workaround to pass encoder_outputs to generate()?

lamthuy commented 3 years ago

Any update on this issue?

kkkevinkkkkk commented 2 years ago

Any update on this issue?

patrickvonplaten commented 2 years ago

It is possible to pass inputs_embeds for an encoder-decoder framework. See https://github.com/huggingface/transformers/pull/14443. This does assume, however, that we know the word embedding matrix of the decoder.

However for models like GPT2 this is not as straight-forward - see: https://github.com/huggingface/transformers/pull/14443#discussion_r753167493

In general, what is the exact use-case people are interested in here?

sharifza commented 2 years ago

@patrickvonplaten for example in the recent NeurIPS paper "Multimodal Few-Shot Learning with Frozen Language Models", the output of a non-trained CNN is directly fed into a pre-trained and frozen language model. In this scenario, the CNN learns how to generate input embeddings such that the pre-trained language model can generate the right caption.

patrickvonplaten commented 2 years ago

I see - this makes sense! We should probably adapt the generate function then to allow this scenario. I'll put it on my TODO!

cuthalionn commented 2 years ago

I am trying to generate with a decoder-only model using inputs_embeds. Does anyone know of useful resources on how to achieve this?

patrickvonplaten commented 2 years ago

This should already be possible - will try to put it in the big generate doc refactor that I'm working on at the moment - see https://github.com/huggingface/transformers/issues/15552

cuthalionn commented 2 years ago

Hi @patrickvonplaten, I am glad to hear there will be a doc refactor for generation, thanks for working on this!

> This should already be possible

I am using version 4.16.2, and when I try to generate with DialoGPT (a decoder-only model) as follows

outputs = model.generate(inputs_embeds=inputs_embeds)

I get the following error: ValueError: If inputs_embeds is passed as model-specific keyword input then model has to be an encoder-decoder and not a GPT2LMHeadModel.
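
For context, a minimal reproduction of that error looks roughly like this sketch (assuming the microsoft/DialoGPT-medium checkpoint and v4.16.2):

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")
tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")

input_ids = tokenizer.encode("Hello, how are you?", return_tensors="pt")
inputs_embeds = model.get_input_embeddings()(input_ids)

# On v4.16.2 this raises the ValueError quoted above, because decoder-only
# models did not yet accept inputs_embeds in generate()
outputs = model.generate(inputs_embeds=inputs_embeds)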

Tuan-Lee-23 commented 2 years ago

Hi @patrickvonplaten, I would like to know if there are any updates. I really need the generate function to accept inputs_embeds for a GPT model.

> I see - this makes sense! We should probably adapt the generate function then to allow this scenario. I'll put it on my TODO!

Thank you

patrickvonplaten commented 2 years ago

@Tuan-Lee-23 - would you like to open a PR for this to give it a try? :-)

This would also help me understand the use case better

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

chaddech commented 2 years ago

Hi, I would also really love to see this. Just tried to generate from inputs_embeds on OPT and got the error message. Thanks!

patrickvonplaten commented 2 years ago

@chaddech , could you explain your use-case in a bit more detail here?

1. Why do you want to use word embeddings?
2. Are you not using at all the word embeddings of OPT?
3. Are your OPT model's input embeddings tied to the output embeddings?

In general I'm not completely against adding this feature, but only if the use case is solid since it requires lots of changes to generate()

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

inimah commented 2 years ago

Hi @patrickvonplaten

An example use case (for me) is open-ended text generation after soft-prompt tuning.

During tuning, only the embeddings of the n_tokens prompt are learnable; all other parameters are frozen. So the input to the forward() function is the concatenation of the n_tokens prompt embeddings and the embeddings of the actual input (discrete tokens). The prompt is a dummy -- no actual discrete token (word) is linked to it.

See https://github.com/corolla-johnson/mkultra or https://github.com/kipgparker/soft-prompt-tuning for practical examples.
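
A minimal sketch of that input construction (assuming GPT-2 and a randomly initialized prompt of n_tokens learnable embeddings):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

n_tokens = 20  # number of learnable soft-prompt embeddings
embed_dim = model.get_input_embeddings().embedding_dim
soft_prompt = torch.nn.Parameter(torch.randn(n_tokens, embed_dim) * 0.02)  # only these are trained

input_ids = tokenizer("Translate to French: Hello", return_tensors="pt").input_ids
token_embeds = model.get_input_embeddings()(input_ids)                      # (1, seq_len, dim)
inputs_embeds = torch.cat([soft_prompt.unsqueeze(0), token_embeds], dim=1)  # prepend the prompt

# forward() accepts inputs_embeds, but generate() (at the time of this comment) did not
outputs = model(inputs_embeds=inputs_embeds)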

It would be a lot easier if generation_utils allowed passing inputs_embeds.

Thank you.

> @chaddech , could you explain your use-case in a bit more detail here?
>
> 1. Why do you want to use word embeddings?
> 2. Are you not using at all the word embeddings of OPT?
> 3. Are your OPT model's input embeddings tied to the output embeddings?
>
> In general I'm not completely against adding this feature, but only if the use case is solid since it requires lots of changes to generate()

patrickvonplaten commented 2 years ago

@ymfa could you maybe open a PR to show what a solution could look like (maybe just a super quick and dirty PR)?

Sorry, I sadly won't have the time to dive into the other codebases or papers, but I would be super happy to guide you through a PR!

Also cc @patil-suraj @gante

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

mkyl commented 1 year ago

I'm running into the same ValueError above when trying to replicate the paper "Locating and Editing Factual Associations in GPT". This technique relies on injecting noise into the word embeddings to corrupt them. Having this feature added would be very useful. Thanks!

Edit: I found a workaround, allowing me to extract the next word from GPT2 given a custom embedding:

import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

custom_embeds = torch.randn(1, 5, 768)  # (batch, seq_len, hidden_size)
outputs = model(inputs_embeds=custom_embeds)
next_token_logits = outputs.logits[:, -1, :].flatten()
next_token = next_token_logits.argmax()  # greedy choice of the next token
tokenizer.decode(next_token)

gante commented 1 year ago

Hi @mkyl (and other participants in this thread) 👋

As written above, passing inputs_embeds with decoder-only models is not possible at the moment. I see from the number of comments and likes above that this would be a somewhat appreciated functionality, so I want to help the folks here.

Here's the issue -- generate() does a LOT of lifting to keep its interface simple. To enable calls with inputs_embeds we would need to greatly increase the complexity of an already complex piece of code, hurting everyone in the long run 🙅 Thankfully, there is an alternative: we can manually prepare a few inputs and call the generation methods directly, which support passing inputs_embeds. The catch is that a critical component of the models, prepare_inputs_for_generation, is not expecting inputs_embeds, so we will have to monkey patch it. But it works, as you can see in the example below 🙌 (I hope this example helps!)

The monkey patch is inconvenient, but I'm not entirely convinced that adding this feature is worth modifying tens of models. I make the following pact with y'all: ⚠️ if this obscure comment in a closed issue reaches 10 reactions, I will implement the change to prepare_inputs_for_generation on all text-generation models. (Whoever does the 10th reaction, please tag me; cc @patrickvonplaten )


(Note: prior to v4.26, you have to replace past_key_values by past in the code below)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, MaxLengthCriteria, StoppingCriteriaList

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Hello world"
input_ids = tokenizer.encode(text, return_tensors="pt")

# Traditional way of generating text
outputs = model.generate(input_ids)
print("\ngenerate + input_ids:", tokenizer.decode(outputs[0], skip_special_tokens=True))

# Generating with decoder models from inputs_embeds
# Step 1: monkey patch "prepare_inputs_for_generation" to pass inputs_embeds when they are available
def prepare_inputs_for_generation(input_ids, past_key_values=None, **kwargs):
    token_type_ids = kwargs.get("token_type_ids", None)
    # only keep the last token of input_ids if past_key_values is defined in kwargs
    if past_key_values:
        input_ids = input_ids[:, -1].unsqueeze(-1)
        if token_type_ids is not None:
            token_type_ids = token_type_ids[:, -1].unsqueeze(-1)

    attention_mask = kwargs.get("attention_mask", None)
    position_ids = kwargs.get("position_ids", None)

    if attention_mask is not None and position_ids is None:
        # create position_ids on the fly for batch generation
        position_ids = attention_mask.long().cumsum(-1) - 1
        position_ids.masked_fill_(attention_mask == 0, 1)
        if past_key_values:
            position_ids = position_ids[:, -1].unsqueeze(-1)
    else:
        position_ids = None

    # !!!!!!!!!!!!!!!!!!! start: modified vs original, to pass inputs_embeds when they are available
    if "inputs_embeds" in kwargs and past_key_values is None:  # we only want to use them in the 1st generation step
        model_inputs = {"inputs_embeds": inputs_embeds}
    else:
        model_inputs = {"input_ids": input_ids}
    model_inputs.update({
        "past_key_values": past_key_values,
        "use_cache": kwargs.get("use_cache"),
        "position_ids": position_ids,
        "attention_mask": attention_mask,
        "token_type_ids": token_type_ids,
    })
    return model_inputs
    # !!!!!!!!!!!!!!!!!!! end: modified vs original, to pass inputs_embeds when they are available
model.prepare_inputs_for_generation = prepare_inputs_for_generation

# Step 2: prepare the inputs for the generation method manually and call it
inputs_embeds = model.transformer.wte(input_ids)
# empty input ids -> the output will NOT include the input prompt, but will generate the same text (because of
# inputs_embeds)
input_ids = torch.LongTensor([[model.config.bos_token_id]])
stopping_criteria = StoppingCriteriaList([MaxLengthCriteria(max_length=20)])
outputs = model.greedy_search(
    input_ids, inputs_embeds=inputs_embeds, stopping_criteria=stopping_criteria, pad_token_id=model.config.eos_token_id
)
print("\ngreedy + inputs_embeds:", tokenizer.decode(outputs[0], skip_special_tokens=True))

Ryul0rd commented 1 year ago

@gante We hit ~~10~~ 11!

BugApe commented 1 year ago

@gante In this way, it seems that only the first token is based on inputs_embeds.

gante commented 1 year ago

Oh damn, this exceeded my expectations 🙈 Added to my todo list! Keep in mind that my queue is long at the moment, so this might take a few months.

gante commented 1 year ago

@BugApe in the example above, only the first forward pass will have inputs_embeds as input, but you can have more than one token there. If your target application requires manipulating inputs_embeds at each generation step, then you'd need to monkey-patch prepare_inputs_for_generation to embed the newly generated tokens and then manipulate them as you wish. That will not be included in the planned changes.

However, in theory, I could make generate accept a dictionary of arbitrary functions to be applied to each input in prepare_inputs_for_generation (e.g. a function that embeds input_ids and then adds some noise, to be applied at each step before the forward pass).

I'll make the same pact as above: if this comment reaches 10 reactions, I will add the functionality to my todo list. (whoever does the 10th reaction, please tag me)
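
A rough sketch of such a per-step hook, written as a monkey patch against GPT-2 (hypothetical noise scale, not an official API):

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
original_prepare = model.prepare_inputs_for_generation

def noisy_prepare_inputs_for_generation(input_ids, **kwargs):
    model_inputs = original_prepare(input_ids, **kwargs)
    if model_inputs.get("input_ids") is not None:
        # Embed the tokens for this step ourselves and perturb them with noise
        embeds = model.get_input_embeddings()(model_inputs.pop("input_ids"))
        model_inputs["inputs_embeds"] = embeds + 0.01 * torch.randn_like(embeds)
    return model_inputs

model.prepare_inputs_for_generation = noisy_prepare_inputs_for_generation

After patching, an ordinary model.generate(input_ids) call would run every forward pass on noise-perturbed embeddings.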

XuanVuNguyen commented 1 year ago

Hi, what is the status of this feature? It would add a lot of flexibility to model construction.

XuanVuNguyen commented 1 year ago

@gante The monkey patch works nicely for model.greedy_search and model.contrastive_search, but it does not work with model.generate, which has more utilities. Could you provide a monkey patch that works with model.generate? Many thanks!

gante commented 1 year ago

Hi @mkyl, @Ryul0rd, @XuanVuNguyen, and other participants in this thread 👋

As promised, inputs_embeds can be passed to .generate() with decoder-only models 💪 See the example below for reference. The caveat is that if you want it on other models (in addition to GPT2), you'll have to open a PR that makes the same changes as the ones to GPT2 in this PR to your model of choice :)

To access it, install the latest version: pip install --upgrade git+https://github.com/huggingface/transformers.git

EDIT: as of 2023-Mar-16, you can access this feature by installing v4.27. There are a few models with soft-prompting enabled -- try running it and, if the model lacks support, you'll get an exception with instructions.


from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Hello world"
input_ids = tokenizer.encode(text, return_tensors="pt")

# Traditional way of generating text
outputs = model.generate(input_ids)
print("\ngenerate + input_ids:", tokenizer.decode(outputs[0], skip_special_tokens=True))

# From inputs_embeds -- exact same output if you also pass `input_ids`. If you don't
# pass `input_ids`, you will get the same generated content but without the prompt
inputs_embeds = model.transformer.wte(input_ids)
outputs = model.generate(input_ids, inputs_embeds=inputs_embeds)
print("\ngenerate + inputs_embeds:", tokenizer.decode(outputs[0], skip_special_tokens=True))

gante commented 1 year ago

@BugApe I'm closing this issue for now, as your request is kind of an advanced functionality. However, if it hits the 10 reacts, let me know -- I'll reopen this issue then! 🤗

ws1993109 commented 1 year ago

@gante Hi, could you please add this feature to chatglm2 too? Thanks a lot.

gante commented 1 year ago

Hey @ws1993109 👋

I have low bandwidth at the moment. Would you like to open a PR with it and test it? It should mostly be a matter of copying the changes made to GPT2 in this PR and confirming that it works.

elitalobo commented 10 months ago

Hi, Is there a way to compute the gradient of the logits produced by the generate function with respect to the input embeddings (inputs_embeds)?

gante commented 10 months ago

@elitalobo it should work the same as if you were passing input_ids and labels, but in this case you would be passing inputs_embeds and labels :)
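
Concretely, a differentiable pass that yields gradients with respect to the embeddings could look like this sketch (assuming GPT-2):

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

input_ids = tokenizer("Hello world", return_tensors="pt").input_ids
inputs_embeds = model.get_input_embeddings()(input_ids).detach().requires_grad_(True)

# A single forward pass with labels is differentiable (unlike generate())
outputs = model(inputs_embeds=inputs_embeds, labels=input_ids)
outputs.loss.backward()
print(inputs_embeds.grad.shape)  # gradients of the loss w.r.t. the input embeddings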

elitalobo commented 10 months ago

@gante thank you for your fast reply. The generate function has a @no_grad decorator, so I am unable to obtain the gradients of the logits corresponding to each output token via this function. My solution is to call forward multiple times to obtain the gradient of each output token with respect to the input embeddings. Is there a more efficient way to obtain these gradients?

gante commented 9 months ago

@elitalobo generate is non-differentiable, so you won't be able to train through it. I suggest you review the material for training models in our docs, and replace input_ids with inputs_embeds in your case.

Following our issues guidelines, we reserve GitHub issues for bugs in the repository and/or feature requests. For any other matters, we'd like to invite you to use our forum or our discord 🤗