Hey @ymfa,
thanks for the feature request :-) I'll put it on the To-Do list. Not sure how soon we will work on this though. If you have a good idea of how to design this new feature, feel free to open a PR :-)
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hi @patrickvonplaten, I'm interested in this feature as well as I'm using GPT-2 with custom input embedding. Is there currently a way to pass the inputs_embeds to the generate function instead of input_ids?
I've started working on a small PR that provides the flexibility of passing encoder outputs into GenerationMixin.generate(). I chose encoder_outputs over inputs_embeds because they are more fundamental, thus the fix would be more generally useful. However, it might not satisfy @umbertopietroni's need as GPT-2 is not an encoder-decoder model.
Is there any update on a solution for this yet, or any workaround for passing encoder_outputs to generate?
Any update on this issue?
Any update on this issue?
It is possible to run inputs_embeds for an encoder-decoder framework. See https://github.com/huggingface/transformers/pull/14443. This does assume however that we know the word embedding matrix of the decoder.
However for models like GPT2 this is not as straight-forward - see: https://github.com/huggingface/transformers/pull/14443#discussion_r753167493
In general, what is the exact use-case people are interested in here?
@patrickvonplaten for example in the recent NeurIPS paper "Multimodal Few-Shot Learning with Frozen Language Models", the output of a non-trained CNN is directly fed into a pre-trained and frozen language model. In this scenario, the CNN learns how to generate input embeddings such that the pre-trained language model can generate the right caption.
I see - this makes sense! We should probably adapt the generate function then to allow this scenario. I'll put it on my TODO!
I am trying to generate with a decoder-only model using inputs_embeds. Does anyone know useful resources on how to achieve this?
This should already be possible - will try to put it in the big generate doc refactor that I'm working on at the moment - see https://github.com/huggingface/transformers/issues/15552
Hi @patrickvonplaten, I am glad to hear there will be doc refactor for generation, thanks for working on this!
This should already be possible
I am using version 4.16.2, and when I try to generate with DialoGPT (a decoder only model) as follows
outputs = model.generate(inputs_embeds=inputs_embeds)
I get the following error:
ValueError: If inputs_embeds is passed as model-specific keyword input then model has to be an encoder-decoder and not a GPT2LMHeadModel.
Hi @patrickvonplaten,
I would like to know if there are any updates. I really need the generate function to accept the inputs_embeds parameter for GPT models.
I see - this makes sense! We should probably adapt the generate function then to allow this scenario. I'll put it on my TODO!
Thank you
@Tuan-Lee-23 - would you like to open a PR for this to give it a try? :-)
This would also help me understand the use case better
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hi, I would also really love to see this. Just tried to generate from inputs_embeds on OPT and got the error message. Thanks!
@chaddech , could you explain your use-case in a bit more detail here?
1) Why do you want to use word embeddings? 2) Are you not using at all the word embeddings of OPT? 3) Are your OPT model's input embeddings tied to the output embeddings?
In general I'm not completely against adding this feature, but only if the use case is solid since it requires lots of changes to generate()
Hi @patrickvonplaten
One example use case (for me) is open-ended text generation after soft-prompt tuning.
During tuning, only the embeddings of the n_tokens prompt are learnable; all other parameters are frozen. So the input to the forward() function is the concatenation of the n_tokens prompt embeddings and the embeddings of the actual input (discrete tokens). The prompt is a dummy: no actual discrete token (word) is linked to it.
See https://github.com/corolla-johnson/mkultra or https://github.com/kipgparker/soft-prompt-tuning for practicality.
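A minimal sketch of the concatenation described above, assuming plain PyTorch. The class name SoftPromptEmbedding and the sizes are hypothetical and not taken from either linked repository:

```python
import torch
import torch.nn as nn


class SoftPromptEmbedding(nn.Module):
    """Prepends n_tokens learnable prompt vectors to frozen token embeddings."""

    def __init__(self, token_embedding: nn.Embedding, n_tokens: int = 10):
        super().__init__()
        self.token_embedding = token_embedding
        # Freeze the original word embeddings; only the soft prompt is trained.
        self.token_embedding.weight.requires_grad_(False)
        self.soft_prompt = nn.Parameter(
            torch.randn(n_tokens, token_embedding.embedding_dim) * 0.02
        )

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        token_embeds = self.token_embedding(input_ids)  # (batch, seq, dim)
        prompt = self.soft_prompt.unsqueeze(0).expand(input_ids.size(0), -1, -1)
        # Result has shape (batch, n_tokens + seq, dim)
        return torch.cat([prompt, token_embeds], dim=1)


# Toy sizes, purely for illustration
wte = nn.Embedding(100, 16)
soft = SoftPromptEmbedding(wte, n_tokens=4)
input_ids = torch.randint(0, 100, (2, 5))
inputs_embeds = soft(input_ids)
```

The resulting tensor is what would be passed as inputs_embeds to the model's forward(); the prompt positions have no token ids, which is why an input_ids-only generate() cannot express them.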
It would be a lot easier if generation_utils allowed for inputs_embeds.
Thank you.
@chaddech , could you explain your use-case in a bit more detail here?
- Why do you want to use word embeddings?
- Are you not using at all the word embeddings of OPT?
- Are your OPT model's input embeddings tied to the output embeddings?
In general I'm not completely against adding this feature, but only if the use case is solid since it requires lots of changes to generate()
@ymfa could you maybe open a PR to show what a solution could look like (maybe just a super quick and dirty PR?)
Sorry I sadly won't have the time to dive into the other codebases or paper, but would be super happy to guide through a PR!
Also cc @patil-suraj @gante
I'm running into the same ValueError above when trying to replicate the paper "Locating and Editing Factual Associations in GPT". This technique relies on injecting noise into the word embeddings to corrupt them. Having this feature added would be very useful. Thanks!
Edit: I found a workaround, allowing me to extract the next word from GPT2 given a custom embedding:
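import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
# Random stand-in for a custom embedding: (batch, sequence length, hidden size)
custom_embeds = torch.randn(1, 5, 768)
outputs = model(inputs_embeds=custom_embeds)
# Logits over the vocabulary at the last position (not normalized probabilities)
last_logits = outputs.logits[:, -1, :].flatten()
next_token = last_logits.argmax()
tokenizer.decode(next_token)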
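The single-step workaround above can be extended into a small greedy loop by embedding each new token and feeding it back in. The following is only a sketch: greedy_from_embeds and its perturb argument are hypothetical names, and it assumes a tiny randomly initialized GPT-2 so that it runs without downloading weights (swap in from_pretrained("gpt2") for real use).

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

torch.manual_seed(0)
# Assumption: tiny random model, only so the sketch runs offline.
config = GPT2Config(vocab_size=100, n_positions=64, n_embd=32, n_layer=2,
                    n_head=2, bos_token_id=0, eos_token_id=0)
model = GPT2LMHeadModel(config).eval()


def greedy_from_embeds(model, inputs_embeds, max_new_tokens=5, perturb=None):
    """Greedy decoding that starts from embeddings instead of token ids."""
    embed = model.get_input_embeddings()
    step_embeds, past, generated = inputs_embeds, None, []
    with torch.no_grad():
        for _ in range(max_new_tokens):
            out = model(inputs_embeds=step_embeds, past_key_values=past, use_cache=True)
            past = out.past_key_values  # cache, so later steps feed one position only
            next_token = out.logits[:, -1, :].argmax(dim=-1)  # (batch,)
            generated.append(next_token)
            step_embeds = embed(next_token).unsqueeze(1)  # (batch, 1, hidden)
            if perturb is not None:
                # Optional hook, e.g. to add noise to each newly embedded token
                step_embeds = perturb(step_embeds)
    return torch.stack(generated, dim=1)  # (batch, max_new_tokens)


custom_embeds = torch.randn(1, 5, config.n_embd)
new_tokens = greedy_from_embeds(model, custom_embeds)
noisy_tokens = greedy_from_embeds(
    model, custom_embeds, perturb=lambda e: e + 0.1 * torch.randn_like(e)
)
```

This loop has no stopping criterion other than max_new_tokens and no sampling, so it is not a replacement for generate(); it only shows the mechanics.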
Hi @mkyl (and other participants in this thread) 👋
As written above, passing inputs_embeds with decoder-only models is not possible at the moment. I see from the number of comments and likes above that this would be a somewhat appreciated functionality, so I want to help the folks here.
Here's the issue -- generate() does a LOT of lifting to keep its interface simple. To enable calls with inputs_embeds we would need to greatly increase the complexity of an already complex piece of code, hurting everyone in the long run 🙅 Thankfully, there is an alternative: we can manually prepare a few inputs and call the generation methods directly, which support passing inputs_embeds. The catch is that a critical component of the models, prepare_inputs_for_generation, is not expecting inputs_embeds, so we will have to monkey patch it. But it works, as you can see in the example below 🙌 (I hope this example helps!)
The monkey patch is inconvenient, but I'm not entirely convinced that adding this feature is worth modifying tens of models. I make the following pact with y'all:
⚠️ if this obscure comment in a closed issue reaches 10 reactions, I will implement the change to prepare_inputs_for_generation on all text-generation models. (Whoever does the 10th reaction, please tag me; cc @patrickvonplaten )
(Note: prior to v4.26, you have to replace past_key_values by past in the code below)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, MaxLengthCriteria, StoppingCriteriaList

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
text = "Hello world"
input_ids = tokenizer.encode(text, return_tensors="pt")

# Traditional way of generating text
outputs = model.generate(input_ids)
print("\ngenerate + input_ids:", tokenizer.decode(outputs[0], skip_special_tokens=True))

# Generating with decoder models from inputs_embeds
# Step 1: monkey patch "prepare_inputs_for_generation" to pass inputs_embeds when they are available
def prepare_inputs_for_generation(input_ids, past_key_values=None, **kwargs):
    token_type_ids = kwargs.get("token_type_ids", None)
    # only last token for inputs_ids if past_key_values is defined in kwargs
    if past_key_values:
        input_ids = input_ids[:, -1].unsqueeze(-1)
        if token_type_ids is not None:
            token_type_ids = token_type_ids[:, -1].unsqueeze(-1)

    attention_mask = kwargs.get("attention_mask", None)
    position_ids = kwargs.get("position_ids", None)
    if attention_mask is not None and position_ids is None:
        # create position_ids on the fly for batch generation
        position_ids = attention_mask.long().cumsum(-1) - 1
        position_ids.masked_fill_(attention_mask == 0, 1)
        if past_key_values:
            position_ids = position_ids[:, -1].unsqueeze(-1)
    else:
        position_ids = None

    # !!!!!!!!!!!!!!!!!!! start: modified vs original, to pass inputs_embeds when they are available
    if "inputs_embeds" in kwargs and past_key_values is None:  # we only want to use them in the 1st generation step
        # read from kwargs rather than relying on a global variable of the same name
        model_inputs = {"inputs_embeds": kwargs["inputs_embeds"]}
    else:
        model_inputs = {"input_ids": input_ids}
    model_inputs.update({
        "past_key_values": past_key_values,
        "use_cache": kwargs.get("use_cache"),
        "position_ids": position_ids,
        "attention_mask": attention_mask,
        "token_type_ids": token_type_ids,
    })
    return model_inputs
    # !!!!!!!!!!!!!!!!!!! end: modified vs original, to pass inputs_embeds when they are available

model.prepare_inputs_for_generation = prepare_inputs_for_generation

# Step 2: prepare the inputs for the generation method manually and call it
inputs_embeds = model.transformer.wte(input_ids)
# empty input ids -> the output will NOT include the input prompt, but will generate the same text (because of
# inputs_embeds)
input_ids = torch.LongTensor([[model.config.bos_token_id]])
stopping_criteria = StoppingCriteriaList([MaxLengthCriteria(max_length=20)])
outputs = model.greedy_search(
    input_ids, inputs_embeds=inputs_embeds, stopping_criteria=stopping_criteria, pad_token_id=model.config.eos_token_id
)
print("\ngreedy + inputs_embeds:", tokenizer.decode(outputs[0], skip_special_tokens=True))
@gante We hit ~10~ 11!
@gante In this way, it seems that only the first token is based on inputs_embeds.
Oh damn, this exceeded my expectations 🙈 Added to my todo list! Keep in mind that my queue is long at the moment, so this might take a few months.
@BugApe in the example above, only the first forward pass will have inputs_embeds as input, but you can have more than one token there. If your target application requires manipulating inputs_embeds at each generation step, then you'd need to monkey-patch prepare_inputs_for_generation to embed the newly generated tokens and then manipulate them as you wish. That will not be included in the planned changes.
However, in theory, I could make generate accept a dictionary of arbitrary functions to be applied to each input in prepare_inputs_for_generation (e.g. a function that embeds input_ids and then adds some noise, to be applied at each step before the forward pass).
I'll make the same pact as above: if this comment reaches 10 reactions, I will add the functionality to my todo list. (whoever does the 10th reaction, please tag me)
Hi, how's the state of this feature? It would add a lot of flexibility to model construction
@gante The monkey patch works nicely for model.greedy_search and model.contrastive_search, but does not work with model.generate, which has more utilities. Could you provide a monkey patch that works with model.generate? Many thanks!
Hi @mkyl, @Ryul0rd, @XuanVuNguyen, and other participants in this thread 👋
As promised, inputs_embeds can be passed to .generate() with decoder-only models 💪 See the example below for reference. The caveat is that if you want it on other models (in addition to GPT2), you'll have to open a PR that makes the same changes as the ones to GPT2 in this PR to your model of choice :)
To access it, install the latest version: pip install --upgrade git+https://github.com/huggingface/transformers.git
EDIT: as of 2023-Mar-16, you can access this feature by installing v4.27. There are a few models with soft-prompting enabled -- try running it and, if the model lacks support, you'll get an exception with instructions.
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
text = "Hello world"
input_ids = tokenizer.encode(text, return_tensors="pt")
# Traditional way of generating text
outputs = model.generate(input_ids)
print("\ngenerate + input_ids:", tokenizer.decode(outputs[0], skip_special_tokens=True))
# From inputs_embeds -- exact same output if you also pass `input_ids`. If you don't
# pass `input_ids`, you will get the same generated content but without the prompt
inputs_embeds = model.transformer.wte(input_ids)
outputs = model.generate(input_ids, inputs_embeds=inputs_embeds)
print("\ngenerate + inputs_embeds:", tokenizer.decode(outputs[0], skip_special_tokens=True))
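For the soft-prompting case mentioned above, the embeddings do not correspond to any token ids, so only inputs_embeds is passed. A hedged sketch, assuming transformers v4.27 or later and a tiny randomly initialized GPT-2 (so it runs without downloads); soft_prompt here is a random stand-in for trained prompt vectors:

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

torch.manual_seed(0)
# Assumption: tiny random model instead of pretrained weights, for an offline example.
config = GPT2Config(vocab_size=100, n_positions=64, n_embd=32, n_layer=2,
                    n_head=2, bos_token_id=0, eos_token_id=0)
model = GPT2LMHeadModel(config).eval()

input_ids = torch.tensor([[5, 17, 42]])
token_embeds = model.get_input_embeddings()(input_ids)   # (1, 3, 32)
soft_prompt = torch.randn(1, 4, config.n_embd) * 0.02    # stand-in for learned vectors
inputs_embeds = torch.cat([soft_prompt, token_embeds], dim=1)
attention_mask = torch.ones(inputs_embeds.shape[:2], dtype=torch.long)

# Only inputs_embeds is passed: the returned ids cover the generated continuation,
# since the soft prompt has no token ids to echo back.
outputs = model.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=attention_mask,
    max_new_tokens=5,
    do_sample=False,
    pad_token_id=0,
)
```

With a real checkpoint, the output would be decoded with the tokenizer as in the example above.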
@BugApe I'm closing this issue for now, as your request is kind of an advanced functionality. However, if it hits the 10 reacts, let me know -- I'll reopen this issue then! 🤗
@gante Hi, could you please add this feature to chatglm2 too, thanks a lot.
Hey @ws1993109 👋
I have low bandwidth at the moment. Would you like to open a PR with it and test it? It should be mostly like copying the changes on GPT2 from this PR, and confirming that it works.
Hi, Is there a way to compute the gradient of the logits produced by the generate function with respect to the input embeddings (inputs_embeds)?
@elitalobo it should work the same as if you're passing input_ids and labels, but in this case you would be passing inputs_embeds and labels :)
@gante thank you for your fast reply. The generate function has @no_grad decorator so I am unable to obtain the gradients of the logits corresponding to each output token via this function. My solution is to use the forward call multiple times to obtain the gradient of each output token with respect to input embeddings. Is there a more efficient way to obtain these gradients ?
@elitalobo generate is non-differentiable, so you won't be able to train using it. I suggest you review our material for training models on our docs, and replace input_ids by inputs_embeds in your case.
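As a rough illustration of the suggestion above, gradients with respect to the input embeddings can be taken from a regular forward call (not generate). This sketch assumes a tiny randomly initialized GPT-2 and made-up token ids, chosen only so it runs offline:

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

torch.manual_seed(0)
# Assumption: tiny random model; with a real checkpoint the calls are the same.
config = GPT2Config(vocab_size=100, n_positions=64, n_embd=32, n_layer=2,
                    n_head=2, bos_token_id=0, eos_token_id=0)
model = GPT2LMHeadModel(config).eval()

input_ids = torch.tensor([[5, 17, 42, 8]])
# Detach and re-enable grad so the embeddings are a leaf tensor we can differentiate against
inputs_embeds = model.get_input_embeddings()(input_ids).detach().requires_grad_(True)

# Forward pass with inputs_embeds and labels instead of input_ids and labels
outputs = model(inputs_embeds=inputs_embeds, labels=input_ids)

# Gradient of the language-modeling loss w.r.t. the input embeddings
(loss_grad,) = torch.autograd.grad(outputs.loss, inputs_embeds, retain_graph=True)

# Gradient of the highest next-token logit at the last position
top_logit = outputs.logits[0, -1].max()
(logit_grad,) = torch.autograd.grad(top_logit, inputs_embeds)
```

Getting a gradient for every generated token this way still needs one backward pass per token, so it does not remove the cost raised in the question.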
Following our issues guidelines, we reserve GitHub issues for bugs in the repository and/or feature requests. For any other matters, we'd like to invite you to use our forum or our discord 🤗
🚀 Feature request
Currently GenerationMixin.generate() only accepts input_ids but not inputs_embeds. Therefore this method is not usable when custom input embeddings are required. In contrast, many models do accept inputs_embeds as input. Additionally, for models that have both an encoder and a decoder, it is not possible to run encoder.forward() and decoder.generate() separately, because generate() does not accept encoder_outputs as input.
Motivation
Having the flexibility to input inputs_embeds or encoder_outputs is essential for many tasks. For example, the input can be the concatenation of a sequence of word embeddings and an image embedding or style embedding (of the same embedding size). I want to use generate() with a T5 model fine-tuned for such a task, where the input sequence contains both word and non-word embeddings.
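For reference, a sketch of what the requested workflow looks like in later transformers releases, where generate() can take encoder_outputs for encoder-decoder models. It assumes a tiny randomly initialized T5 (to run offline) and uses a random vector as a stand-in for an image or style embedding; exact behavior may differ between library versions.

```python
import torch
from transformers import T5Config, T5ForConditionalGeneration

torch.manual_seed(0)
# Assumption: tiny random T5; a fine-tuned checkpoint would be loaded with from_pretrained.
config = T5Config(vocab_size=100, d_model=32, d_kv=8, d_ff=64, num_layers=2,
                  num_heads=4, decoder_start_token_id=0)
model = T5ForConditionalGeneration(config).eval()

input_ids = torch.tensor([[5, 17, 42, 1]])
word_embeds = model.get_input_embeddings()(input_ids)   # (1, 4, 32)
image_embed = torch.randn(1, 1, config.d_model)         # stand-in non-word embedding
mixed_embeds = torch.cat([image_embed, word_embeds], dim=1)
attention_mask = torch.ones(mixed_embeds.shape[:2], dtype=torch.long)

# Run the encoder separately on the mixed word / non-word embeddings
with torch.no_grad():
    encoder_outputs = model.get_encoder()(
        inputs_embeds=mixed_embeds, attention_mask=attention_mask
    )

# Hand the precomputed encoder states to generate()
outputs = model.generate(
    encoder_outputs=encoder_outputs,
    attention_mask=attention_mask,
    max_new_tokens=5,
    do_sample=False,
)
```

The returned ids start with the decoder start token, followed by up to max_new_tokens generated tokens.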