aohenuo opened 4 days ago
Yes, you're right! The model accepts `inputs_embeds`, but support was not implemented for `generate()` specifically. It seems our tests were silently skipping it. I'll work on it and open a PR today :)
By the way, is there any reason why passing embeds is preferred in VLMs over input ids with pixel values? Just out of curiosity.
First of all, thank you very much for helping to resolve this bug!
Recently, I have been working on research that uses PEFT to fine-tune multimodal models, which requires me to pass only embeddings as input. My goal is to generate the desired results from the embeddings I provide, so the `generate` function needs to be able to perform autoregressive generation based on `inputs_embeds`.
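A minimal sketch of that workflow, assuming the fix lands: the token embeddings come from the model's input embedding layer and a PEFT-style soft prompt is prepended, so only embeddings ever reach the model. All sizes, and the `soft_prompt` parameter itself, are stand-ins for illustration, not taken from any real checkpoint:

```python
import torch
import torch.nn as nn

# Hypothetical sizes -- stand-ins, not from any real checkpoint.
vocab_size, hidden_size, num_virtual_tokens = 32000, 64, 8

# Stand-in for the model's input embedding layer
# (model.get_input_embeddings() in transformers).
embed_tokens = nn.Embedding(vocab_size, hidden_size)

# Learned soft-prompt vectors, as in PEFT prompt tuning (hypothetical).
soft_prompt = nn.Parameter(torch.randn(num_virtual_tokens, hidden_size))

input_ids = torch.tensor([[1, 42, 7, 99]])      # (batch=1, seq=4)
token_embeds = embed_tokens(input_ids)          # (1, 4, hidden)

# Prepend the virtual tokens so only embeddings reach the model.
inputs_embeds = torch.cat(
    [soft_prompt.unsqueeze(0).expand(input_ids.size(0), -1, -1), token_embeds],
    dim=1,
)                                               # (1, 8 + 4, hidden)

# With inputs_embeds support in place, this is the call that should work:
# output_ids = model.generate(inputs_embeds=inputs_embeds, max_new_tokens=20)
print(tuple(inputs_embeds.shape))
```

The point is that after prompt tuning there are no `input_ids` to pass at all, so `generate()` has to be able to start from the embedding tensor directly.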
System Info
transformers version: 4.45.2

When I use `inputs_embeds` instead of `input_ids`, the Idefics model's `generate` function returns an error:

"""You passed `inputs_embeds` to `.generate()`, but the model class IdeficsForVisionText2Text doesn't have its forwarding implemented. See the GPT2 implementation for an example (Generate: decoder-only models can generate with `inputs_embeds` by gante · Pull Request #21405 · hug), and feel free to open a PR with it!"""

However, in IdeficsForVisionText2Text's definition, I find that `forward` already has `inputs_embeds` enabled (the function is defined at line 1541 of the code). So why can't the model just use `generate` with embeddings? I'd be very grateful if you could solve this problem 🙏
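For context, the GPT-2 pattern the error message points to (PR #21405) roughly looks like the sketch below: the user-provided embeddings are fed only on the first generation step, and later steps fall back to the freshly sampled `input_ids`. Names mirror transformers, but this is a simplified illustration, not the actual implementation:

```python
import torch

def prepare_inputs_for_generation(input_ids, past_key_values=None,
                                  inputs_embeds=None, **kwargs):
    # Simplified sketch of the GPT-2 pattern; not the real transformers code.
    if past_key_values is not None:
        # After the first step, only the last generated token is needed.
        input_ids = input_ids[:, -1:]

    if inputs_embeds is not None and past_key_values is None:
        # First step: feed the user-provided embeddings instead of ids.
        model_inputs = {"inputs_embeds": inputs_embeds}
    else:
        model_inputs = {"input_ids": input_ids}

    model_inputs["past_key_values"] = past_key_values
    return model_inputs

# First step: no cache yet, so the embeddings are forwarded.
first = prepare_inputs_for_generation(
    torch.empty(1, 0, dtype=torch.long),
    inputs_embeds=torch.randn(1, 4, 16),
)
# Later step: a cache exists, so only the last token id is forwarded.
later = prepare_inputs_for_generation(
    torch.tensor([[5, 9]]),
    past_key_values=("dummy",),
    inputs_embeds=torch.randn(1, 4, 16),
)
print("inputs_embeds" in first, "input_ids" in later)
```

So the model's `forward` accepting `inputs_embeds` is necessary but not sufficient: the generation-side input preparation also has to route the embeddings through on the first step, which is the part that was missing for Idefics.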
Who can help?
@zucchini-nlp @patrickvonplaten
Information

Tasks

- `examples` folder (such as GLUE/SQuAD, ...)

Reproduction
Expected behavior
It shouldn't crash