htluandc2 opened 8 months ago
In-context learning or fine-tuning
That's an excellent question. OpenAI GPT models can be enhanced through a few-shot approach; it would be fantastic if we could apply the same method to these pre-trained models. @haotian-liu
Has this been solved? I use SGLang for batch inference, and I also need this feature for ICL, multi-turn discussions, and few-shot prompting.
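For what it's worth, batched vision-language inference with SGLang's frontend language looks roughly like the sketch below. The endpoint URL, image paths, and prompt text are placeholders, and this assumes an SGLang server is already running with a LLaVA checkpoint:

```python
import sglang as sgl

@sgl.function
def summarize(s, image_path, question):
    # One user turn containing the image plus the instruction,
    # followed by a generated assistant turn.
    s += sgl.user(sgl.image(image_path) + question)
    s += sgl.assistant(sgl.gen("answer", max_tokens=256))

# Placeholder endpoint; point this at your running SGLang server.
sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

states = summarize.run_batch([
    {"image_path": "img1.jpg", "question": "Summarize this image."},
    {"image_path": "img2.jpg", "question": "Summarize this image."},
])
for state in states:
    print(state["answer"])
```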
Describe the issue
Hi there,
I have some images, each with a custom explanation, and I want to implement few-shot learning to generate summaries of my images.
This is my current implementation:
```python
templates = [
    {"url": "", "explain": """"""},
    {"url": "", "explain": """"""},
    {"url": "", "explain": """"""},
    {"url": "", "explain": """"""},
    {"url": "", "explain": """"""},
]
```
My code to build the prompt:
```python
from PIL import Image
import requests

"""Make image summary"""
img_prompt = "User: <image>\n" + "\nASSISTANT:"

prompt = (
    "You are an assistant tasked with summarizing images for retrieval. "
    "These summaries will be embedded and used to retrieve the raw image. "
    "Give a concise summary of the image that is well optimized for retrieval."
)
print(prompt)

# Build one few-shot example per template: image placeholder + its explanation.
images = []
for i, temp in enumerate(templates):
    image_i = Image.open(requests.get(temp["url"], stream=True).raw)
    explain_i = temp["explain"]
    example_i = f"\nUser: <image{i}>" + "\nASSISTANT:" + explain_i + "\n"
    prompt += example_i
    images.append(image_i)

# Final turn for the target image to be summarized.
prompt += f"\nUser: <image{len(templates)}>" + "\nASSISTANT:"
print(prompt)
print("-" * 100)
print("Examples:", len(images))
```
Inference:
```python
target = Image.open("figures/figure-2-5.jpg")
out = model_multi_modals(
    images=images + [target],
    prompt=prompt,
    generate_kwargs={"max_new_tokens": 2048},
)
```
And my error:
```
ValueError: The input provided to the model are wrong. The number of image tokens is 0 while the number of image given to the model is 1. This prevents correct indexing and breaks batch generation.
```
I think the error is caused by the image token. In the prompt, the image token should be written as:

`<image>`

and not with an image id or index. I got a similar error in my multi-prompt setup.
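For reference, here is a minimal sketch of how the loop above could be adjusted so that every turn uses the plain `<image>` token. It reuses the `templates` list from the issue and changes nothing else:

```python
# Sketch: use the plain <image> placeholder once per image, in order,
# instead of indexed tokens like <image0>, <image1>, ...
prompt = (
    "You are an assistant tasked with summarizing images for retrieval. "
    "These summaries will be embedded and used to retrieve the raw image. "
    "Give a concise summary of the image that is well optimized for retrieval."
)

images = []
for temp in templates:
    images.append(Image.open(requests.get(temp["url"], stream=True).raw))
    # One plain <image> token per few-shot example, paired with its explanation.
    prompt += "\nUser: <image>\nASSISTANT: " + temp["explain"] + "\n"

# The target image to summarize also gets a plain <image> token.
prompt += "\nUser: <image>\nASSISTANT:"
```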
BTW, the model is not capable of operating on multiple images and prompts simultaneously, as is evident from the following conversation involving the author and others.
https://discuss.huggingface.co/t/llava-multi-image-input-support-for-inference/68458
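If you still need several example images behind a single `<image>` token, one commonly suggested workaround is to stitch the images into one composite before inference. A rough PIL sketch; the target height of 336 px and the horizontal layout are assumptions, not anything the model requires:

```python
from PIL import Image

def stitch_horizontally(imgs, height=336):
    # Resize every image to a common height, then paste them side by side
    # so the model receives a single composite image.
    resized = [
        im.resize((max(1, round(im.width * height / im.height)), height))
        for im in imgs
    ]
    canvas = Image.new("RGB", (sum(im.width for im in resized), height), "white")
    x = 0
    for im in resized:
        canvas.paste(im, (x, 0))
        x += im.width
    return canvas
```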
Hi guys, you can use our codebase, which implements ICL: https://github.com/ys-zong/VL-ICL