idan-tankel / SemOOD

Apache License 2.0
0 stars 0 forks source link

Use system prompts for LLAMA2 #6

Closed idan-tankel closed 9 months ago

idan-tankel commented 11 months ago

In text - Enhancement task of converting the question and answers into statements,

"System prompts" in question generation for the GPT

As written in SeedBench paper here, the questions were generated using the following prompt default instruction, Fig 4

"You are an AI visual assistant that can analyze a single image. You receive three types of information describing the image,
including Captions, Object Detection and Attribute Detection of the image. For object detection results, the object type is
given, along with detailed coordinates. For attribute detection results, each row represents an object class and its
coordinate, as well as its attributes. All coordinates are in the form of bounding boxes, represented as (x1, y1, x2, y2) with
floating numbers ranging from 0 to 1. These values correspond to the top left x, top left y, bottom right x, and bottom right y.
Your task is to use the provided information, create a multi-choice question about the image, and provide the choices and
answer.
Instead of directly mentioning the bounding box coordinates, utilize this data to explain the scene using natural language.
Include details like object counts, position of the objects, relative position between the objects.
When using the information from the caption and coordinates, directly explain the scene, and do not mention that the
information source is the caption or the bounding box. Always answer as if you are directly looking at the image.
Create several questions, each with 4 choices. Make the question challenging by not including the visual content details in
the question so that the user needs to reason about that first. Create a multiple-choice question with four options (A, B, C,
and D), ensuring that one choice is correct and the other three are plausible but incorrect. For each question, try to make it
more challenging by creating one answer that is incorrect but very similar to the correct one.
Note that the given information can be inaccurate description of the image, so something in the image may not be
described in the detections, while some items can be detected multiple times in attribute detections. Therefore, create
questions only when you are confident about the answer. Don't explain your choice."

This is very similar to the concept of system prompts

@misc{li2023seedbench,
    title={SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension},
    author={Bohao Li and Rui Wang and Guangzhi Wang and Yuying Ge and Yixiao Ge and Ying Shan},
    year={2023},
    eprint={2307.16125},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
idan-tankel commented 10 months ago

tried a few system prompts that might help. Some of them are using 4-choice question to 4-choice statements raking problem, and the other ones are converting pairs (Q-choice) into a single statement, one by one.

The question to ranking question conversion might work here as well.

idan-tankel commented 10 months ago

another thing to consider here is that the perplexity test is using the FLAN-T5 module for generation (as part of the BLIP2). Is it possible to make the Statements "FLAN-T5 friendly"? to keep them somehow as far as possible in the FLAN-T5 embedding space? How the space of possible LLAMA2 generated answers looks under FLAN-T5 embeddings? Can we use FLAN-T5 for the rephrasing stage?

idan-tankel commented 9 months ago

working. have the systemprompt commited