rabiulcste / vqazero

visual question answering prompting recipes for large vision-language models
https://rabiul.me/vqazero/

Code Reproduction #12

Open lynnjeoun opened 5 months ago

lynnjeoun commented 5 months ago

Hello, author. Thank you for sharing your wonderful work.

While I was reproducing your code, I realized that 'main_v2.py' has not been updated to match the implementation you describe in the README. It seems there have been many changes to the arguments. Is there any plan to share an updated 'main_v2.py'?

Thank you.

Lynn

lynnjeoun commented 5 months ago

Also, there are some broken imports, such as a missing 'utils.config'. I assume this is happening because 'utils' in the current repository is up to date but 'main.py' is not?

Thanks.

rabiulcste commented 5 months ago

You should use main.py and that should work fine. Let me know if there's any error.

lynnjeoun commented 5 months ago

Thank you for your answer.

As shown in the screenshots attached below, some files such as 'common.py' try to import 'utils.config', but there is no config file under utils. That is the part where the code seems broken. Would you please check?

[Screenshots attached: 2024-04-12 11-14-02, 2024-04-12 11-17-25]

rabiulcste commented 5 months ago

You're encountering an error due to a missing file named utils/config.py in your local repository. This file likely contains important path constants.

Solution

  • Create a new file named utils/config.py in your local repository.
  • Populate the file with the following content, replacing the placeholders with the actual directory paths on your system:

VQA_DATASET_DIR = "/path/to/your/VQA_dataset"
COCO_DATASET_DIR = "/path/to/your/COCO_dataset"
OUTPUT_DIR = "/path/to/your/output/directory"
VQA_BOOTSTRAP_DIR = "/path/to/your/VQA_bootstrap/directory"
HUGGINFACE_HUB_DIR = "/path/to/your/HuggingFace_Hub/directory"
PROJECT_DIR = "/path/to/your/project/directory"
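
For reference, modules such as common.py import these constants from utils.config, along the lines of the sketch below (which constants each module pulls in, and the file names used, are illustrative only):

import os
from utils.config import VQA_DATASET_DIR

# Illustrative only: the actual files read from this directory depend on the module.
annotations_path = os.path.join(VQA_DATASET_DIR, "annotations.json")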
pdu3 commented 5 months ago

Thank you for your work! I also ran into some bugs while reproducing your code. When I run the sample command

python3 main.py --dataset_name okvqa --model_name blip2_t5_flant5xxl --vqa_format caption_vqa --prompt_name prefix_your_task_knowledge_qa_short_answer,prefix_promptcap

I get the error: No such file or directory: 'vqa_bootstrap/demonstrations/dense_captioning.json'. It seems that dense_captioning.json is missing from the repository; I cannot find such a directory in your code either.

In addition, when I try to run CoT-VQA, it requires two prompt_names, just like caption-VQA. I checked your code and confirmed that it needs two prompt_names, but I don't understand why. Would you please explain? I think you use two prompt names for caption-VQA because you treat the second one as an additional cue. Also, your README shows only one prompt_name for CoT-VQA; to run Chain-of-Thought VQA, it says to use the following command:

python3 main.py --dataset_name okvqa --model_name blip2_t5_flant5xxl --vqa_format cot_vqa --prompt_name prefix_think_step_by_step_rationale

rabiulcste commented 5 months ago

@pdu3 Could you please provide the full error trace?

rabiulcste commented 5 months ago

@pdu3 Fixed the issue and should be working now.

To answer your second query: there are two ways to use Chain-of-Thought (CoT) prompting:

  • Single-prompt: Simply provide the name of the prompt you want to use, the standard use case.
  • Few-shot: Provide a list of prompts for the CoT scenario. CoT caches demonstrations first, so you'll need to provide two prompts initially.
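
For example, using commands that appear elsewhere in this thread (the prompt names are illustrative, not the only valid choices):

  • Single-prompt (the README example):
python3 main.py --dataset_name okvqa --model_name blip2_t5_flant5xxl --vqa_format cot_vqa --prompt_name prefix_think_step_by_step_rationale
  • Two prompts:
python3 main.py --dataset_name okvqa --model_name blip2_t5_flant5xxl --vqa_format cot_vqa --prompt_name prefix_your_task_knowledge_qa_short_answer,prefix_rationale_before_answering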

pdu3 commented 4 months ago

Thank you for your reply, but I still have some questions. I would appreciate it if you could answer them.

1) First, I am sure the single-prompt form for cot_vqa raises an error: ValueError: Prompt_name should be a list of two prompts for caption_vqa format. Got prefix_rationale_before_answering. I know it comes from inference_utils, lines 223-224:

elif args.vqa_format == "cot_vqa":
    if len(prompt_name) != 2:
        raise ValueError(f"Prompt_name should be a list of two prompts for caption_vqa format. Got {prompt_name}")

But even if I comment out these lines, another error pops up: ValueError: ERROR! Unsupported combination of model_class (hfformer), model_name (blip2_flant5xxl), and dataset_name (okvqa). I am very confused by this error, because when I run

python3 main.py --dataset_name okvqa --model_name blip2_flant5xxl --vqa_format cot_vqa --prompt_name prefix_your_task_knowledge_qa_short_answer,prefix_rationale_before_answering

the error does not appear. However, if I pick two templates at random, the error sometimes occurs again.

2) I have no idea why the results I get are very different from the results in your paper. For example, in Table 6 your best result for BF(XXL) on OKVQA is 42.12, but one result I got from the combination prefix_your_task_knowledge_qa_short_answer,prefix_rationale_before_answering is 47.7.

3) When I tried to run your code on vqa_v2, I forgot to add the images to the directory. However, the code still ran successfully and reported an accuracy of 62.97. That is very strange. How can it predict the answer without looking at the image?

4) Would you please explain a little more about CoT-Iterative and CoT-Context? For example, which template is iterative and which one is context? Sorry, I am a newbie; I know their formats, but I can't tell them apart from their templates. I need some CoT prompts for BLIP-2 to do VQA on my own dataset with high accuracy, which is why I am very interested in your work. Any suggestions are appreciated!

rabiulcste commented 4 months ago

Let me give you a quick answer to your questions about the two comma-separated prompts (1, 2 and 4). If you want to use Chain-of-Thought prompting directly, you should use a single prompt (Standard-VQA). The two-prompt scenario is actually for the "CaptionVQA" setting in the paper, so it assumes one prompt is used for captioning and the other for the VQA task. You shouldn't randomly pair prompt templates!
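
For instance, in the caption_vqa command from earlier in this thread, prefix_promptcap serves as the captioning prompt and prefix_your_task_knowledge_qa_short_answer as the QA prompt:

python3 main.py --dataset_name okvqa --model_name blip2_t5_flant5xxl --vqa_format caption_vqa --prompt_name prefix_your_task_knowledge_qa_short_answer,prefix_promptcap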

Now, you're right that "CoTVQA" is confusing in the script. It is essentially for CoT-Iterative and CoT-Context. In this case, we first use a chain-of-thought prompt to generate the model's chain-of-thought output, and then feed parts of it back as input to the model to get the final answer. That's why you need two CoT prompts. This essentially treats the first chain-of-thought output as additional context, just as we do for caption VQA. I'll also update the README to make this clear.
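
To illustrate the idea only, here is a minimal sketch of that two-stage flow; the function and attribute names (cot_two_stage_answer, model.generate) are hypothetical placeholders, not this repository's actual API:

def cot_two_stage_answer(model, image, question, rationale_prompt, answer_prompt):
    # Stage 1: elicit a chain-of-thought rationale with the first CoT prompt.
    rationale = model.generate(image, rationale_prompt.format(question=question))
    # Stage 2: feed (parts of) the rationale back as additional context,
    # together with the second prompt, to obtain the final short answer.
    return model.generate(image, answer_prompt.format(question=question, context=rationale))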

I still have to check the issue you raised in (3) in the code and will get back to you.