GranD Detailed Operation Guide

closed 8 months ago

hzdzkjdxyjs commented 8 months ago

Your work is of great academic value and significance, and I am very grateful for the contributions you have made. I would like to ask you about the specific operational steps for implementing the GranD Automated Annotation Pipeline. I am very grateful that you could take the time out of your busy schedule to look at my question.

mmaaz60 commented 8 months ago

Hi @hzdzkjdxyjs,

Thank you for your interest in our work. The GranD creation pipeline is released and all the source code is available at mbzuai-oryx/groundingLMM/tree/main/GranD. Please refer to docs/GranD.md for more details.

Please let me know if you have any questions. Thank You.

hzdzkjdxyjs commented 8 months ago

Thank you very much for answering my question so quickly. I think the document you provided earlier about how to run this code was very detailed and very suitable for us to quickly apply your code. I wonder if you could also discuss the parameter settings and other related issues in detail, just like on this page?

mmaaz60 commented 8 months ago

Thank You @hzdzkjdxyjs,

If I understood your question correctly, you are interested in running the GranD Automated Annotation pipeline from scratch.

Note that the annotation pipeline contains 4 levels and 23 steps in total. At each level, we ran multiple SoTA vision-language models and pipeline scripts to build image scene graphs from the raw predictions.

The process is detailed in our paper, and the script run_pipeline.sh provides a step-by-step guide to running the pipeline. The corresponding environments can be found at environments.

Please go through the run_pipeline.sh script thoroughly and let me know if you have any questions. I hope it will help.

hzdzkjdxyjs commented 8 months ago

I'm very sorry to take up your valuable academic time. I would like to ask how I should set the parameters here, and what kind of command I should use to run the code.

mmaaz60 commented 8 months ago

Hi @hzdzkjdxyjs,

That's a bash script that takes a few command-line arguments, as detailed below:

  1. IMG_DIR -> path to the directory containing images on which you want to run the pipeline
  2. PRED_DIR -> path to the directory where the predictions will be saved
  3. CKPT_DIR -> path to the directory containing all the checkpoints. For downloading the checkpoints you have to consult the README of each respective model.
  4. SAM_ANNOTATIONS_DIR -> path to the directory containing SAM annotations (.json file)

First you have to create all the environments listed in environments. For example,

conda create --name grand_env_1 --file requirements_grand_env_1.txt
conda create --name grand_env_2 --file requirements_grand_env_2.txt
conda create --name grand_env_9 --file requirements_grand_env_9.txt
conda create --name grand_env_utils --file requirements_grand_env_utils.txt
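
If there are many requirements files, the same commands can be generated in a loop. The following is only a rough sketch: it assumes every file follows the requirements_<env_name>.txt naming pattern shown above, and the helper names `env_name_from_requirements` and `create_all_envs` as well as the `DRY_RUN` variable are made up for illustration and are not part of the repository.

```shell
# Derive the conda environment name from a requirements file name.
# Assumes the pattern requirements_<env_name>.txt (an assumption based on
# the example commands above).
env_name_from_requirements() {
    local file
    file="$(basename "$1")"
    file="${file#requirements_}"   # strip the leading "requirements_"
    printf '%s\n' "${file%.txt}"   # strip the trailing ".txt"
}

# Create one environment per requirements file found in a directory.
# Set DRY_RUN=1 to print the commands instead of running them.
create_all_envs() {
    local env_dir="${1:-.}" req name
    for req in "$env_dir"/requirements_*.txt; do
        [ -e "$req" ] || continue   # the glob matched nothing
        name="$(env_name_from_requirements "$req")"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "conda create --name $name --file $req"
        else
            conda create --yes --name "$name" --file "$req"
        fi
    done
}

# Example (hypothetical path): preview the commands without creating anything.
# DRY_RUN=1 create_all_envs ./environments
```

Running it with DRY_RUN=1 first makes it easy to check the derived environment names before any packages are downloaded.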

Second, you have to download all the checkpoints into your CKPT_DIR directory.

# For Landmark detection
git lfs install
git clone https://huggingface.co/liuhaotian/llava-v1-0719-336px-lora-merge-vicuna-13b-v1.3

# For Depth Estimation
wget https://github.com/isl-org/MiDaS/releases/download/v3_1/dpt_beit_large_512.pt

# For Image Tagging
Download it from [recognize-anything/tag2text_swin_14m.pth](https://huggingface.co/spaces/xinyu1205/recognize-anything/blob/main/tag2text_swin_14m.pth) & [recognize-anything/ram_swin_large_14m.pth](https://huggingface.co/spaces/xinyu1205/recognize-anything/blob/main/ram_swin_large_14m.pth)

# For Co-DETR Detector
Please use [google drive link](https://drive.google.com/drive/folders/1asWoZ3SuM6APTL9D-QUF_YW9mjULNdh9?usp=sharing) to download `co_deformable_detr_swin_large_900q_3x_coco.pth` checkpoints.

# For EVA-02 Detector
Download it from [eva02_L_lvis_sys.pth](https://huggingface.co/Yuxin-CV/EVA-02/blob/main/eva02/det/eva02_L_lvis_sys.pth) & [eva02_L_lvis_sys_o365.pth](https://huggingface.co/Yuxin-CV/EVA-02/blob/main/eva02/det/eva02_L_lvis_sys_o365.pth)

# For POMP
Download it from [vit_b16_ep20_randaug2_unc1000_16shots_nctx16_cscFalse_ctpend_seed42.pth.tar](https://drive.google.com/file/d/1C8oU6cWkJdU3Q3IHaqTcbIToRLo9bMnu/view?usp=sharing) & [Detic_LI_CLIP_R5021k_640b64_4x_ft4x_max-size_pomp.pth](https://drive.google.com/file/d/1TwrjcUYimkI_f9z9UZXCmLztdgv31Peu/view?usp=sharing)

# For GRiT
wget -c https://datarelease.blob.core.windows.net/grit/models/grit_b_densecap_objectdet.pth

# For OV-SAM
Download it from [HarborYuan/ovsam_models/blob/main/sam2clip_vith_rn50x16.pth](https://huggingface.co/HarborYuan/ovsam_models/blob/main/sam2clip_vith_rn50x16.pth)

# For GPT4RoI
Follow the instructions at [GPT4RoI/Weights](https://github.com/jshilong/GPT4RoI?tab=readme-ov-file#weights) to get GPT4RoI weights.
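
Once the downloads finish, it may help to confirm that nothing is missing before starting a long run. This is a minimal sketch, not something from the repository: the helper `missing_checkpoints` is hypothetical, and it assumes the files sit directly inside CKPT_DIR. The layout that run_pipeline.sh actually expects may differ, so please check the script.

```shell
# Print every expected file that is missing from the checkpoint directory.
# Usage: missing_checkpoints <ckpt_dir> <file> [<file> ...]
# Assumes a flat layout (files directly inside <ckpt_dir>); this is an
# assumption, so adjust the paths to whatever run_pipeline.sh expects.
missing_checkpoints() {
    local ckpt_dir="$1" name
    shift
    for name in "$@"; do
        [ -e "$ckpt_dir/$name" ] || printf '%s\n' "$name"
    done
}

# Example with some of the file names mentioned above:
# missing_checkpoints "$CKPT_DIR" \
#     dpt_beit_large_512.pt \
#     tag2text_swin_14m.pth \
#     ram_swin_large_14m.pth \
#     co_deformable_detr_swin_large_900q_3x_coco.pth \
#     grit_b_densecap_objectdet.pth \
#     sam2clip_vith_rn50x16.pth
```

An empty output means every listed file was found.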

Third, you need to have some images in IMG_DIR. Fourth, if you are running on SAM images, you have to prepare SAM_ANNOTATIONS_DIR containing the SAM JSON files; otherwise, you may skip it, remove ov-sam from the pipeline, and adjust the add_masks_to_annotations.py script accordingly. Finally, you can run the run_pipeline.sh script using the following command.

bash run_pipeline.sh <path to the directory containing images> <path to the directory for storing predictions> <checkpoints directory path> <path to the directory containing SAM annotations>
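
Because the pipeline takes a long time, a quick check of the four arguments before launching can save a failed run. The sketch below is illustrative only: `check_pipeline_args` is a hypothetical helper, and creating PRED_DIR up front is an assumption, since run_pipeline.sh may already create it.

```shell
# Verify the four directories before starting the pipeline.
# Returns 0 when all checks pass, 1 otherwise.
check_pipeline_args() {
    if [ "$#" -ne 4 ]; then
        echo "usage: check_pipeline_args IMG_DIR PRED_DIR CKPT_DIR SAM_ANNOTATIONS_DIR" >&2
        return 1
    fi
    local img_dir="$1" pred_dir="$2" ckpt_dir="$3" sam_dir="$4" dir status=0
    # The three input directories must already exist.
    for dir in "$img_dir" "$ckpt_dir" "$sam_dir"; do
        if [ ! -d "$dir" ]; then
            echo "missing directory: $dir" >&2
            status=1
        fi
    done
    # PRED_DIR is an output location, so create it when the inputs look fine
    # (an assumption; the pipeline script may handle this itself).
    if [ "$status" -eq 0 ]; then
        mkdir -p "$pred_dir" || status=1
    fi
    return "$status"
}

# Example (the variables are placeholders for your own paths):
# check_pipeline_args "$IMG_DIR" "$PRED_DIR" "$CKPT_DIR" "$SAM_ANNOTATIONS_DIR" \
#     && bash run_pipeline.sh "$IMG_DIR" "$PRED_DIR" "$CKPT_DIR" "$SAM_ANNOTATIONS_DIR"
```

If you removed ov-sam from the pipeline as described above, the SAM directory check would need to be dropped as well.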

I agree that the pipeline is not straightforward; this is because it involves running many off-the-shelf models that have different dependencies. We welcome any pull request that improves the pipeline.

Thanks and Good Luck :)

hzdzkjdxyjs commented 8 months ago

Thank you very much for taking the time out of your busy schedule to reply to my question. I find this project to be a very interesting and meaningful endeavor. I am amazed at the extremely excellent results produced by this demo. I will continue to follow your latest progress and actively try using your model. Once again, I express my respect to you and your team.

sankalpsinha-cmos commented 7 months ago

Hi @mmaaz60, I am unable to build the environments for the dataset with the command you suggested: conda create --name grand_env_1 --file requirements_grand_env_1.txt. I get the error:

Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

I think it's a channels issue, as a normal conda environment YAML file has a section defining the channels for the packages. Your help in reproducing the environments for the dataset would be much appreciated.