🌐 Homepage | 🖼️ Dataset | 🤗 HuggingFace
INQUIRE is a benchmark for expert-level natural world image retrieval queries.
Please note that this repository is preliminary. Both the code and dataset will be updated.
The INQUIRE benchmark and the iNaturalist 2024 dataset (iNat24) are available for public download. Please find information and download links here.
If you'd like, you can create a new environment in which to set up the repo:
conda create -n inquire python=3.10
conda activate inquire
Then, install the dependencies:
pip install -r requirements.txt
Our evaluations use pre-computed CLIP embeddings over iNat24. If you'd like to replicate our evaluations or just work with these embeddings, please download them here.
INQUIRE-Fullrank is the full-dataset retrieval task, starting from all 5 million images of iNat24. We evaluate one-stage retrieval, which uses similarity search with CLIP-style models, and two-stage retrieval, in which a large multi-modal model reranks the images from the initial retrieval.
To evaluate full-dataset retrieval with different CLIP-style models, you don't necessarily need all 5 million images, but rather their embeddings. You can download our pre-computed embeddings for a variety of models from here. Then, use the following command to evaluate CLIP retrieval:
python src/eval_fullrank.py --split test --k 50
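To get a sense of what this one-stage evaluation does, here is a minimal sketch of CLIP retrieval over pre-computed embeddings: embed the query text, then take the top-k images by cosine similarity. The embedding file name, its layout (an array of L2-normalized image embeddings), and the `dfn5b` checkpoint tag are assumptions for illustration; the script above handles the actual formats.

```python
import numpy as np
import open_clip
import torch

# Pre-computed, L2-normalized iNat24 image embeddings (file name is an assumption).
image_embs = np.load("embeddings/inat24_vit-h-14-378.npy")  # shape: (N, D)

model, _, _ = open_clip.create_model_and_transforms(
    "ViT-H-14-378-quickgelu", pretrained="dfn5b"
)
tokenizer = open_clip.get_tokenizer("ViT-H-14-378-quickgelu")

query = "A mongoose standing upright alert"
with torch.no_grad():
    text_emb = model.encode_text(tokenizer([query]))
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

# On normalized vectors, cosine similarity is a dot product;
# keep the indices of the k most similar images.
scores = image_embs @ text_emb.squeeze(0).numpy()
top_k = np.argsort(-scores)[:50]
```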
After the first stage, we can use large multi-modal models to re-rank the top-k retrievals to improve results. This stage requires access to the iNat24 images, which you can download here. To run the second-stage retrieval, use the following command:
python src/eval_fullrank_two_stage.py --split test --k 50
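Conceptually, the second stage turns reranking into a relevance-scoring problem: ask the multi-modal model whether each retrieved image matches the query, then sort by its score. The sketch below is illustrative only; `score_with_lmm` is a hypothetical helper standing in for whatever model (e.g. LLaVA-34B) you run, and the script above implements the real pipeline.

```python
def rerank(query: str, top_k_paths: list[str]) -> list[str]:
    prompt = f'Does this image show: "{query}"? Answer yes or no.'
    # score_with_lmm(image_path, prompt) -> P("yes"); hypothetical stand-in
    # for a call to your local multi-modal model.
    scores = [score_with_lmm(path, prompt) for path in top_k_paths]
    # Highest-scoring images move to the top of the reranked order.
    return [p for _, p in sorted(zip(scores, top_k_paths), reverse=True)]
```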
We recommend starting with INQUIRE-Rerank, as it is much smaller and easier to work with. INQUIRE-Rerank is available on 🤗 HuggingFace!
INQUIRE-Rerank evaluates reranking performance by fixing an initial retrieval of 100 images for each query (from OpenCLIP's CLIP ViT-H-14-378). For each query (e.g. "A mongoose standing upright alert"), your task is to re-order the 100 images so that more of the relevant images are at the "top" of the reranked order.
There are no extra requirements for evaluating INQUIRE-Rerank! The data will automatically download from HuggingFace if you don't already have it.
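For example, with the `datasets` library the benchmark loads in one call. The dataset identifier below is an assumption for illustration; use the HuggingFace link above.

```python
from datasets import load_dataset

# Dataset identifier is an assumption; see the HuggingFace link above.
ds = load_dataset("evendrow/INQUIRE-Rerank", split="test")
print(len(ds), ds[0].keys())  # inspect the available fields
```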
Evaluate reranking performance with large multi-modal models such as LLaVA-34B:
python src/eval_rerank_with_llm.py --split test
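Reranking quality is typically summarized with average precision (AP) over the reranked order, which rewards placing relevant images near the top. Here is a small self-contained sketch of the metric, not the repo's exact evaluation code:

```python
def average_precision(relevance: list[int]) -> float:
    """relevance[i] is 1 if the i-th image in the reranked order is relevant."""
    hits, precision_sum = 0, 0.0
    for i, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / i  # precision at each relevant rank
    return precision_sum / max(hits, 1)

# Example: three relevant images ranked 1st, 2nd, and 5th out of 5.
print(average_precision([1, 1, 0, 0, 1]))  # (1/1 + 2/2 + 3/5) / 3 ≈ 0.867
```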
Since inference can take a long time, we've pre-computed the outputs for all large multi-modal models we work with! You can download these here.