navervision / lincir

Official PyTorch implementation of LinCIR: Language-only Training of Zero-shot Composed Image Retrieval (CVPR 2024)

Evaluation on GeneCIS benchmark #10

Closed sungonce closed 7 months ago

sungonce commented 8 months ago

According to the README, the code for evaluating the GeneCIS benchmark is located in a branch named eval_genecis. However, I could not find this specific branch upon checking your repository.

GeneCIS

Evaluating GeneCIS requires a few additional steps. Check out the eval_genecis branch and make the necessary adjustments to the configuration in ./eval_genecis/config.py. Then, run the following script:

$ cd eval_genecis

$ python evaluate.py \
--combiner_mode phi \
--model large \
--combiner_pretrain_path /path/to/trained_your/phi_best.pt

If this branch hasn't been uploaded, could you make it available?

geonm commented 8 months ago

Sure.

We will add the modified code to the master branch this week so that models can be easily validated on the GeneCIS benchmark.

sungonce commented 8 months ago

Can I get some updates?

geonm commented 8 months ago

I've been busy, so it's taking a bit longer.

I'll upload it by this weekend.

geonm commented 8 months ago

Thank you for your patience!

We've uploaded the eval_genecis branch for evaluating trained phi projectors on the GeneCIS benchmark.

Before proceeding with the evaluation, it's essential to prepare the necessary evaluation datasets. For detailed instructions and download links, please visit: https://github.com/facebookresearch/genecis?tab=readme-ov-file#-arrow_down-downloads

The datasets required for this evaluation are VG 100K and COCO val 2017.
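
Before running the evaluation, you can use a quick sanity check like the one below (just a sketch, not part of the repo; the paths are placeholders matching the flags used in the command further down) to confirm both image folders are in place:

# Optional sanity check (not part of the repo): confirm that both image
# folders exist and contain jpg files before running genecis/evaluate.py.
import glob
import os

vg_100k_all_path = "/path/to/VG_100K_all"   # all Visual Genome images in one folder
coco_val2017_path = "/path/to/val2017"      # COCO 2017 validation images

for name, path in [("VG 100K", vg_100k_all_path), ("COCO val2017", coco_val2017_path)]:
    n_images = len(glob.glob(os.path.join(path, "*.jpg")))
    print(f"{name}: {n_images} .jpg files found in {path}")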

By following the instructions below, you'll be able to obtain the evaluation results.

# Assuming you're in the lincir folder.
$ git fetch --all
$ git checkout eval_genecis
$ cd genecis
$ python evaluate.py \
    --combiner_mode phi \
    --model large \
    --combiner_pretrain_path /path/to/lincir_best.pt \
    --vg_100k_all_path /path/to/VG_100K_all \
    --coco_val2017_path /path/to/val2017

sungonce commented 8 months ago

I'll check it out right away! Thanks again for releasing the codes for this amazing work 😄

sungonce commented 7 months ago

Thanks again for the update!

I successfully ran the updated GeneCIS evaluation. It had some missing files and shape errors, but it worked fine after a few simple fixes.

In the process, I found some issues, so I have reopened this issue.

  1. I found a large discrepancy between the results in the paper (Table B.3 of the LinCIR arXiv paper) and our reproduced results on the GeneCIS Change Attribute task. For ViT-L, the paper reports R@1-R@3 of 16.19-36.84, but the results were significantly different for both the weights you uploaded (R@1-R@3: 12.07-31.16) and the weights I reproduced (R@1-R@3: 11.51-31.25).

| ViT-L | Average (R@1 / R@2 / R@3) | Focus Attribute (R@1 / R@2 / R@3) | Change Attribute (R@1 / R@2 / R@3) | Focus Object (R@1 / R@2 / R@3) | Change Object (R@1 / R@2 / R@3) |
| --- | --- | --- | --- | --- | --- |
| Pic2Word (Paper) | 11.16 / 21.47 / 30.38 | 15.65 / 28.16 / 38.65 | 13.87 / 24.67 / 33.05 | 8.42 / 18.01 / 25.77 | 6.68 / 15.05 / 24.03 |
| SEARLE (Paper) | 12.26 / 22.11 / 31.30 | 17.00 / 29.65 / 40.70 | 16.38 / 25.28 / 34.14 | 7.76 / 16.68 / 25.31 | 7.91 / 16.84 / 25.05 |
| LinCIR (Paper) | 12.19 / 22.76 / 32.38 | 16.90 / 29.95 / 41.45 | 16.19 / 27.98 / 36.84 | 8.27 / 17.40 / 26.22 | 7.40 / 15.71 / 25.00 |
| LinCIR (huggingface) | 11.34 / 21.22 / 30.88 | 17.10 / 29.35 / 41.75 | 12.07 / 22.16 / 31.16 | 8.21 / 17.76 / 25.66 | 7.96 / 15.61 / 24.95 |
| LinCIR (our reproduce) | 11.01 / 21.24 / 30.50 | 17.20 / 30.60 / 41.65 | 11.51 / 21.54 / 31.25 | 7.55 / 16.99 / 25.31 | 7.76 / 15.82 / 23.78 |
  2. The column titles in Table B.4 seem to have been switched. When I backtracked the calculation, there appears to be a mismatch between the Change Avg and Object Avg columns. (For example, Pic2Word (ViT-L) has an R@1 Change Avg of (13.87 + 6.68) / 2 = 10.28, but in the table this value is reported as the average of the Object results; see the quick check below.)
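
For reference, the backtracked averages (a quick sketch of my own, using the Pic2Word ViT-L R@1 numbers from the table above; not code from the paper):

# Averages recomputed from the Table B.3 numbers above (Pic2Word, ViT-L, R@1).
focus_attr, change_attr = 15.65, 13.87
focus_obj, change_obj = 8.42, 6.68

attribute_avg = (focus_attr + change_attr) / 2  # (15.65 + 13.87) / 2 = 14.76
object_avg = (focus_obj + change_obj) / 2       # (8.42 + 6.68) / 2 = 7.55
focus_avg = (focus_attr + focus_obj) / 2        # (15.65 + 8.42) / 2 = 12.035
change_avg = (change_attr + change_obj) / 2     # (13.87 + 6.68) / 2 = 10.275

print(f"Change Avg: {change_avg:.3f}, Object Avg: {object_avg:.3f}")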

Could you comment on the above issues? It's possible these results changed during the paper rebuttal; if so, could you provide a brief update?

geonm commented 7 months ago

It seems weird.

These are our results with the LinCIR HF model:

user@machine:/path/to/lincir/genecis# CUDA_VISIBLE_DEVICES=0 python evaluate.py \
> --combiner_mode phi \
> --model large \
> --combiner_pretrain_path /path/to/lincir_large.pt \
> --vg_100k_all_path /path/to/VG_100K_all \
> --coco_val2017_path /path/to/val2017
[INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
batch_size_per_gpu: 8
clip_pretrain_path: None
coco_val2017_path: /path/to/val2017
combiner_mode: phi
combiner_pretrain_path: /path/to/lincir_large.pt
dist_url: env://
feature_comb_average: 0.5
local_rank: 0
model: large
num_workers: 8
pred_save_path: None
use_complete_text_query: False
vg_100k_all_path: /path/to/VG_100K_all
Loading models...
Loading datasets...
Evaluating on GeneCIS from ./genecis/change_attribute.json
Evaluating on 2112 templates...
Computing eval with combiner...

Recall @ 1 = 0.1605
Recall @ 2 = 0.2822
Recall @ 3 = 0.3674

Note that we have removed some lines from the logs for security reasons.

geonm commented 7 months ago

Ah...

We had missed the ./genecis/datasets folder.

Could you check it again on the latest eval_genecis branch?

sungonce commented 7 months ago

Thanks for the fast reply :D Everything is now resolved.

Before you uploaded the datasets folder, I had simply pulled the missing files from the original GeneCIS repo and modified them. As a result, the structure of the batch returned by my dataloader was different from yours (the last "caption" variable was not present in the original GeneCIS code).

https://github.com/navervision/lincir/blob/e6727d9a50bd97921e147bf2af8120187b105ab3/genecis/datasets/vaw_dataset.py#L113

This caused the variable that should hold the sentence string to instead receive a value of 0, resulting in an error when constructing LinCIR's input sentence.

https://github.com/navervision/lincir/blob/e6727d9a50bd97921e147bf2af8120187b105ab3/genecis/eval_functions.py#L196
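
For anyone who hits the same thing, the failure mode was roughly as follows (a simplified, hypothetical illustration, not the actual repo code):

# Simplified, hypothetical sketch of the mismatch (not the actual repo code).
# lincir's eval_genecis dataset yields the caption string as the last element
# of each sample; the files I had copied from the original GeneCIS repo do not.

def lincir_style_sample():
    # placeholder values; only the tuple structure matters here
    return "ref_img", "gallery_set", 0, "a caption string"

def original_genecis_sample():
    return "ref_img", "gallery_set", 0   # no caption at the end

*_, caption = lincir_style_sample()
print(type(caption))   # <class 'str'>: what the input sentence construction expects

*_, caption = original_genecis_sample()
print(type(caption))   # <class 'int'>: the 0 ends up where the caption should be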

Now my results are almost the same as yours (R@1 = 16.00, R@2 = 28.17, R@3 = 36.93). Thank you again for your response, and congratulations on your CVPR acceptance! 🥂