suoych / KEDs

Implementation of the paper Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval (CVPR 2024)
MIT License

Issues about inference #5

Open CHY-1231 opened 3 weeks ago

CHY-1231 commented 3 weeks ago

Could you share the img2text model.pth? I would like to use this work for inference on other datasets.

suoych commented 2 weeks ago

Hi, example checkpoints can be found here: https://huggingface.co/LionheartzzZ/KEDs. The example checkpoints are image_stream.pt and text_stream.pt; you can load them and run inference.
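A minimal sketch of preparing a loaded checkpoint before calling load_state_dict. It assumes the weights may be nested under a "state_dict" key and may carry the "module." prefix that DataParallel/DistributedDataParallel adds to parameter names. Both the key name and the prefix are assumptions, not confirmed details of the KEDs checkpoints.

```python
from typing import Any, Dict, Mapping

# Assumption: prefix added by torch.nn.DataParallel / DistributedDataParallel.
DDP_PREFIX = "module."


def normalize_state_dict(state_dict: Mapping[str, Any]) -> Dict[str, Any]:
    """Strip a leading 'module.' from every key that has it; other keys are kept unchanged."""
    return {
        (k[len(DDP_PREFIX):] if k.startswith(DDP_PREFIX) else k): v
        for k, v in state_dict.items()
    }


def extract_state_dict(checkpoint: Mapping[str, Any], key: str = "state_dict") -> Dict[str, Any]:
    """Return the (prefix-stripped) weights from a checkpoint.

    Assumption: weights are either nested under `key` or the checkpoint
    is itself a bare state dict.
    """
    inner = checkpoint.get(key, checkpoint)
    return normalize_state_dict(inner)
```

With PyTorch installed, something like `sd = extract_state_dict(torch.load("image_stream.pt", map_location="cpu"))` gives a plain dict whose keys you can inspect before loading them into the matching module.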

CHY-1231 commented 2 weeks ago

Thanks a lot! By the way, inference seems to require pic2word_model.pt, but the pretrained-model link from the original paper is dead. Could you share it or tell me where I can download it?

suoych commented 2 weeks ago

You don't need pic2word_model.pt. That is the Pic2Word checkpoint; KEDs trains both branch networks from scratch.

CHY-1231 commented 2 weeks ago

In eval_retrieval.py, the function "load_model" loads "model, img2text, retrieval_fuse, text_condition, preprocess_val". model and preprocess_val come from ViT-L/14, while img2text, retrieval_fuse, and text_condition come from the two stream checkpoints (image_stream.pt and text_stream.pt). Is that understanding correct?

suoych commented 2 weeks ago

Actually, during training we save the retrieval-fuse part and the projection part in the same checkpoint, so each example checkpoint I provided contains its own img2text part, trained from scratch. You can check the checkpoint-loading function here: https://github.com/suoych/KEDs/blob/dfe214015f68d731871a4db5e8f3b5d05680b268/src/eval_utils.py#L59-L86.
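Because one checkpoint holds both the retrieval-fuse and the projection parts, a small helper can split a flat state dict into per-module sub-dicts by key prefix. This is a hypothetical sketch: the prefixes "img2text" and "retrieval_fuse" are placeholders, and the real key layout should be read from the linked eval_utils.py.

```python
from typing import Any, Dict, Iterable, Mapping


def split_by_prefix(
    state_dict: Mapping[str, Any], prefixes: Iterable[str]
) -> Dict[str, Dict[str, Any]]:
    """Group a flat state dict into sub-dicts keyed by prefix, with "<prefix>." stripped.

    The first matching prefix wins. Keys that match no prefix are collected
    under "" so nothing is silently dropped.
    """
    prefixes = list(prefixes)
    groups: Dict[str, Dict[str, Any]] = {p: {} for p in prefixes}
    groups[""] = {}
    for key, value in state_dict.items():
        for p in prefixes:
            if key.startswith(p + "."):
                groups[p][key[len(p) + 1:]] = value
                break
        else:
            # No prefix matched this key.
            groups[""][key] = value
    return groups
```

Each sub-dict could then be passed to the matching module's `load_state_dict`; a non-empty `""` group signals keys the assumed prefixes did not cover.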

CHY-1231 commented 2 weeks ago

This database_names.txt should correspond to cc_image_databases.pt and cc_text_databases.pt, right?
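If database_names.txt lists one name per line in the same order as the rows of the embeddings in cc_image_databases.pt and cc_text_databases.pt (an assumption, not confirmed in this thread), the names can be checked against the row count and paired with retrieval scores as in this sketch. The helper names are hypothetical.

```python
from typing import List, Sequence, Tuple


def load_names(path: str) -> List[str]:
    """Read one database entry name per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def check_alignment(names: Sequence[str], num_rows: int) -> None:
    """Raise if the name list and the number of embedding rows differ."""
    if len(names) != num_rows:
        raise ValueError(
            f"{len(names)} names but {num_rows} embedding rows; "
            "the files do not appear to be aligned"
        )


def top_k_names(
    names: Sequence[str], scores: Sequence[float], k: int
) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (name, score) pairs, assuming scores[i] belongs to names[i]."""
    check_alignment(names, len(scores))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(names[i], scores[i]) for i in order[:k]]
```

If a .pt file stores a single tensor, the count check could use its first dimension, e.g. `check_alignment(names, torch.load("cc_image_databases.pt").shape[0])`.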