Open CHY-1231 opened 3 weeks ago
Hi, example checkpoints can be found here: https://huggingface.co/LionheartzzZ/KEDs. image_stream.pt and text_stream.pt are the example checkpoints; you can load them and run inference.
Thanks a lot! By the way, inference looks like it needs pic2word_model.pt, but the pretrained model link from the original paper is dead. Can you share the file or tell me where I can download it?
You don't need the pic2word_model.pt. That was the checkpoint of pic2word. KEDs trains the two branch networks from scratch.
In eval_retrieval.py, the function "load_mode" loads "model, img2text, retrieval_fuse, text_condition, preprocess_val". model and preprocess_val come from ViT-L/14, while img2text, retrieval_fuse, and text_condition come from the two stream checkpoints (image_stream.pt and text_stream.pt). Is that the right way to understand it?
Actually, during training we save the retrieval fuse part and the projection part in the same checkpoint, so each of the example checkpoints I provided contains its own img2text part trained from scratch. You can check the checkpoint-loading function at https://github.com/suoych/KEDs/blob/dfe214015f68d731871a4db5e8f3b5d05680b268/src/eval_utils.py#L59-L86.
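To see which parts a given checkpoint bundles, a quick inspection along these lines may help. This is only a sketch: it assumes the .pt file is a dictionary written with torch.save, and `summarize_checkpoint` and `load_and_summarize` are hypothetical helper names, not functions from the KEDs repo. No particular key names are assumed; the linked function in eval_utils.py remains the reference for how the keys are actually used.

```python
from collections import Counter


def summarize_checkpoint(ckpt):
    """Summarize a checkpoint dict (hypothetical helper, not part of KEDs).

    For each top-level entry: if it is a nested dict (e.g. a state dict),
    count its keys by their first dotted component; otherwise report the
    Python type name of the value.
    """
    summary = {}
    for name, value in ckpt.items():
        if isinstance(value, dict):
            prefixes = Counter(str(key).split(".")[0] for key in value)
            summary[name] = dict(prefixes)
        else:
            summary[name] = type(value).__name__
    return summary


def load_and_summarize(path):
    """Load a checkpoint on CPU and summarize its contents."""
    # Imported lazily so summarize_checkpoint works without PyTorch installed.
    import torch

    # Assumption: the file holds a plain dict saved with torch.save.
    ckpt = torch.load(path, map_location="cpu")
    return summarize_checkpoint(ckpt)
```

Calling `load_and_summarize("image_stream.pt")` and then the same on text_stream.pt should show whether each file carries its own img2text entry alongside the retrieval fuse part. On recent PyTorch versions, `torch.load` may need `weights_only=False` if the checkpoint stores non-tensor objects; only do that for files you trust.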
This database_names.txt should correspond to cc_image_databases.pt and cc_text_databases.pt, right?
Can you share the img2text model.pth? I want to use this work for inference on other datasets.