navervision / CompoDiff

Official PyTorch implementation of "CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion" (TMLR 2024)
https://huggingface.co/navervision/CompoDiff-Aesthetic
Apache License 2.0

How to retrieve using CompoDiff? #2

Open develop-productivity opened 1 year ago

develop-productivity commented 1 year ago

Thank you for proposing this excellent work! But when I read the paper, I was very confused. When we need to complete the retrieval task, I don't know how to perform retrieval using the result generated by denoising. The demo in this repository only computes image features but does not perform retrieval, so how were the metrics in the paper computed? Can you help me? Thanks a lot.

geonm commented 1 year ago

Hi there.

https://github.com/navervision/CompoDiff/blob/50e5ff6a60d60ffcc5ea47b751989baba9ed2ee3/demo_search.py#L119

In the line above, you can get the composed image features.

https://github.com/navervision/CompoDiff/blob/50e5ff6a60d60ffcc5ea47b751989baba9ed2ee3/demo_search.py#L133-L147

And here you can retrieve the top-K image URLs using the composed image features.

Could you clarify what you mean by "the indicators of the paper"?
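For context, the retrieval in the lines linked above boils down to a nearest-neighbor search over precomputed image features. A minimal sketch with NumPy (function and variable names are illustrative, not taken from demo_search.py):

```python
import numpy as np

def top_k_urls(composed_feat, index_feats, urls, k=5):
    """Return the URLs of the k images most similar to a composed query.

    Assumes all features are L2-normalized, so cosine similarity
    reduces to a dot product. (Illustrative only, not CompoDiff code.)
    """
    sims = index_feats @ composed_feat       # (N,) similarity scores
    top = np.argsort(-sims)[:k]              # indices of the k highest scores
    return [urls[i] for i in top]

# Toy index: 4 images with random L2-normalized 3-d features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 3))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
urls = [f"img_{i}.jpg" for i in range(4)]

# Querying with image 2's own feature should rank img_2.jpg first.
print(top_k_urls(feats[2], feats, urls, k=2))
```

In the real demo, `index_feats` would be the feature matrix of the image collection and `composed_feat` the output of the diffusion denoising step.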

develop-productivity commented 1 year ago


Thanks, I see how it works. By "the indicators of the paper" I meant Recall@K. Sorry, I didn't describe it clearly.
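For reference, Recall@K in composed image retrieval just checks whether the ground-truth target image appears among the top-K retrieved candidates, averaged over all queries. A hedged sketch of the usual computation (names are illustrative, not taken from the CompoDiff code):

```python
import numpy as np

def recall_at_k(query_feats, gallery_feats, gt_indices, k):
    """Fraction of queries whose ground-truth target ranks in the top-k.

    query_feats:   (Q, D) composed query features, L2-normalized
    gallery_feats: (N, D) candidate image features, L2-normalized
    gt_indices:    (Q,) index of each query's ground-truth target image
    (Illustrative only, not the official evaluation code.)
    """
    sims = query_feats @ gallery_feats.T            # (Q, N) cosine scores
    topk = np.argsort(-sims, axis=1)[:, :k]         # top-k gallery indices
    hits = (topk == np.asarray(gt_indices)[:, None]).any(axis=1)
    return hits.mean()

# Toy check: querying with the targets' own features gives Recall@1 = 1.0.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(5, 4))
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)
queries = gallery[[0, 3]]
print(recall_at_k(queries, gallery, [0, 3], k=1))
```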

develop-productivity commented 1 year ago


But I still don't understand how it works on the CIRR or Fashion-IQ datasets. How do you get the ground-truth target image features using a diffusion model and calculate the recall metrics? Thanks a lot.

geonm commented 1 year ago

Sure. This demo doesn't include the evaluation protocol for the benchmark datasets.

We have currently only released the demo code to show the generalization performance of CompoDiff.

develop-productivity commented 1 year ago


Do you plan to release the evaluation code for the benchmark protocols in the future? I'll be looking forward to it.