navervision / lincir

Official Pytorch implementation of LinCIR: Language-only Training of Zero-shot Composed Image Retrieval (CVPR 2024)

Training datasets #9

Closed zyy0822 closed 7 months ago

zyy0822 commented 8 months ago

In the paper, CC3M and 2.47M StableDiffusion prompts are employed for training. However, in the released code, three datasets are adopted, so I want to know whether 'dataset3': 'Geonmo/midjourney-prompts-onlyonly' is also used for training, i.e., https://github.com/navervision/lincir/blob/28943db28b4f65d41dc2724b6e79596b0b8cc82d/loader.py#L219C19-L219C21

geonm commented 8 months ago

As described in our paper, we trained our models using only Dataset1 (GCC3M captions) and Dataset2 (SDP).

However, upon further evaluation, we found that incorporating Dataset3 resulted in a slight improvement in performance.

So, we included Dataset3 in the version of the code we released.

With LinCIR, adding extra text datasets to train a model for CIR is remarkably straightforward. So, if you aim to enhance the model further, we suggest browsing HuggingFace for suitable text datasets. Text data is significantly easier to gather than data from other modalities, and it is available in abundance.

As outlined in our paper, we strongly recommend seeking out text datasets rich in keywords, such as nouns. For instance, the SAM-LLaVA-Captions dataset appears to be a good choice.
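The "keyword-rich" criterion above could be sketched as a simple pre-filter over candidate captions. This is not part of the LinCIR codebase; the stopword list and threshold below are illustrative assumptions, and a real pipeline might use a proper POS tagger to count nouns instead.

```python
# Hedged sketch: pre-filter captions by "content-word richness" before
# adding a new text dataset to LinCIR-style training. The stopword set
# and the 0.5 threshold are illustrative assumptions, not LinCIR code.

STOPWORDS = {
    "a", "an", "the", "of", "in", "on", "at", "and", "or", "with",
    "is", "are", "was", "were", "to", "for", "by", "it", "this", "that",
}

def content_word_ratio(caption: str) -> float:
    """Fraction of tokens that are likely content words (nouns, adjectives, ...)."""
    tokens = [t.strip(".,!?").lower() for t in caption.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in STOPWORDS]
    return len(content) / len(tokens)

def filter_keyword_rich(captions, min_ratio=0.5):
    """Keep only captions whose content-word ratio meets the threshold."""
    return [c for c in captions if content_word_ratio(c) >= min_ratio]

samples = [
    "a photo of a dog on the beach",                                 # ratio 3/8, dropped
    "golden retriever puppy, beach sunset, bokeh, 35mm photograph",  # ratio 1.0, kept
]
print(filter_keyword_rich(samples))
```

A filter like this favors prompt-style datasets (e.g. Midjourney or StableDiffusion prompts), which tend to be dense lists of nouns and modifiers rather than full sentences.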