Hi, you need to set the enable_text option to False.
On Wed, Jun 1, 2022, 11:17 YUHANG-Ma wrote:
Hi, I ran into a problem when using clip-retrieval. My dataset path looks like this: /data1/train-{00000..00099}.tar. Each tar file contains matching .jpg and .cls files. I want to use clip-retrieval to get image embeddings. I run it like this:
clip-retrieval inference --input_dataset /root/data0601/train-0001.tar --output_folder /root/npy0602 --input_format webdataset
I didn't encounter any issue, but there are no image embeddings and no text embeddings in the output folder. Could I ask how I can fix it? (screenshot: https://user-images.githubusercontent.com/72591225/171370929-43cd8d43-db23-469c-86aa-da1ff539607a.png)
It works. But when I run clip-retrieval inference --input_dataset /root/try_data/data0601/train-{0001..0002}.tar --output_folder /root/try_data/npy0602 --input_format webdataset --enable_text False, I get only one .npy file. Is that right? I thought I needed two .npy files to train the decoder of DALLE2.
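One file can be expected there: clip-retrieval concatenates the embeddings of all input shards into numbered output files, so with the default write batch size both tars land in a single .npy. A minimal sanity check, assuming the default output layout (an img_emb subfolder under the output folder):

```python
import numpy as np

# Assumed default layout: <output_folder>/img_emb/img_emb_0.npy
emb = np.load("/root/try_data/npy0602/img_emb/img_emb_0.npy")

# One .npy can hold the embeddings of many input tars; the row count
# should equal the total number of images across both shards.
print(emb.shape)  # (num_images, embedding_dim), e.g. (N, 512) for ViT-B/32
```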
You need images and text to train a text-to-image model.
But for the decoder part, I don't need to input text embeddings; in my understanding, text embeddings are used to train the diffusion prior model. Also, I want to ask: if I use a layout such as data/train0001/25634.jpg to generate an image embedding .npy with the files input format, rename it to img_embed_0001.npy, and then repack the folder as 0000.tar/00000001.jpg to fit the decoder's dataset convention, will that affect the mapping between the pictures and the embeddings?
For the decoder you need image embeddings (or image embeddings predicted by the prior from the text) aligned with the images.
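Concretely, alignment means row i of the embedding file corresponds to image i, so any renaming or repacking must preserve that order. One way to check is via the metadata parquet that clip-retrieval writes next to the embeddings; the exact columns (e.g. image_path) depend on the input format and options, so treat this as a sketch:

```python
import numpy as np
import pandas as pd

# Assumed default layout next to img_emb/: a metadata/ subfolder with parquet files
emb = np.load("/root/try_data/npy0602/img_emb/img_emb_0.npy")
meta = pd.read_parquet("/root/try_data/npy0602/metadata/metadata_0.parquet")

# Row i of the embedding array corresponds to row i of the metadata,
# which records which image each embedding came from.
assert len(emb) == len(meta)
print(meta.columns.tolist())  # e.g. ['image_path', ...], depending on format/options
print(meta.head())
```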
Hi, I read the README. My understanding is that the diffusion prior and the decoder are trained separately; they are combined at the inference stage.
Yes, but you still need text, images, and image embeddings for the decoder training if you follow the DALL-E 2 paper.
You may be able to use only images and image embeddings if you remove the text conditioning.
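For example, with dalle2_pytorch, something along these lines; this is a rough sketch, and the argument names used here (cond_on_text_encodings, image_embed, image_sizes) are assumptions that may differ between versions, so check the repo's README:

```python
import numpy as np
import torch
from dalle2_pytorch import Unet, Decoder  # argument names below may differ by version

# U-Net conditioned only on the CLIP image embedding, no text encodings
unet = Unet(
    dim = 128,
    image_embed_dim = 512,           # must match the CLIP model used for the embeddings (512 for ViT-B/32)
    cond_dim = 128,
    channels = 3,
    dim_mults = (1, 2, 4, 8),
    cond_on_text_encodings = False   # assumption: this flag drops the text conditioning
)

decoder = Decoder(
    unet = unet,
    image_sizes = (64,),
    timesteps = 1000
).cuda()

# Pair each image with its precomputed embedding from clip-retrieval,
# keeping the same order in both (row i of the .npy <-> image i)
image_embed = torch.from_numpy(np.load("img_emb_0.npy")[:4]).float().cuda()
images = torch.randn(4, 3, 64, 64).cuda()  # stand-in for the real image batch

loss = decoder(images, image_embed = image_embed)  # assumption: forward accepts precomputed embeddings
loss.backward()
```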
Yes, that makes sense.