Hi, you need to set the enable_text option to False.
On Wed, Jun 1, 2022, 11:17 YUHANG-Ma wrote:
Hi, I ran into a problem when using clip-retrieval. My dataset path looks like this: /data1/train-{00000..00099}.tar. Each tar file contains matching .jpg and .cls files. I want to use clip-retrieval to get image embeddings. I run it like this:
clip-retrieval inference --input_dataset /root/data0601/train-0001.tar --output_folder /root/npy0602 --input_format webdataset
I didn't encounter any issue, but there are no image embeddings and no text embeddings in the output folder. Could I ask how I can fix it? (screenshot: https://user-images.githubusercontent.com/72591225/171370929-43cd8d43-db23-469c-86aa-da1ff539607a.png)
It works. But when I run clip-retrieval inference --input_dataset /root/try_data/data0601/train-{0001..0002}.tar --output_folder /root/try_data/npy0602 --input_format webdataset --enable_text False, I get only one .npy file. Is that right? I thought I needed two .npy files to train the decoder of DALLE2.
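One file can be expected there: clip-retrieval concatenates the embeddings of all input shards into numbered output files, so with the default write batch size both tars land in a single .npy. A minimal sanity check, assuming the default output layout (an img_emb subfolder under the output folder):

```python
import numpy as np

# Assumed default layout: <output_folder>/img_emb/img_emb_0.npy
emb = np.load("/root/try_data/npy0602/img_emb/img_emb_0.npy")

# One .npy can hold the embeddings of many input tars; the row count
# should equal the total number of images across both shards.
print(emb.shape)  # (num_images, embedding_dim), e.g. (N, 512) for ViT-B/32
```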
You need images and text to train a text-to-image model.
But for the decoder part, I don't need to input text embeddings; in my understanding, text embeddings are used to train the diffusion prior model. Also, I want to ask: if I use a layout such as data/train0001/25634.jpg to generate an image embedding .npy with the files input format, rename it to img_embed_0001.npy, and then repack the folder as 0000.tar/00000001.jpg to fit the decoder's dataset convention, will that affect the mapping between the pictures and the embeddings?
For the decoder you need image embeddings (or image embeddings predicted by the prior from the text) aligned with the images.
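Concretely, alignment means row i of the embedding file corresponds to image i, so any renaming or repacking must preserve that order. One way to check is via the metadata parquet that clip-retrieval writes next to the embeddings; the exact columns (e.g. image_path) depend on the input format and options, so treat this as a sketch:

```python
import numpy as np
import pandas as pd

# Assumed default layout next to img_emb/: a metadata/ subfolder with parquet files
emb = np.load("/root/try_data/npy0602/img_emb/img_emb_0.npy")
meta = pd.read_parquet("/root/try_data/npy0602/metadata/metadata_0.parquet")

# Row i of the embedding array corresponds to row i of the metadata,
# which records which image each embedding came from.
assert len(emb) == len(meta)
print(meta.columns.tolist())  # e.g. ['image_path', ...], depending on format/options
print(meta.head())
```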
Hi, I read the README. My understanding is that the diffusion prior and the decoder are trained separately; they are combined at the inference stage.
Yes, but you still need text, images, and image embeddings for the decoder training if you follow the DALL-E 2 paper.
You may be able to use only images and image embeddings if you remove the text conditioning.
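For example, with dalle2_pytorch, something along these lines; this is a rough sketch, and the argument names used here (cond_on_text_encodings, image_embed, image_sizes) are assumptions that may differ between versions, so check the repo's README:

```python
import numpy as np
import torch
from dalle2_pytorch import Unet, Decoder  # argument names below may differ by version

# U-Net conditioned only on the CLIP image embedding, no text encodings
unet = Unet(
    dim = 128,
    image_embed_dim = 512,           # must match the CLIP model used for the embeddings (512 for ViT-B/32)
    cond_dim = 128,
    channels = 3,
    dim_mults = (1, 2, 4, 8),
    cond_on_text_encodings = False   # assumption: this flag drops the text conditioning
)

decoder = Decoder(
    unet = unet,
    image_sizes = (64,),
    timesteps = 1000
).cuda()

# Pair each image with its precomputed embedding from clip-retrieval,
# keeping the same order in both (row i of the .npy <-> image i)
image_embed = torch.from_numpy(np.load("img_emb_0.npy")[:4]).float().cuda()
images = torch.randn(4, 3, 64, 64).cuda()  # stand-in for the real image batch

loss = decoder(images, image_embed = image_embed)  # assumption: forward accepts precomputed embeddings
loss.backward()
```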
Yes, that makes sense.