Closed: LeoXing1996 closed this issue 1 year ago.
It's in the same order as the metadata there: https://huggingface.co/datasets/laion/laion2b-en-vit-l-14-embeddings/tree/main/metadata
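For anyone landing here, this ordering claim can be illustrated with a toy sketch: row i of an embedding `.npy` file is supposed to describe row i of the metadata file from the same shard in the embeddings repo. The file names (`img_emb_0000.npy`, `metadata_0000.parquet`) and the tiny shapes below are assumptions for illustration; the real arrays are (N, 768) ViT-L/14 vectors.

```python
import numpy as np

# Toy stand-ins for one shard pair from the embeddings repo:
# img_emb_0000.npy (embeddings) and metadata_0000.parquet (metadata).
emb = np.arange(6, dtype="float32").reshape(3, 2)
meta = [
    {"url": "http://a/1.jpg"},
    {"url": "http://a/2.jpg"},
    {"url": "http://a/3.jpg"},
]

# The claim in the thread: the two files have the same length, and
# row i of the embedding file describes row i of the metadata file.
assert len(emb) == len(meta)
url0, vec0 = meta[0]["url"], emb[0]
```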
@rom1504, thanks for your answer!
I apologize if my question was unclear. I am seeking guidance on how to align the metadata for the CLIP embeddings with the metadata for LAION-2B-en.
Can you provide any assistance with this matter?
What is the purpose of doing that?
The metadata next to the embeddings is also the LAION-2B metadata, just in a different order.
@rom1504, I have already downloaded LAION-2B-en and converted it to WebDataset format.
If I want to load the pre-computed CLIP embeddings alongside the images during training, one good practice is to re-sort the CLIP embedding files so they align with the images in the tar files.
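The re-sorting step itself is just a permutation of the embedding rows. A minimal sketch, assuming we already know which embedding row each tar sample came from (recovering that mapping, e.g. by joining the two metadata sets, is the hard part this thread is about, and `tar_order` below is a made-up example):

```python
import numpy as np

def reorder_embeddings(embeddings, tar_order):
    """Return the embeddings permuted to match the order of samples in the tar.

    tar_order[i] = original embedding row of the i-th sample in the tar.
    """
    return embeddings[np.asarray(tar_order)]

emb = np.arange(8, dtype="float32").reshape(4, 2)  # 4 fake embeddings
tar_order = [2, 0, 3, 1]                           # hypothetical tar ordering
aligned = reorder_embeddings(emb, tar_order)
```

The aligned array can then be re-sharded so each `.npy` shard pairs one-to-one with a tar shard.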
OK, see https://github.com/lucidrains/DALLE2-pytorch#decoder-image-embedding-dataset
And in particular https://github.com/Veldrovive/embedding-dataset-reordering
Did you solve this? @LeoXing1996
@DJLee68, no, I recalculated all the embeddings myself 😢
I'm having the same problem.
> OK, see https://github.com/lucidrains/DALLE2-pytorch#decoder-image-embedding-dataset
> And in particular https://github.com/Veldrovive/embedding-dataset-reordering
I tried the repos @rom1504 mentioned above, but they didn't work for us.
So did you match the full set of LAION-2B embeddings with the LAION-2B image webdataset using the LAION-2B metadata?
I have already downloaded the LAION-2B-en dataset from https://huggingface.co/datasets/laion/laion2B-en, and now I want to use the pre-computed CLIP embeddings from https://huggingface.co/datasets/laion/laion2b-en-vit-l-14-embeddings/tree/main.
However, I found that the metadata (or image ordering?) in those two repos is mismatched. How can we map the CLIP embeddings to the downloaded laion2b-en dataset?
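One hedged sketch of how such a mapping could be built when the two repos list the same samples in different orders: join on a key that both metadata sets carry, such as the image URL (in practice url+caption may be needed, since URLs can repeat). All records and field names below are made up for illustration.

```python
# Samples as they appear in the downloaded webdataset tars
# (each tar sample carries a key and its source metadata).
downloaded = [
    {"key": "000000000", "url": "http://a/2.jpg"},
    {"key": "000000001", "url": "http://a/1.jpg"},
]

# Rows of the embeddings repo's metadata; "row" is the row index
# in the matching embedding .npy file of the same shard.
emb_meta = [
    {"row": 0, "url": "http://a/1.jpg"},
    {"row": 1, "url": "http://a/2.jpg"},
]

# Join the two metadata sets on URL to map tar sample -> embedding row.
row_by_url = {m["url"]: m["row"] for m in emb_meta}
key_to_row = {s["key"]: row_by_url[s["url"]]
              for s in downloaded if s["url"] in row_by_url}
# key_to_row now tells which embedding row belongs to each tar sample
```

At LAION-2B scale the same join would need to be done shard-by-shard with a tool like pyspark or dask rather than in-memory dicts.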