ghost opened this issue 1 year ago
Same question for CC3M.
For the Wukong dataset, we filtered the first 50 million images using Chinese-CLIP (ViT-B-16 model) and only kept samples whose visual-textual similarity score was greater than 0.475. So you will need to pair each filtered image with its original entry by matching on the caption text.
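If it helps to make the filtering step concrete, below is a minimal sketch using the public `cn_clip` API from the Chinese-CLIP repo (https://github.com/OFA-Sys/Chinese-CLIP). The file paths, CSV column names, and per-sample loop are illustrative assumptions, not the actual pipeline used here (which would presumably batch over the 50M samples rather than score them one at a time).

```python
# Sketch only: assumes images are already downloaded locally and the metadata
# CSV has (hypothetical) columns "image_path" and "caption".
import torch
import pandas as pd
from PIL import Image
import cn_clip.clip as clip
from cn_clip.clip import load_from_name

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = load_from_name("ViT-B-16", device=device)
model.eval()

df = pd.read_csv("wukong_release.csv")  # assumed columns: image_path, caption

kept = []
with torch.no_grad():
    for _, row in df.iterrows():
        image = preprocess(Image.open(row["image_path"])).unsqueeze(0).to(device)
        text = clip.tokenize([row["caption"]]).to(device)
        img_feat = model.encode_image(image)
        txt_feat = model.encode_text(text)
        # Cosine similarity between L2-normalized embeddings.
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
        sim = (img_feat @ txt_feat.T).item()
        if sim > 0.475:  # threshold stated in the comment above
            kept.append(row)

pd.DataFrame(kept).to_csv("wukong_filtered.csv", index=False)
```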
For CC3M, we will try to restore the original image-caption correspondence.
Hi, I have downloaded the Wukong data from the URL provided in https://github.com/phellonchen/X-LLM/blob/main/README_DATA.md. The order of samples in the CSV files is not consistent with the image IDs/names in the JSON file, so how can I link the original image URLs to the filtered image names? @MingLunHan @phellonchen
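Following the caption-matching approach suggested in the earlier comment, one way to link the two files is to join on exact caption text. The sketch below assumes the JSON maps each filtered image file name to its caption; the file names, column names, and JSON layout are guesses about the release format, so adjust them to what you actually have.

```python
import json
import pandas as pd

# Original Wukong metadata, one row per sample (column names are assumptions).
df = pd.read_csv("wukong_release.csv")  # assumed columns: url, caption

# Released JSON, assumed to map each filtered image file name to its caption.
with open("filtered_images.json", encoding="utf-8") as f:
    name_to_caption = json.load(f)      # assumed shape: {"000001.jpg": "...", ...}

# Join on exact caption text. Note that duplicate captions collapse to a
# single URL here, so ambiguous matches should be inspected or dropped.
caption_to_url = dict(zip(df["caption"], df["url"]))
rows = [
    {"image": name, "caption": cap, "url": caption_to_url.get(cap)}
    for name, cap in name_to_caption.items()
]
pd.DataFrame(rows).to_csv("wukong_linked.csv", index=False)
```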