Open tungdop2 opened 1 year ago
Hi, you could join the captions into a single column and then use the save_additional_columns option to put them in the JSON file next to each image.
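A minimal sketch of that suggestion, assuming the metadata is loaded with pandas and has one row per (image, caption) pair. The column names `URL` and `TEXT`, the new `captions` column, and the helper name `merge_captions` are illustrative assumptions, not something from img2dataset or the dataset itself. The captions are stored as a JSON string so the extra column stays a plain string.

```python
import json

import pandas as pd


def merge_captions(df, url_col="URL", caption_col="TEXT"):
    """Collapse rows that share a URL into one row per image.

    Hypothetical helper: the column names are assumptions about the metadata.
    """
    # Collect every caption for a URL into a list, keeping first-seen order.
    grouped = (
        df.groupby(url_col, sort=False)[caption_col]
        .agg(list)
        .reset_index()
    )
    # All captions, JSON-encoded, go into an extra column.
    grouped["captions"] = grouped[caption_col].apply(json.dumps)
    # Keep the first caption as the main caption column.
    grouped[caption_col] = grouped[caption_col].apply(lambda caps: caps[0])
    return grouped


if __name__ == "__main__":
    meta = pd.DataFrame(
        {
            "URL": ["http://example.com/a.jpg", "http://example.com/a.jpg", "http://example.com/b.jpg"],
            "TEXT": ["a dog", "a brown dog on grass", "a cat"],
        }
    )
    print(merge_captions(meta))
```

The merged frame can then be written back out (for example with `DataFrame.to_parquet`) and passed to img2dataset with `captions` listed in save_additional_columns, so each image is downloaded once and all of its captions end up in the per-image JSON file.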
@rom1504 thanks for your reply. So is downloading the same image multiple times the default behavior for MSCOCO?
@tungdop2 it seems this metadata parquet file has this issue. Do you want to fix it and upload a better version to Hugging Face?
In MSCOCO or Visual Genome, an image has more than one caption, so img2dataset downloads it 3 or 4 times. How can I solve this problem?