hellomuffin / exif-as-language

official repo for the paper "EXIF as Language: Learning Cross-Modal Associations Between Images and Camera Metadata"
MIT License
39 stars · 4 forks

exif extraction for dataset creation #6

Closed etaisella closed 1 year ago

etaisella commented 1 year ago

Hi guys, thanks for sharing your code :) I'm trying to train the EXIF/image CLIP model, since it isn't currently released. The paper mentions that you use the YFCC100M dataset and filter out images with fewer than 10 EXIF tags. My question is: which software did you use to extract the EXIF information? Different commands return slightly different outputs. Also, how should I format the EXIF info before feeding it to the encoder? Thank you.

hellomuffin commented 1 year ago

No software is needed. YFCC100M provides a metadata file that contains the EXIF info (I remember it was just called yfcc_exif). Don't extract EXIF directly from the downloaded images, though, because Flickr sometimes manually removes the metadata from users' photos.
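For anyone reconstructing this filtering step, a minimal sketch might look like the following. The tab-separated `photo_id<TAB>exif_json` layout and the helper name `filter_by_exif_count` are illustrative assumptions about the metadata dump, not the authors' exact code; only the 10-tag threshold comes from the paper.

```python
import json


def filter_by_exif_count(metadata_path, min_tags=10):
    """Yield (photo_id, exif_dict) for photos with at least min_tags EXIF tags.

    Assumes each line of the metadata file is:
        photo_id <TAB> JSON-encoded EXIF dict
    (an assumed layout; adapt to the actual yfcc_exif format).
    """
    with open(metadata_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t", 1)
            if len(parts) != 2:
                continue  # skip malformed rows
            photo_id, exif_raw = parts
            try:
                exif = json.loads(exif_raw)
            except json.JSONDecodeError:
                continue  # skip rows with unparseable EXIF
            if len(exif) >= min_tags:
                yield photo_id, exif
```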

etaisella commented 1 year ago

Hi, and thanks for the reply! Any chance you could share this file? I can't seem to find it online; it doesn't appear to be included in the S3 bucket. Also, do you format the EXIF string in any way? Many EXIF values use integers to represent specific attributes (such as 1 = sRGB for Color Space). Do you convert these to strings according to the attribute (i.e. convert 1 to sRGB), or leave them as is?

hellomuffin commented 1 year ago

Hi, one year ago when I downloaded the data, I used the metadata files (ver. 1 + expansion pack), which were accessible at `s3://mmcommons`:

- List files: `s3cmd ls s3://mmcommons`
- Download files: `s3cmd get --recursive s3://mmcommons`

I'm not sure whether these links are still valid. If they aren't, let me know and I will try to find a way to share those big files with you.

About the EXIF formatting: I don't do any specific formatting. I think that no matter how you format it, the entire EXIF string is still somewhat out of distribution for the pretrained text encoder, so it should be fine to let the network learn everything about it. I do spend some effort cleaning the YFCC data, though, like filtering out garbage values, since the YFCC metadata is quite noisy and will hurt training significantly unless you train at an extremely large scale.
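As a concrete illustration of the serialization and cleaning described above, one possible sketch is below. The `is_garbage` heuristics (empty, overly long, or non-printable values) and the `"Tag: value"` joining format are illustrative assumptions, not the paper's exact pipeline; note that, per the comment above, no per-tag decoding (e.g. 1 → sRGB) is applied.

```python
def is_garbage(value, max_len=100):
    """Heuristic filter for noisy EXIF values: empty, overly long,
    or containing non-printable characters. Thresholds are assumptions."""
    s = str(value).strip()
    if not s or len(s) > max_len:
        return True
    return any(not c.isprintable() for c in s)


def exif_to_text(exif):
    """Serialize an EXIF dict into one string for the text encoder,
    dropping garbage values and keeping raw values as-is."""
    parts = [f"{tag}: {value}" for tag, value in exif.items()
             if not is_garbage(value)]
    return " ".join(parts)
```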