hellomuffin / exif-as-language

official repo for the paper "EXIF as Language: Learning Cross-Modal Associations Between Images and Camera Metadata"
MIT License

Release of the CLIP weights? #5

Closed teboli closed 1 year ago

teboli commented 1 year ago

Hi!

Really interesting work! Any plans to release the pre-trained weights of your CLIP model now that CVPR has passed?

Thanks!

Thomas

hellomuffin commented 1 year ago

Thank you for your interest in our work! I'm sorry, I didn't check the repo while on vacation and missed this issue. I have released a version of the pre-trained weights. Let me know if you have further questions.

hellomuffin commented 1 year ago

Hi, we recently found that because the checkpoint we uploaded is not the full model (it was trained for 48k steps, whereas, as stated in the paper, the full model is trained for ~73k steps), its performance is slightly worse than the numbers in the paper. Unfortunately, the original full model and data were accidentally auto-cleaned by the cluster due to prolonged inactivity. To make up for this, we have temporarily uploaded another full model, trained for 75k steps on a different 1.5M random sample of the YFCC100M dataset. Qualitatively, its performance is extremely similar to the original full model; quantitatively, there is a small difference, perhaps due to variance in the training data.

Specifically, the performance of this version of the full model on Columbia and DSO is as follows:

| Dataset  | mAP  | cIoU |
|----------|------|------|
| Columbia | 0.93 | 0.88 |
| DSO      | 0.65 | 0.80 |

Thanks again for raising this issue. We are looking into regenerating the deleted data and training the model for the full length to replicate the results reported in the paper. We will get back to you soon. Sorry for the inconvenience.