wisdomikezogwo / quilt1m

[NeurIPS 2023 Oral] Quilt-1M: One Million Image-Text Pairs for Histopathology.
https://quilt1m.github.io/
MIT License

Superfluous images? #23

Open maubreville opened 6 months ago

maubreville commented 6 months ago

Dear authors,

thanks so much for providing this resource! It seems to me that the following 4 files have no metadata (in quilt_1M_lookup.csv). Is this possible?

_b_M_sOb4ZI_image_0760643c-923b-4f1e-a5e4-8b2f9b3f2849.jpg
uytytgxGP2Y_image_1c51efef-1301-4f83-ad35-bbf92fb6f90a.jpg
7M7Ol5StU7U_image_b61a7317-b9b7-4d66-9158-828ba75bfb27.jpg
7M7Ol5StU7U_image_84954e04-5f71-46cd-aa20-8595596e4649.jpg

If the error is on my side, I apologize, but when I used the data, my dataloader complained that there were files without metadata, so I thought I'd give you the feedback.

Best, Marc

wisdomikezogwo commented 5 months ago

My apologies. To confirm, these images are in the (unzipped) image folder but not in the lookup CSV. Thanks for letting us know. I believe this may be because certain duplicate images were not deleted from the file system during post-processing, even after being removed from the table. Please ignore these files for now. I will look into whether that is the case and, if so, re-zip the files so that this does not affect others.
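
For anyone who wants to check their own copy in the meantime, here is a minimal sketch of how the mismatch could be detected. It lists the `.jpg` files in the unzipped image folder and reports those whose file name does not appear in the lookup CSV. The function name is hypothetical, and the column name `image_path` is an assumption about `quilt_1M_lookup.csv`; pass a different `column` if your copy uses another header.

```python
import csv
from pathlib import Path


def find_images_without_metadata(image_dir, lookup_csv, column="image_path"):
    """Return sorted names of .jpg files in image_dir with no row in lookup_csv.

    `column` is an assumed header name for the image file column; adjust it
    to match the actual CSV.
    """
    with open(lookup_csv, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if column not in (reader.fieldnames or []):
            raise KeyError(f"column {column!r} not found in {lookup_csv}")
        # Compare on the bare file name so any path prefix in the CSV is ignored.
        known = {Path(row[column]).name for row in reader if row[column]}

    # Only top-level .jpg files are considered; subfolders are not traversed.
    on_disk = {
        p.name
        for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() == ".jpg"
    }
    return sorted(on_disk - known)
```

Calling `find_images_without_metadata("path/to/images", "quilt_1M_lookup.csv")` (both paths are placeholders) returns the orphaned file names, which a dataloader could then skip rather than fail on.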