Open rom1504 opened 1 year ago
(I downloaded without it, and got no warning when using the dataset.)
I'm considering making --image_size 384 --resize_mode "keep_ratio" --resize_only_if_bigger True
the new default and recommending it in the examples.
Example of broken images from Amy Roberts (HF):
Here’s a selection of images which have this issue. Some of them are corrupted at the URL, e.g. https://www.trip-blog.net/wp-content/uploads/2013/04/scottish-walking-group-40242_300x250.jpg, and some seem to have had issues upon saving, e.g. https://www.oggi-in-tv.it/images/chernobyl-the-last-battle-of-the-ussr.jpg. None were resized.
/fsx/phenaki/coyo-700m/coyo-data-2/22725.tar { "clip_similarity_vitl14": 0.255615234375, "image_phash": "dbca8811a73e34b6", "num_faces": 0, "watermark_score": 0.052227362990379333, "aesthetic_score_laion_v2": 5.197615623474121, "caption": "5 Things to do in Andorra", "url": "https://www.trip-blog.net/wp-content/uploads/2015/04/5-Things-to-do-in-Andorra-79384_300x250.jpg", "key": "227254079", "status": "success", "error_message": null, "width": 300, "height": 250, "original_width": 300, "original_height": 250, "exif": "{}", "md5": "46884a7d6507b1b5a595f7d28df93d96" }
/fsx/phenaki/coyo-700m/coyo-data-2/26550.tar { "clip_similarity_vitl14": 0.287841796875, "image_phash": "9be67830b621e4bc", "num_faces": 2, "watermark_score": 0.213081955909729, "aesthetic_score_laion_v2": 4.816852569580078, "caption": "Avengers marvel now - Tome 2", "url": "https://servimg.eyrolles.com/static/media/4018/9782809444018_internet_w290.jpg", "key": "265502045", "status": "success", "error_message": null, "width": 250, "height": 373, "original_width": 250, "original_height": 373, "exif": "{}", "md5": "df5360c45cb14c79c45787cab5f3842a" }
/fsx/phenaki/coyo-700m/coyo-data-2/69277.tar { "clip_similarity_vitl14": 0.2152099609375, "image_phash": "9ebe411ba4710ee5", "num_faces": 0, "watermark_score": 0.018437210470438004, "aesthetic_score_laion_v2": 5.356045246124268, "caption": "Mhc Quiet Deluxe Suite Near Downtown", "url": "https://cdn.quick-sell.ro/b643e1291c2690d7c05efd50c12ba422/https%3A%2F%2Fcdn.tourismcloudservice.com%2FHotelsV3%2F637474%2F20202281223576.jpg", "key": "692770902", "status": "success", "error_message": null, "width": 500, "height": 375, "original_width": 500, "original_height": 375, "exif": "{}", "md5": "6374114e77b6f62090b4a6667eec81a0" }
/fsx/phenaki/coyo-700m/coyo-data-2/63829.tar { "clip_similarity_vitl14": 0.26025390625, "image_phash": "b2861d64e4c9ed71", "num_faces": 0, "watermark_score": 0.0002586680056992918, "aesthetic_score_laion_v2": 5.084773063659668, "caption": "Microtel Inn & Suites Norcross", "url": "https://cdn.travitude.co.uk/979eb34519a70d2737ad9c78c43c98d9/https%3A%2F%2Fwww.hotelbeds.com%2Fgiata%2F29%2F295798%2F295798a_hb_w_009.jpg", "key": "638290673", "status": "success", "error_message": null, "width": 320, "height": 213, "original_width": 320, "original_height": 213, "exif": "{}", "md5": "7249fa194ac7845708d8c47b09293919" }
/fsx/phenaki/coyo-700m/coyo-data-2/27317.tar { "clip_similarity_vitl14": 0.1729736328125, "image_phash": "906f304fa0b74f74", "num_faces": 0, "watermark_score": 0.05018091946840286, "aesthetic_score_laion_v2": 4.531515598297119, "caption": "Zentral Center (Adults Only 14+", "url": "https://cdn.quick-sell.ro/9b3719ca3493821464b58c756806f554/https%3A%2F%2Frezervari.paralela45.ro%2Fimg_of%2FH6077_C0_13864.jpg", "key": "273171134", "status": "success", "error_message": null, "width": 500, "height": 375, "original_width": 500, "original_height": 375, "exif": "{}", "md5": "0a85abd5d9ddb2d1d772ff4209c717e2" }
/fsx/phenaki/coyo-700m/coyo-data-2/15145.tar { "clip_similarity_vitl14": 0.321044921875, "image_phash": "aaa77d2faaac2098", "num_faces": 0, "watermark_score": 0.08868316560983658, "aesthetic_score_laion_v2": 4.429727077484131, "caption": "2016 - Chopard Watches - rubber (car tire) band", "url": "http://mediashow.ro/root.php?g2_view=core.DownloadItem&g2_itemId=447925&g2_serialNumber=2&?rndm=bbnc", "key": "151454454", "status": "success", "error_message": null, "width": 200, "height": 267, "original_width": 200, "original_height": 267, "exif": "{}", "md5": "e91512abc73248d1f22ef13b81b63b8d" }
/fsx/phenaki/coyo-700m/coyo-data-2/71778.tar { "clip_similarity_vitl14": 0.142578125, "image_phash": "f1992d66466c3bc8", "num_faces": 0, "watermark_score": 0.02955714799463749, "aesthetic_score_laionv2": 5.469544410705566, "caption": "Must-see landscapes of the natural world (part 2", "url": "https://www.trip-blog.net/wp-content/uploads/2013/07/outback-04.ashx-131441_300x250.jpg", "key": "717781985", "status": "success", "error_message": null, "width": 300, "height": 250, "original_width": 300, "original_height": 250, "exif": "{}", "md5": "5ef96d2b18bfb712473af71ccdc187dc" }
/fsx/phenaki/coyo-700m/coyo-data-2/05658.tar { "clip_similarity_vitl14": 0.16455078125, "image_phash": "a3b273bf4a43ac28", "num_faces": 0, "watermark_score": 0.0987565666437149, "aesthetic_score_laion_v2": 2.399261474609375, "caption": "Star 115 (25mm) TinkerTech Two Cutters", "url": "https://cdn.shopify.com/s/files/1/2416/9341/products/c6b56835-78ce-4ad0-83ec-8625dee04a4f_large.jpg?v=1514037268", "key": "056585674", "status": "success", "error_message": null, "width": 307, "height": 307, "original_width": 307, "original_height": 307, "exif": "{\"Image HostComputer\": \"imagery4\"}", "md5": "61975bd92e7ee4de3294cc79f6519ec2" }
/fsx/phenaki/coyo-700m/coyo-data-2/58523.tar { "clip_similarity_vitl14": 0.2041015625, "image_phash": "ad69c925fad84496", "num_faces": 0, "watermark_score": 0.004144101869314909, "aesthetic_score_laion_v2": 5.134791374206543, "caption": "Mhc Quiet Deluxe Suite Near Downtown", "url": "https://cdn.quick-sell.ro/a29858d7914481a17ee6cb813b6343d4/https%3A%2F%2Fcdn.tourismcloudservice.com%2FHotelsV3%2F637474%2F2020228122355481.jpg", "key": "585231622", "status": "success", "error_message": null, "width": 500, "height": 375, "original_width": 500, "original_height": 375, "exif": "{}", "md5": "02ca09e353096dc2498ab243baf32af7" }
/fsx/phenaki/coyo-700m/coyo-data-2/30136.tar { "clip_similarity_vitl14": 0.296142578125, "image_phash": "bfbfc24881e056a3", "num_faces": 0, "watermark_score": 0.050376392900943756, "aesthetic_score_laion_v2": 4.77297306060791, "caption": "Knoos Men Tan Lace-up Casual Shoes", "url": "https://cdn.shopclues.com/images1/thumbnails/100783/320/320/146418599-100783861-1559561618.jpg", "key": "301367748", "status": "success", "error_message": null, "width": 320, "height": 320, "original_width": 320, "original_height": 320, "exif": "{}", "md5": "e7e145a2c6fa8860f99b0d8298fb5661" }
/fsx/phenaki/coyo-700m/coyo-data-2/24749.tar { "clip_similarity_vitl14": 0.180419921875, "image_phash": "8fb272cfe1887870", "num_faces": 0, "watermark_score": 5.02593356941361e-05, "aesthetic_score_laion_v2": 4.426836013793945, "caption": "M Central Apartments", "url": "https://cdn.quick-sell.ro/afef5b58d42d9180f63de90d3284625a/https%3A%2F%2Fcdn.tourismcloudservice.com%2FHotelsV3%2F639642%2F202036143838328.jpg", "key": "247499867", "status": "success", "error_message": null, "width": 282, "height": 500, "original_width": 282, "original_height": 500, "exif": "{}", "md5": "eea4cf40c2dba70009f2fdb009ad8766" }
/fsx/phenaki/coyo-700m/coyo-data-2/43717.tar { "clip_similarity_vitl14": 0.28564453125, "image_phash": "8787870f07f1f0f0", "num_faces": 0, "watermark_score": 0.14687882363796234, "aesthetic_score_laion_v2": 4.11320161819458, "caption": "Hot sale golden handle color promotional gift ceramic coffee mugs", "url": "https://cdn.goodao.net/alikeso/%E5%BE%AE%E4%BF%A1%E5%9B%BE%E7%89%87_20190504150526-300x300.jpg", "key": "437175072", "status": "success", "error_message": null, "width": 300, "height": 300, "original_width": 300, "original_height": 300, "exif": "{}", "md5": "e6ef6e465c1552241ff150819d1c5318" }
/fsx/phenaki/coyo-700m/coyo-data-2/25437.tar { "clip_similarity_vitl14": 0.2025146484375, "image_phash": "979bde478278a11a", "num_faces": 4, "watermark_score": 0.06762968748807907, "aesthetic_score_laion_v2": 4.8978352546691895, "caption": "Chernobyl: the last battle of the Ussr", "url": "https://www.oggi-in-tv.it/images/chernobyl-the-last-battle-of-the-ussr.jpg", "key": "254373346", "status": "success", "error_message": null, "width": 560, "height": 320, "original_width": 560, "original_height": 320, "exif": "{}", "md5": "2fdbf8bae60b893b082452e7bfa94a5d" }
/fsx/phenaki/coyo-700m/coyo-data-2/61148.tar { "clip_similarity_vitl14": 0.239990234375, "image_phash": "8ff8f02e0707f0f0", "num_faces": 0, "watermark_score": 0.017646746709942818, "aesthetic_score_laion_v2": 5.162074565887451, "caption": "Moving Expenses and Tax Deductions in 2020", "url": "https://1ststepmovers.com/wp-content/uploads/2020/07/Moving-Expenses-and-Tax-Deductions-2020-1st-Step-Movers-2-500x383.jpg", "key": "611481971", "status": "success", "error_message": null, "width": 500, "height": 383, "original_width": 500, "original_height": 383, "exif": "{}", "md5": "665671542eb2353bbac25798044e7121" }
/fsx/phenaki/coyo-700m/coyo-data-2/61803.tar { "clip_similarity_vitl14": 0.1695556640625, "image_phash": "8080d47a7f7f870b", "num_faces": 0, "watermark_score": 0.001754570985212922, "aesthetic_score_laion_v2": 4.825148105621338, "caption": "How to travel with friends for long periods of time (part 2", "url": "https://www.trip-blog.net/wp-content/uploads/2013/04/scottish-walking-group-40242_300x250.jpg", "key": "618036179", "status": "success", "error_message": null, "width": 300, "height": 250, "original_width": 300, "original_height": 250, "exif": "{}", "md5": "7cd0848585cc1d96cbb6dc8e6e1686c1" }
This is what is causing some image failures, as confirmed by @rwightman.
maybe it's fixable
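For anyone who already downloaded without resizing, a rough way to flag the worst cases might look like the sketch below. It is not part of the tool: `looks_truncated_jpeg` and `find_suspect_images` are hypothetical helper names, and the sketch assumes the shards are plain tar files whose images are stored as `.jpg`/`.jpeg` members. It only checks the JPEG start and end markers, so it catches truncated files but not every kind of corruption; fully decoding each image would be a stronger check.

```python
import tarfile

JPEG_SOI = b"\xff\xd8"  # start-of-image marker
JPEG_EOI = b"\xff\xd9"  # end-of-image marker


def looks_truncated_jpeg(data: bytes) -> bool:
    """Return True if the bytes lack the JPEG start or end marker."""
    if not data.startswith(JPEG_SOI):
        return True
    # Tolerate NUL or whitespace padding that some writers append after EOI.
    return not data.rstrip(b"\x00\r\n ").endswith(JPEG_EOI)


def find_suspect_images(tar_path: str) -> list:
    """List the JPEG members of a tar shard that fail the marker check."""
    suspects = []
    with tarfile.open(tar_path) as tar:
        for member in tar:
            if not member.isfile():
                continue
            if not member.name.lower().endswith((".jpg", ".jpeg")):
                continue
            data = tar.extractfile(member).read()
            if looks_truncated_jpeg(data):
                suspects.append(member.name)
    return suspects
```

Calling `find_suspect_images` on one of the shard paths listed above would return the member names worth inspecting by hand.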