rom1504 / img2dataset

Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
MIT License

placekitten.com example in README fails to download images #415

Open johnbradley opened 3 months ago

johnbradley commented 3 months ago

The Usage example in the README (https://github.com/rom1504/img2dataset/blob/main/README.md#usage) fails to download files from placekitten with a 500 error. This seems to be a problem with placekitten and not img2dataset.

Example terminal session:

$ echo 'https://placekitten.com/200/305' >> myimglist.txt
$ echo 'https://placekitten.com/200/304' >> myimglist.txt
$ echo 'https://placekitten.com/200/303' >> myimglist.txt
$ img2dataset --url_list=myimglist.txt --output_folder=output_folder --thread_count=64 --image_size=256
Starting the downloading of this file
Sharding file number 1 of 1 called /Users/jpb67/Documents/work/img2dataset/sv/myimglist.txt
0it [00:00, ?it/s]File sharded in 1 shards
Downloading starting now, check your bandwidth speed (with bwm-ng)your cpu (with htop), and your disk usage (with iotop)!
1it [00:01,  1.67s/it]
worker  - success: 0.000 - failed to download: 1.000 - failed to resize: 0.000 - images per sec: 17 - count: 3
total   - success: 0.000 - failed to download: 1.000 - failed to resize: 0.000 - images per sec: 17 - count: 3

The output folder shows a 500 error.

$ cat output_folder/00000_stats.json
{
    "count": 3,
    "successes": 0,
    "failed_to_download": 3,
    "failed_to_resize": 0,
    "duration": 0.17735910415649414,
    "start_time": 1711191984.052072,
    "end_time": 1711191984.2294312,
    "status_dict": {
        "HTTP Error 500: Internal Server Error": 3
    }
}

Browsing to https://placekitten.com/200/305 I get the following error:

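[Screenshot: placekitten-500-error.png, https://github.com/rom1504/img2dataset/assets/1024463/d4b740f9-c5dc-4ba6-be5d-ef094ba68df4]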

I didn't see a way to report this problem to placekitten.

I swapped placekitten.com for picsum.photos and everything ran as expected. I can open a PR that makes this change in the README if you like.
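
For reference, a minimal sketch of what the swapped session could look like. It assumes picsum.photos serves images at the same /width/height path layout as the placekitten URLs above; the exact URLs that would end up in the README are not settled here. The `command -v` guard is only there so the snippet still runs on a machine without img2dataset installed.

```shell
# Write the URL list from scratch (">" truncates, so re-running does not duplicate lines).
# Assumption: picsum.photos accepts /<width>/<height>, mirroring the placekitten paths.
printf '%s\n' \
  'https://picsum.photos/200/305' \
  'https://picsum.photos/200/304' \
  'https://picsum.photos/200/303' > myimglist.txt

# Same img2dataset invocation as in the failing session; only the URLs differ.
if command -v img2dataset >/dev/null 2>&1; then
  img2dataset --url_list=myimglist.txt --output_folder=output_folder --thread_count=64 --image_size=256
else
  echo 'img2dataset not found; install it with: pip install img2dataset'
fi
```

If the downloads work, output_folder/00000_stats.json should report a non-zero "successes" count instead of the HTTP 500 entries shown above.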

rom1504 commented 3 months ago

Sure, feel free to PR the change.
