rom1504 / img2dataset

Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
MIT License
3.74k stars, 341 forks

pyarrow.lib.ArrowInvalid: Empty CSV file #393

Open R-Sheldon opened 10 months ago

R-Sheldon commented 10 months ago

When I download images with img2dataset, I get an error:

Starting the downloading of this file
Sharding file number 1 of 1 called cc3m.tsv
0it [00:04, ?it/s]
Traceback (most recent call last):
  File "test.py", line 11, in <module>
    download(
  File "/usr/local/lib/python3.8/dist-packages/img2dataset/main.py", line 262, in download
    distributor_fn(
  File "/usr/local/lib/python3.8/dist-packages/img2dataset/distributor.py", line 36, in multiprocessing_distributor
    failed_shards = run(reader)
  File "/usr/local/lib/python3.8/dist-packages/img2dataset/distributor.py", line 31, in run
    for status, row in tqdm(process_pool.imap_unordered(downloader, gen)):
  File "/usr/local/lib/python3.8/dist-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
pyarrow.lib.ArrowInvalid: Empty CSV file

Does anyone know how to solve this? Please help.
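
Since the exception says the CSV input is empty, one thing that may be worth ruling out is that cc3m.tsv itself is empty, or is not the file the script actually opens. This is only a guess at a cause, not a confirmed diagnosis. A minimal sketch of such a check, using only the standard library; `describe_tsv` is a hypothetical helper written for this post and is not part of img2dataset or pyarrow:

```python
import os


def describe_tsv(path, max_lines=3):
    """Return (size_in_bytes, first_lines) as a quick sanity check of an input file.

    Hypothetical helper for illustration; not an img2dataset or pyarrow API.
    """
    size = os.path.getsize(path)  # 0 means the file has no content at all
    first_lines = []
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        for _ in range(max_lines):
            line = f.readline()
            if not line:  # reached end of file
                break
            first_lines.append(line.rstrip("\n"))
    return size, first_lines


if __name__ == "__main__":
    # Assumes cc3m.tsv sits in the current working directory.
    if os.path.exists("cc3m.tsv"):
        print(describe_tsv("cc3m.tsv"))
    else:
        print("cc3m.tsv not found in", os.getcwd())
```

If the reported size is 0 or the file is not found, the input path is the first thing to look at. If the file has a header line and data rows, the cause is probably elsewhere.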