rom1504 / img2dataset

Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
MIT License

Implement mode to retry failed urls of all shards #395

Open rom1504 opened 8 months ago

rom1504 commented 8 months ago
  1. Read the parquet files in the output folder
  2. Keep the same partitioning as the output (that means non-uniform shards)
  3. Write to the existing shards

Maybe avoid retrying some error statuses
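
A minimal sketch of the selection step described above, not an existing img2dataset feature. It assumes each shard's parquet metadata has already been loaded into a list of row dicts (for example with `pandas.read_parquet(path).to_dict("records")`), and that rows carry `url`, `status` and `error_message` fields with `status == "success"` marking a completed download. The function name `select_retry_rows` and the `NON_RETRYABLE` markers are hypothetical. Writing the retried samples back into the existing shards (step 3) is not covered.

```python
from typing import Dict, List, Mapping, Optional, Sequence

Row = Mapping[str, Optional[str]]

# Hypothetical markers for errors that are unlikely to succeed on retry.
NON_RETRYABLE = ("HTTP Error 404", "HTTP Error 410")


def select_retry_rows(
    rows_by_shard: Mapping[int, Sequence[Row]],
    non_retryable: Sequence[str] = NON_RETRYABLE,
) -> Dict[int, List[Row]]:
    """Return the failed rows worth retrying, grouped by their original shard.

    Keeping the shard id as the key preserves the existing partitioning,
    so each shard ends up with a different (non-uniform) number of URLs.
    """
    retry: Dict[int, List[Row]] = {}
    for shard_id, rows in rows_by_shard.items():
        todo = [
            row
            for row in rows
            if row.get("status") != "success"
            and not any(
                marker in (row.get("error_message") or "")
                for marker in non_retryable
            )
        ]
        if todo:  # skip shards with nothing to retry
            retry[shard_id] = todo
    return retry


# Example metadata, standing in for rows read from the output folder.
rows_by_shard = {
    0: [
        {"url": "http://example.com/a.jpg", "status": "success", "error_message": None},
        {"url": "http://example.com/b.jpg", "status": "failed_to_download",
         "error_message": "HTTP Error 503: Service Unavailable"},
    ],
    1: [
        {"url": "http://example.com/c.jpg", "status": "failed_to_download",
         "error_message": "HTTP Error 404: Not Found"},
    ],
    2: [
        {"url": "http://example.com/d.jpg", "status": "failed_to_download",
         "error_message": "timed out"},
    ],
}

retry_plan = select_retry_rows(rows_by_shard)
for shard_id, rows in sorted(retry_plan.items()):
    print(shard_id, [row["url"] for row in rows])
```

In this example shard 1 drops out entirely because its only failure is a 404, while shards 0 and 2 each keep one URL to retry.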

sarvghotra commented 3 months ago

Any update on this? TIA!