rom1504 / img2dataset

Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
MIT License
3.42k stars, 322 forks

Decompressing the downloaded tar file is very slow #421

Open Nastu-Ho opened 2 months ago

Nastu-Ho commented 2 months ago

Could you please provide a method to quickly decompress a large number of tar files?
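If the shards really do need to be unpacked to disk, one option is to extract several tar files concurrently using only the Python standard library. The following is a minimal sketch, not part of img2dataset: the helper names `extract_one` and `extract_all` are hypothetical, and the use of threads assumes the shards are uncompressed tars, so that extraction is mostly bound by disk I/O.

```python
import tarfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def extract_one(tar_path, out_root):
    """Extract one shard into out_root/<shard name> and return that directory."""
    tar_path = Path(tar_path)
    target = Path(out_root) / tar_path.stem  # e.g. out/00000 for 00000.tar
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tar_path) as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: use the safe "data" extraction filter.
            tar.extractall(target, filter="data")
        else:
            tar.extractall(target)
    return target


def extract_all(tar_paths, out_root, workers=8):
    """Extract many shards concurrently; results follow the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: extract_one(p, out_root), tar_paths))
```

A call could look like `extract_all(sorted(Path("shards").glob("*.tar")), "extracted")`, where both folder names are placeholders. Each shard goes into its own subfolder so that a single directory does not end up holding millions of files.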

rom1504 commented 2 months ago

The point of downloading as tar files is to use them directly during training with the webdataset library.
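As a rough illustration of reading the shards without unpacking them, the sketch below builds a webdataset pipeline over a folder of tar files. It is an assumption-laden example rather than an official recipe: `list_shards` and `make_dataset` are hypothetical helper names, the third-party `webdataset` package (and Pillow, for decoding) must be installed, and the shards are assumed to contain `jpg` images with `txt` captions.

```python
from pathlib import Path


def list_shards(folder):
    """Return the sorted paths of all .tar shards in a folder."""
    return sorted(str(p) for p in Path(folder).glob("*.tar"))


def make_dataset(shards):
    """Build an iterable of (PIL image, caption string) pairs from tar shards."""
    import webdataset as wds  # third-party: pip install webdataset

    # Assumes every sample has a "jpg" and a "txt" entry; adjust the keys
    # if the shards were written with another image format or no captions.
    return wds.WebDataset(shards).decode("pil").to_tuple("jpg", "txt")
```

Iterating would then be `for image, caption in make_dataset(list_shards("my_shards")): ...`, with `my_shards` standing in for the output folder of the download. The result is an iterable dataset, so it can also be passed to a PyTorch `DataLoader`.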

On Wed, Apr 17, 2024, 2:45 PM Nastu Ho wrote:

Could you please provide a method to quickly decompress a large number of tar files?


jiamingzhang94 commented 5 days ago

Hi @rom1504

I searched the entire repository but couldn't find any examples on how to use these tar files, such as the CC3M dataset. I tried many different methods from the WebDataset library, but they all resulted in errors. Why can't there be an example of how to use these datasets provided with each dataset download?
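When loading fails, it can help to check what a shard actually contains. In the webdataset layout, consecutive files that share a base name (for example `000000000.jpg`, `000000000.txt`, `000000000.json`) form one sample, and the file extensions are the keys a loader has to ask for. The sketch below reads a shard with only the standard library and groups its files that way. `iter_samples` is a hypothetical helper, and it splits names on the last dot, which is a simplification of what the webdataset library does.

```python
import tarfile
from pathlib import PurePosixPath


def iter_samples(tar_path):
    """Yield one dict per sample, e.g. {"__key__": "000000000", "jpg": b"...", "txt": b"..."}."""
    current = None
    with tarfile.open(tar_path) as tar:
        for member in tar:
            if not member.isfile():
                continue
            name = PurePosixPath(member.name)
            key = str(name.with_suffix(""))  # base name without extension
            ext = name.suffix.lstrip(".")    # "jpg", "txt", "json", ...
            # A change of base name marks the start of the next sample.
            if current is None or current["__key__"] != key:
                if current is not None:
                    yield current
                current = {"__key__": key}
            current[ext] = tar.extractfile(member).read()
    if current is not None:
        yield current
```

Printing `sorted(sample)` for the first few samples of one shard shows which extensions are present. A mismatch between those extensions and the keys requested from the loader is one possible cause of errors.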