rom1504 / img2dataset

Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine.
MIT License

Unable to use img2dataset to download laion-high-resolution without installing chardet #291

Closed: Shamik-07 closed this issue 1 year ago

Shamik-07 commented 1 year ago

Following the README.md, I installed img2dataset with pip install img2dataset, which installed v1.41.0, and then tried to download the laion-high-resolution dataset with the following command from the examples:

img2dataset --url_list laion-high-resolution --input_format "parquet" \
    --url_col "URL" --caption_col "TEXT" --output_format webdataset \
    --output_folder laion-high-resolution-output --processes_count 16 --thread_count 64 --image_size 1024 \
    --resize_only_if_bigger=True --resize_mode="keep_ratio" --skip_reencode=True \
    --save_additional_columns '["similarity","hash","punsafe","pwatermark","LANGUAGE"]' --enable_wandb True

I got a missing import error for chardet.

Proposed resolution: add chardet to requirements.txt.

rom1504 commented 1 year ago

chardet is not directly imported by img2dataset https://github.com/search?q=repo%3Arom1504%2Fimg2dataset%20chardet&type=code

Which dependency pulled it in?
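To answer "which dependency pulled it in?" on an affected machine, `pip show chardet` lists the installed packages that depend on it in its "Required-by:" field. The sketch below does a similar lookup from Python with the standard-library `importlib.metadata` module. The helper names `normalize` and `dependents_of` are hypothetical, and the code assumes Python 3.9+. It only reads each installed distribution's declared requirements, so a package that imports chardet without declaring it will not appear, while requirements gated behind an optional extra (e.g. `; extra == "..."`) are counted too.

```python
import re
from importlib.metadata import distributions


def normalize(name: str) -> str:
    """Normalize a project name the way PEP 503 does (case- and separator-insensitive)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def dependents_of(target: str) -> list[str]:
    """Return names of installed distributions whose metadata lists `target` as a requirement.

    Requirement strings look like 'chardet (<6,>=3.0.2)' or 'chardet>=3; extra == "x"';
    only the leading project name is compared, so extras-only requirements match as well.
    """
    wanted = normalize(target)
    found = set()
    for dist in distributions():
        for req in dist.requires or []:
            # Grab the project name at the start of the requirement string.
            match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", req)
            if match and normalize(match.group(1)) == wanted:
                name = dist.metadata.get("Name")
                if name:
                    found.add(name)
    return sorted(found)


if __name__ == "__main__":
    print(dependents_of("chardet"))
```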

Shamik-07 commented 1 year ago


I don't know which dependency needed it. However, are you able to reproduce the missing chardet import error by running the command I shared above?

rom1504 commented 1 year ago

No, I have never seen this error. Can you share the full stack trace?
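A full stack trace shows which module actually executed `import chardet`. As a minimal sketch (the `diagnose_import` helper is hypothetical, not part of img2dataset), the standard-library `importlib` and `traceback` modules can capture it; `ImportError.name` holds the module that could not be imported, which may differ from the requested module when the failure is transitive. If the missing import only happens on a specific code path, running the original CLI command and copying the traceback it prints remains the more reliable route.

```python
import importlib
import traceback
from typing import Optional, Tuple


def diagnose_import(module_name: str) -> Optional[Tuple[Optional[str], str]]:
    """Import `module_name`; on failure return (missing module, formatted traceback), else None."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # exc.name is the module that failed to import (e.g. "chardet"),
        # which can be a transitive dependency rather than module_name itself.
        return exc.name, traceback.format_exc()
    return None


if __name__ == "__main__":
    result = diagnose_import("img2dataset")
    if result is None:
        print("img2dataset imported cleanly")
    else:
        missing, tb = result
        print(f"missing module: {missing}")
        print(tb)
```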

Shamik-07 commented 1 year ago

I will try to reproduce this error.

Shamik-07 commented 1 year ago

OK, I can't reproduce the error, and it's running without any problem. Apologies for opening this issue, and thank you for looking into it.