rom1504 / img2dataset

Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
MIT License
3.62k stars 336 forks

Why can't I download the laion400M dataset? #407

Open SomnusQue opened 7 months ago

SomnusQue commented 7 months ago

I have already downloaded the metadata. Is there any other problem? This is my command:

img2dataset --url_list laion400m-meta --input_format "parquet" --url_col "URL" --caption_col "TEXT" --output_format webdataset --output_folder laion400m-data --processes_count 16 --thread_count 128 --image_size 256 --save_additional_columns '["NSFW","similarity","LICENSE"]' --enable_wandb True

SomnusQue commented 7 months ago

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\37620\miniconda3\Scripts\img2dataset.exe\__main__.py", line 7, in <module>
  File "C:\Users\37620\miniconda3\Lib\site-packages\img2dataset\main.py", line 270, in main
    fire.Fire(download)
  File "C:\Users\37620\miniconda3\Lib\site-packages\fire\core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\37620\miniconda3\Lib\site-packages\fire\core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\37620\miniconda3\Lib\site-packages\fire\core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\37620\miniconda3\Lib\site-packages\img2dataset\main.py", line 179, in download
    reader = Reader(
             ^^^^^^^
  File "C:\Users\37620\miniconda3\Lib\site-packages\img2dataset\reader.py", line 68, in __init__
    self.column_list = self.column_list + ["caption"]
TypeError: can only concatenate str (not "list") to str
wandb: (1) Private W&B dashboard, no account required
wandb: (2) Create a W&B account
wandb: (3) Use an existing W&B account
wandb: (4) Don't visualize my results
wandb: Enter your choice: Traceback (most recent call last):
  File "C:\Users\37620\miniconda3\Lib\site-packages\wandb\sdk\wandb_init.py", line 1172, in init
    wi.setup(kwargs)
  File "C:\Users\37620\miniconda3\Lib\site-packages\wandb\sdk\wandb_init.py", line 306, in setup
    wandb_login._login(
  File "C:\Users\37620\miniconda3\Lib\site-packages\wandb\sdk\wandb_login.py", line 317, in _login
    wlogin.prompt_api_key()
  File "C:\Users\37620\miniconda3\Lib\site-packages\wandb\sdk\wandb_login.py", line 240, in prompt_api_key
    key, status = self._prompt_api_key()
                  ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\37620\miniconda3\Lib\site-packages\wandb\sdk\wandb_login.py", line 220, in _prompt_api_key
    key = apikey.prompt_api_key(
          ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\37620\miniconda3\Lib\site-packages\wandb\sdk\lib\apikey.py", line 114, in prompt_api_key
    result = prompt_choices(
             ^^^^^^^^^^^^^^^
  File "C:\Users\37620\miniconda3\Lib\site-packages\wandb\util.py", line 1265, in prompt_choices
    choice = _prompt_choice(input_timeout=input_timeout, jupyter=jupyter)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\37620\miniconda3\Lib\site-packages\wandb\util.py", line 1248, in _prompt_choice
    choice = input_fn(text)
             ^^^^^^^^^^^^^^
EOFError: EOF when reading a line
Process LoggerProcess-1:
Traceback (most recent call last):
  File "C:\Users\37620\miniconda3\Lib\site-packages\wandb\sdk\wandb_init.py", line 1172, in init
    wi.setup(kwargs)
  File "C:\Users\37620\miniconda3\Lib\site-packages\wandb\sdk\wandb_init.py", line 306, in setup
    wandb_login._login(
  File "C:\Users\37620\miniconda3\Lib\site-packages\wandb\sdk\wandb_login.py", line 317, in _login
    wlogin.prompt_api_key()
  File "C:\Users\37620\miniconda3\Lib\site-packages\wandb\sdk\wandb_login.py", line 240, in prompt_api_key
    key, status = self._prompt_api_key()
                  ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\37620\miniconda3\Lib\site-packages\wandb\sdk\wandb_login.py", line 220, in _prompt_api_key
    key = apikey.prompt_api_key(
          ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\37620\miniconda3\Lib\site-packages\wandb\sdk\lib\apikey.py", line 114, in prompt_api_key
    result = prompt_choices(
             ^^^^^^^^^^^^^^^
  File "C:\Users\37620\miniconda3\Lib\site-packages\wandb\util.py", line 1265, in prompt_choices
    choice = _prompt_choice(input_timeout=input_timeout, jupyter=jupyter)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\37620\miniconda3\Lib\site-packages\wandb\util.py", line 1248, in _prompt_choice
    choice = input_fn(text)
             ^^^^^^^^^^^^^^
EOFError: EOF when reading a line

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\37620\miniconda3\Lib\multiprocessing\process.py", line 314, in _bootstrap
    self.run()
  File "C:\Users\37620\miniconda3\Lib\site-packages\img2dataset\logger.py", line 217, in run
    self.current_run = wandb.init(project=self.wandb_project, config=self.config_parameters, anonymous="allow")
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\37620\miniconda3\Lib\site-packages\wandb\sdk\wandb_init.py", line 1214, in init
    raise Error("An unexpected error occurred") from error_seen
wandb.errors.Error: An unexpected error occurred
tchaton commented 7 months ago

Hey @SomnusQue, have a look at this: https://lightning.ai/lightning-ai/studios/download-stream-400m-images-text~01hg0zg8fyybp7p1sma6g9dkzm

rom1504 commented 7 months ago

That's a Windows problem with escaping the --save_additional_columns value.

Try another terminal.
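
To illustrate why a quoting problem would surface as the TypeError in the traceback above, here is a minimal sketch in plain Python. It does not use img2dataset or fire: parse_cli_value is a hypothetical stand-in for a CLI layer that evaluates Python literals, and build_column_list only mirrors the failing line from reader.py. The "mangled" value is an assumption about what cmd.exe typically hands to the program (inner double quotes consumed, outer single quotes kept); other terminals may behave differently.

```python
import ast


def parse_cli_value(raw):
    """Hypothetical stand-in for a CLI layer that evaluates literals when it can."""
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw  # not a Python literal: keep the raw string


def build_column_list(save_additional_columns):
    # Same shape as the failing line in reader.py from the traceback.
    return save_additional_columns + ["caption"]


# What a POSIX shell would pass for '["NSFW","similarity","LICENSE"]':
# the single quotes are removed and the JSON-style list arrives intact.
intended = parse_cli_value('["NSFW","similarity","LICENSE"]')
good_columns = build_column_list(intended)

# Assumed cmd.exe result: inner double quotes consumed, single quotes kept,
# so the value evaluates to a plain string instead of a list.
mangled = parse_cli_value("'[NSFW,similarity,LICENSE]'")

try:
    build_column_list(mangled)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(type(intended).__name__, good_columns)
print(type(mangled).__name__, error_message)
```

If switching terminals is not an option, another route is to skip the shell entirely: img2dataset also exposes a download function in Python, where save_additional_columns can be passed as a real list rather than a quoted string.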