huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0
26.14k stars, 5.38k forks

Error Mapping on sd3, sdxl and upcoming flux controlnet training scripts #9523

Closed: Night1099 closed this issue 1 month ago

Night1099 commented 1 month ago

Describe the bug

Each script gets to the dataset mapping step, then freezes at a random point and fails with the traceback below.

Map:   6%|██████                                                                                                   | 8000/138120 [19:27<5:16:36,  6.85 examples/s]
Traceback (most recent call last):
  File "/workspace/diffusers/examples/controlnet/train_controlnet_sd3.py", line 1416, in <module>
    main(args)
  File "/workspace/diffusers/examples/controlnet/train_controlnet_sd3.py", line 1132, in main
    train_dataset = train_dataset.map(compute_embeddings_fn, batched=True, new_fingerprint=new_fingerprint)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/datasets/arrow_dataset.py", line 560, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/datasets/arrow_dataset.py", line 3035, in map
    for rank, done, content in Dataset._map_single(**dataset_kwargs):
  File "/usr/local/lib/python3.11/dist-packages/datasets/arrow_dataset.py", line 3461, in _map_single
    writer.write_batch(batch)
  File "/usr/local/lib/python3.11/dist-packages/datasets/arrow_writer.py", line 567, in write_batch
    self.write_table(pa_table, writer_batch_size)
  File "/usr/local/lib/python3.11/dist-packages/datasets/arrow_writer.py", line 579, in write_table
    pa_table = pa_table.combine_chunks()
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/table.pxi", line 4387, in pyarrow.lib.Table.combine_chunks
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: offset overflow while concatenating arrays
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/usr/local/lib/python3.11/dist-packages/accelerate/commands/launch.py", line 1174, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.11/dist-packages/accelerate/commands/launch.py", line 769, in simple_launcher

Reproduction

Train an SD3 ControlNet with the script above. The same dataset works fine with SD 1.5 ControlNet training.

Logs

No response

System Info

I've tried PyTorch 2.4.0, 2.4.1, and 2.3.1. Training on an A100.

Who can help?

No response

asomoza commented 1 month ago

cc: @DavyMorgan

Also, this seems more like an issue with the datasets library than with diffusers.
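
For context, `offset overflow while concatenating arrays` is what pyarrow raises when the combined chunks of a column need offsets larger than a signed 32-bit integer can hold. The sketch below is only a rough back-of-the-envelope helper, not something confirmed in this thread: it estimates how many rows fit in one batch before that limit is reached. The helper names, the safety margin, and the example row size are hypothetical and are not taken from the training scripts. `batch_size` and `writer_batch_size` are existing arguments of `Dataset.map`, but whether lowering them fixes this particular failure is an assumption.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit Arrow offset can hold


def max_rows_per_chunk(elements_per_row: int) -> int:
    """Rows that fit in one chunk before 32-bit offsets would overflow.

    elements_per_row is the number of values one example contributes to the
    deepest list column (hypothetical input, measure it on your own data).
    """
    if elements_per_row <= 0:
        raise ValueError("elements_per_row must be positive")
    return INT32_MAX // elements_per_row


def safe_batch_size(elements_per_row: int, margin: float = 0.5, cap: int = 1000) -> int:
    """Pick a batch size that stays under the limit with some headroom.

    margin and cap are illustrative defaults (assumptions), not library values.
    """
    limit = int(max_rows_per_chunk(elements_per_row) * margin)
    return max(1, min(cap, limit))


# Hypothetical example: each row stores 3 * 1024 * 1024 values.
elements_per_row = 3 * 1024 * 1024
batch_size = safe_batch_size(elements_per_row)
print(f"rows before overflow: {max_rows_per_chunk(elements_per_row)}")
print(f"suggested batch_size: {batch_size}")

# Untested idea for the mapping call in the traceback:
# train_dataset.map(compute_embeddings_fn, batched=True,
#                   batch_size=batch_size, writer_batch_size=batch_size,
#                   new_fingerprint=new_fingerprint)
```

The margin leaves headroom because the estimate ignores other columns in the batch; it is a design choice for the sketch, not a documented rule.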

Night1099 commented 1 month ago

Sorry, I agree. I'll move this over, thanks.