Describe the bug
Training gets to the dataset mapping step and then freezes at a random point each time I run the script. Output from one run:
Map: 6%|██████ | 8000/138120 [19:27<5:16:36, 6.85 examples/s]
Traceback (most recent call last):
  File "/workspace/diffusers/examples/controlnet/train_controlnet_sd3.py", line 1416, in <module>
    main(args)
  File "/workspace/diffusers/examples/controlnet/train_controlnet_sd3.py", line 1132, in main
    train_dataset = train_dataset.map(compute_embeddings_fn, batched=True, new_fingerprint=new_fingerprint)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/datasets/arrow_dataset.py", line 560, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/datasets/arrow_dataset.py", line 3035, in map
    for rank, done, content in Dataset._map_single(**dataset_kwargs):
  File "/usr/local/lib/python3.11/dist-packages/datasets/arrow_dataset.py", line 3461, in _map_single
    writer.write_batch(batch)
  File "/usr/local/lib/python3.11/dist-packages/datasets/arrow_writer.py", line 567, in write_batch
    self.write_table(pa_table, writer_batch_size)
  File "/usr/local/lib/python3.11/dist-packages/datasets/arrow_writer.py", line 579, in write_table
    pa_table = pa_table.combine_chunks()
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/table.pxi", line 4387, in pyarrow.lib.Table.combine_chunks
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: offset overflow while concatenating arrays
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/usr/local/lib/python3.11/dist-packages/accelerate/commands/launch.py", line 1174, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.11/dist-packages/accelerate/commands/launch.py", line 769, in simple_launcher
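The final error, pyarrow.lib.ArrowInvalid: offset overflow while concatenating arrays, is what Arrow raises when combining chunks would push one array past its 32-bit offset limit: the default list, string and binary types can address at most 2**31 - 1 child values (or bytes) per array. One possible explanation, which is an assumption I have not confirmed, is that the embedding columns written by compute_embeddings_fn are large enough per example that a single written batch crosses that limit. The stdlib-only sketch below just does that arithmetic; the (512, 4096) per-example shape in it is hypothetical and not taken from the SD3 script.

```python
# Minimal arithmetic sketch (assumption: the overflowing column is a list
# column whose flattened leaf values count against Arrow's 32-bit offsets).
# Arrow's default list/string/binary types store offsets as signed int32,
# so one contiguous array can address at most 2**31 - 1 child values.
INT32_OFFSET_LIMIT = 2**31 - 1


def max_rows_per_chunk(values_per_row: int, limit: int = INT32_OFFSET_LIMIT) -> int:
    """Largest row count whose flattened leaf values still fit under `limit`.

    `values_per_row` is how many leaf values (e.g. floats) one example
    contributes to a single list column once it is flattened.
    """
    if values_per_row <= 0:
        raise ValueError("values_per_row must be positive")
    return limit // values_per_row


def overflows(num_rows: int, values_per_row: int, limit: int = INT32_OFFSET_LIMIT) -> bool:
    """True if concatenating `num_rows` rows into one array would exceed `limit`."""
    return num_rows * values_per_row > limit


if __name__ == "__main__":
    # Hypothetical per-example embedding of shape (512, 4096); not read from the script.
    per_row = 512 * 4096
    print(max_rows_per_chunk(per_row))  # 1023
    print(overflows(1000, per_row))     # False
    print(overflows(1024, per_row))     # True
```

If that is what is happening, passing a smaller batch_size and/or writer_batch_size (both existing Dataset.map arguments, each defaulting to 1000) to the train_dataset.map(...) call might avoid it; I have not tested this.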
Reproduction
Run SD3 ControlNet training (train_controlnet_sd3.py). The same dataset works fine for SD 1.5 ControlNet training.
Logs
No response
System Info
I've tried PyTorch 2.4.0, 2.4.1, and 2.3.1 (Python 3.11). Training on an A100.
Who can help?
No response