justinpinkney / stable-diffusion

RuntimeError: stack expects each tensor to be equal size ... #48

Open elvisace opened 1 year ago

elvisace commented 1 year ago

I'm running into this error: RuntimeError: stack expects each tensor to be equal size... I'm not exactly sure what the issue is. Obviously it's the data, but I thought the images were transformed (resized and cropped) when the data is prepared, before training. Could it be that the dataset has a mix of JPEGs and PNGs?

Here are the error logs:

Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1045, in _run_train
    self.fit_loop.run()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 111, in run
    self.advance(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 200, in advance
    epoch_output = self.epoch_loop.run(train_dataloader)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 111, in run
    self.advance(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 118, in advance
    _, (batch, is_last) = next(dataloader_iter)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/profiler/base.py", line 104, in profile_iterable
    value = next(iterator)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/supporters.py", line 629, in prefetch_iterator
    for val in it:
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/supporters.py", line 546, in __next__
    return self.request_next_batch(self.loader_iters)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/supporters.py", line 574, in request_next_batch
    return apply_to_collection(loader_iters, Iterator, next_fn)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/utilities/apply_func.py", line 96, in apply_to_collection
    return function(data, *args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/supporters.py", line 561, in next_fn
    batch = next(iterator)
  File "/usr/lib/python3/dist-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/usr/lib/python3/dist-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
    return self._process_data(data)
  File "/usr/lib/python3/dist-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
    data.reraise()
  File "/usr/lib/python3/dist-packages/torch/_utils.py", line 461, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 2.
Original Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/lib/python3/dist-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    return self.collate_fn(data)
  File "/usr/lib/python3/dist-packages/torch/utils/data/_utils/collate.py", line 160, in default_collate
    return elem_type({key: default_collate([d[key] for d in batch]) for key in elem})
  File "/usr/lib/python3/dist-packages/torch/utils/data/_utils/collate.py", line 160, in <dictcomp>
    return elem_type({key: default_collate([d[key] for d in batch]) for key in elem})
  File "/usr/lib/python3/dist-packages/torch/utils/data/_utils/collate.py", line 141, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [512, 512, 3] at entry 0 and [512, 512, 4] at entry 2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 901, in <module>
    trainer.fit(model, data)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 553, in fit
    self._run(model)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 918, in _run
    self._dispatch()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 986, in _dispatch
    self.accelerator.start_training(self)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 92, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 161, in start_training
    self._results = trainer.run_stage()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 996, in run_stage
    return self._run_train()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1058, in _run_train
    self.training_type_plugin.reconciliate_processes(traceback.format_exc())
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/ddp.py", line 453, in reconciliate_processes
    raise DeadlockDetectedException(f"DeadLock detected from rank: {self.global_rank} \n {trace}")
pytorch_lightning.utilities.exceptions.DeadlockDetectedException: DeadLock detected from rank: 0 
 Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1045, in _run_train
    self.fit_loop.run()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 111, in run
    self.advance(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 200, in advance
    epoch_output = self.epoch_loop.run(train_dataloader)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 111, in run
    self.advance(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 118, in advance
    _, (batch, is_last) = next(dataloader_iter)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/profiler/base.py", line 104, in profile_iterable
    value = next(iterator)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/supporters.py", line 629, in prefetch_iterator
    for val in it:
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/supporters.py", line 546, in __next__
    return self.request_next_batch(self.loader_iters)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/supporters.py", line 574, in request_next_batch
    return apply_to_collection(loader_iters, Iterator, next_fn)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/utilities/apply_func.py", line 96, in apply_to_collection
    return function(data, *args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/supporters.py", line 561, in next_fn
    batch = next(iterator)
  File "/usr/lib/python3/dist-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/usr/lib/python3/dist-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
    return self._process_data(data)
  File "/usr/lib/python3/dist-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
    data.reraise()
  File "/usr/lib/python3/dist-packages/torch/_utils.py", line 461, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 2.
Original Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/lib/python3/dist-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    return self.collate_fn(data)
  File "/usr/lib/python3/dist-packages/torch/utils/data/_utils/collate.py", line 160, in default_collate
    return elem_type({key: default_collate([d[key] for d in batch]) for key in elem})
  File "/usr/lib/python3/dist-packages/torch/utils/data/_utils/collate.py", line 160, in <dictcomp>
    return elem_type({key: default_collate([d[key] for d in batch]) for key in elem})
  File "/usr/lib/python3/dist-packages/torch/utils/data/_utils/collate.py", line 141, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [512, 512, 3] at entry 0 and [512, 512, 4] at entry 2
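For what it's worth, the trace itself points at the mixed-format theory: entry 0 is [512, 512, 3] (RGB) but entry 2 is [512, 512, 4], i.e. a 4-channel RGBA image, so at least one PNG with an alpha channel is slipping through. A minimal sketch to find the offending files (the data directory path is a placeholder, adjust the extensions to whatever your dataset contains):

```python
import os
from PIL import Image

DATA_DIR = "data/my_dataset"  # placeholder: point this at your training images

for root, _, files in os.walk(DATA_DIR):
    for name in files:
        if not name.lower().endswith((".png", ".jpg", ".jpeg", ".webp")):
            continue
        path = os.path.join(root, name)
        with Image.open(path) as im:
            # Anything that is not plain RGB (RGBA, LA, palette with
            # transparency, grayscale) will collate to a different channel count.
            if im.mode != "RGB":
                print(f"{path}: mode={im.mode}, size={im.size}")
```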
Bryanoxx commented 1 year ago

Hello, I'm running into the same error, even though all my images are 512x512.

I tried removing all the torchvision.transforms from the YAML config file, but then I get a new error (TypeError: pic should be PIL Image or ndarray. Got <class 'torch.Tensor'>).
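That second error comes from torchvision itself: ToTensor only accepts a PIL Image or a numpy array, so if the pipeline already produces tensors by the time ToTensor runs, it raises exactly this message. A tiny reproduction, just to show where the message originates:

```python
import torch
from torchvision import transforms

t = transforms.ToTensor()
try:
    t(torch.zeros(3, 512, 512))  # already a tensor, not a PIL Image / ndarray
except TypeError as e:
    print(e)  # "pic should be PIL Image or ndarray. Got <class 'torch.Tensor'>"
```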

I'm not sure, but I suspect it's linked to the BATCH_SIZE variable: the only change compared to the text-to-pokemon notebook is the number of GPUs, which is 1. I also tried different BATCH_SIZE values, but I got different errors.
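One thing that sidesteps the original stacking error, assuming the dataset class still has PIL Images at some point in __getitem__ (I haven't checked exactly which loader this repo's config uses), is to force every image to RGB before the resize/crop transforms, so RGBA PNGs and RGB JPEGs all end up as [512, 512, 3]. A minimal sketch of a wrapper along those lines (the 'image' key and the returned dict structure are assumptions about the pipeline):

```python
from PIL import Image
from torch.utils.data import Dataset

class ForceRGB(Dataset):
    """Wraps an existing image dataset and drops any alpha channel.

    Assumes the wrapped dataset returns dicts with an 'image' entry that is
    still a PIL Image at this point -- adjust the key/type to your pipeline.
    """

    def __init__(self, base):
        self.base = base

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        sample = self.base[idx]
        img = sample["image"]
        if isinstance(img, Image.Image) and img.mode != "RGB":
            # Drops alpha / converts palette and grayscale images to 3 channels
            sample["image"] = img.convert("RGB")
        return sample
```

Alternatively, the same one-line `img.convert("RGB")` can go directly wherever the dataset calls `Image.open(...)`.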