mustass / diffusion_models_for_speech

Deep Learning course project repository.
https://kurser.dtu.dk/course/02456

Conditional training with ZeroPad fails #26

Closed mustass closed 1 year ago

mustass commented 1 year ago

This is not easy to spot: training runs fine for a while and then crashes, I suppose when the input is too short.

Traceback (most recent call last):
  File "/zhome/ef/8/160495/DL2022/diffusion_for_speech/./scripts/train.py", line 82, in run_model
    run(cfg)
  File "/zhome/ef/8/160495/DL2022/diffusion_for_speech/./scripts/train.py", line 56, in run
    trainer.fit(model, dm)
  File "/zhome/ef/8/160495/DL2022/venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 768, in fit
    self._call_and_handle_interrupt(
  File "/zhome/ef/8/160495/DL2022/venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 721, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/zhome/ef/8/160495/DL2022/venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 809, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/zhome/ef/8/160495/DL2022/venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1234, in _run
    results = self._run_stage()
  File "/zhome/ef/8/160495/DL2022/venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1321, in _run_stage
    return self._run_train()
  File "/zhome/ef/8/160495/DL2022/venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1351, in _run_train
    self.fit_loop.run()
  File "/zhome/ef/8/160495/DL2022/venv/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/zhome/ef/8/160495/DL2022/venv/lib/python3.9/site-packages/pytorch_lightning/loops/fit_loop.py", line 269, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/zhome/ef/8/160495/DL2022/venv/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/zhome/ef/8/160495/DL2022/venv/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 171, in advance
    batch = next(data_fetcher)
  File "/zhome/ef/8/160495/DL2022/venv/lib/python3.9/site-packages/pytorch_lightning/utilities/fetching.py", line 184, in __next__
    return self.fetching_function()
  File "/zhome/ef/8/160495/DL2022/venv/lib/python3.9/site-packages/pytorch_lightning/utilities/fetching.py", line 259, in fetching_function
    self._fetch_next_batch(self.dataloader_iter)
  File "/zhome/ef/8/160495/DL2022/venv/lib/python3.9/site-packages/pytorch_lightning/utilities/fetching.py", line 273, in _fetch_next_batch
    batch = next(iterator)
  File "/zhome/ef/8/160495/DL2022/venv/lib/python3.9/site-packages/pytorch_lightning/trainer/supporters.py", line 553, in __next__
    return self.request_next_batch(self.loader_iters)
  File "/zhome/ef/8/160495/DL2022/venv/lib/python3.9/site-packages/pytorch_lightning/trainer/supporters.py", line 565, in request_next_batch
    return apply_to_collection(loader_iters, Iterator, next)
  File "/zhome/ef/8/160495/DL2022/venv/lib/python3.9/site-packages/pytorch_lightning/utilities/apply_func.py", line 99, in apply_to_collection
    return function(data, *args, **kwargs)
  File "/zhome/ef/8/160495/DL2022/venv/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 628, in __next__
    data = self._next_data()
  File "/zhome/ef/8/160495/DL2022/venv/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1333, in _next_data
    return self._process_data(data)
  File "/zhome/ef/8/160495/DL2022/venv/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1359, in _process_data
    data.reraise()
  File "/zhome/ef/8/160495/DL2022/venv/lib/python3.9/site-packages/torch/_utils.py", line 543, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/zhome/ef/8/160495/DL2022/venv/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/zhome/ef/8/160495/DL2022/venv/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 61, in fetch
    return self.collate_fn(data)
  File "/zhome/ef/8/160495/DL2022/diffusion_for_speech/src/diffspeak/datasets/collator.py", line 43, in collate
    return self.assamble(minibatch)
  File "/zhome/ef/8/160495/DL2022/diffusion_for_speech/src/diffspeak/datasets/collator.py", line 26, in assamble
    spectrogram = torch.stack(
RuntimeError: stack expects each tensor to be equal size, but got [80, 80] at entry 0 and [68, 92] at entry 3
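`torch.stack` requires every tensor in the list to have the exact same shape, which is what the traceback shows being violated ([80, 80] vs [68, 92]). A minimal sketch of one way a collator could avoid this, by padding all spectrograms in the minibatch to a common shape before stacking. This is not the repository's actual `collator.py` code; the function name and padding strategy are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def pad_and_stack_spectrograms(spectrograms):
    """Pad a list of [freq, time] spectrograms to a common size, then stack.

    torch.stack needs identical shapes; here we zero-pad every spectrogram
    up to the largest freq/time extent found in the minibatch.
    """
    max_freq = max(s.shape[0] for s in spectrograms)
    max_time = max(s.shape[1] for s in spectrograms)
    padded = [
        # For a 2D tensor, F.pad takes (left, right, top, bottom):
        # pad the time axis on the right and the freq axis at the bottom.
        F.pad(s, (0, max_time - s.shape[1], 0, max_freq - s.shape[0]))
        for s in spectrograms
    ]
    return torch.stack(padded)  # shape: [batch, max_freq, max_time]
```

Alternatively, the dataset/collator could crop or pad every item to a fixed target shape so the mismatch never reaches `torch.stack` in the first place.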
mustass commented 1 year ago

Before this, it was throwing errors from this randint call, which I quick-fixed:

https://github.com/mustass/diffusion_for_speech/commit/6389254d6d3a672b5cc32808877157235daa1b45

But the error still surfaced elsewhere, because the tensors ended up with different lengths.
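For illustration only, a hedged sketch of the kind of guard that avoids both problems at once: the randint range error on short clips and the length mismatch downstream. The function name and API are assumptions, not code from this repository:

```python
import torch
import torch.nn.functional as F

def random_crop_or_pad(audio, crop_len):
    """Return a segment of exactly crop_len samples.

    Guards the randint call: clips shorter than crop_len are zero-padded
    instead of producing an empty/negative sampling range, so every item
    in the batch ends up the same length.
    """
    n = audio.shape[-1]
    if n < crop_len:
        # Clip is shorter than the crop window: pad on the right with zeros.
        return F.pad(audio, (0, crop_len - n))
    if n == crop_len:
        return audio
    start = torch.randint(0, n - crop_len + 1, (1,)).item()
    return audio[..., start : start + crop_len]
```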