leonardoaraujosantos / ChangeNet

Implementation of the ChangeNet paper

RunTimeError when training the model #9

Open SergTar opened 4 years ago

SergTar commented 4 years ago

Hi, @leonardoaraujosantos

I tried running your code and ran into a RuntimeError:

Epoch 0/49

Traceback (most recent call last):
  File "train_changenet.py", line 90, in <module>
    best_model, _ = utils_train.train_model(change_net, dataloaders_dict, criterion, optimizer, sc_plt, None, device, num_epochs=num_epochs)
  File "/home/files/ChangeNet/utils_train.py", line 34, in train_model
    for sample in dataloaders[phase]:
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
    return self._process_data(data)
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/opt/conda/lib/python3.6/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 74, in default_collate
    return {key: default_collate([d[key] for d in batch]) for key in elem}
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 74, in <dictcomp>
    return {key: default_collate([d[key] for d in batch]) for key in elem}
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 55, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [3, 224, 224] at entry 0 and [4, 224, 224] at entry 9

It seems that when the dataset is generated, some tensors end up with a different number of channels (3 vs. 4). Have you run into this issue?
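One way to narrow down a mismatch like this is to scan the dataset once, without the DataLoader, and group sample indices by tensor shape. The following is only a sketch: the helper name `group_indices_by_shape` is hypothetical, and it assumes each sample is a dict of tensors (which the dict comprehension in `default_collate` in the traceback suggests) and that the dataset supports `len()` and integer indexing.

```python
from collections import defaultdict


def group_indices_by_shape(dataset, key):
    """Map each shape found under `key` to the list of sample indices having it.

    Hypothetical helper: `dataset` is assumed to support len() and integer
    indexing, and each sample is assumed to be a dict whose values expose
    a `.shape` attribute (for example torch tensors).
    """
    shapes = defaultdict(list)
    for index in range(len(dataset)):
        sample = dataset[index]
        shapes[tuple(sample[key].shape)].append(index)
    return dict(shapes)
```

If the result contains more than one shape for a given key, `torch.stack` in the default collate function will fail as soon as a batch mixes them, regardless of batch size. The key to pass (for example the one holding the label image) depends on how the dataset class names its fields.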

leonardoaraujosantos commented 4 years ago

No, but I suspect it could be an issue with the DataLoader. What are your batch_size and your dataset size? You could try the drop_last option on the data loader.

JeanChillet commented 3 years ago

Hello, I have the same problem and I don't think it is linked to the batch size. Some of the label images are in RGBA format (4 channels) whereas others are in RGB (3 channels). It looks like the problem comes from the pickle files, but I have not been able to solve it. Would it be possible to have your input on this issue, please?
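If the mixed RGB/RGBA modes are indeed the cause, one possible workaround is to force every image to RGB before it is turned into a tensor. The sketch below is an assumption, not the repository's code: `EnsureRGB` is a hypothetical transform, and it assumes the images are Pillow `Image` objects at the point where it is applied (it relies only on Pillow's `mode` attribute and `convert` method).

```python
class EnsureRGB:
    """Hypothetical transform: return a 3-channel RGB version of a PIL image."""

    def __call__(self, img):
        # `img` is assumed to be a PIL.Image.Image.
        # Images that are already RGB are returned unchanged.
        if img.mode == "RGB":
            return img
        # For RGBA input, Pillow's convert("RGB") discards the alpha channel.
        return img.convert("RGB")
```

Such a transform would have to run before the tensor conversion step, either in the transform pipeline or wherever the images are read before being pickled. Whether dropping the alpha channel is acceptable for the label images depends on how the labels are encoded, so this is worth checking on a few samples first.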