Training issue on num_segment=12

Fazlik995 commented 4 years ago

Hi, I successfully ran your network on Something-v1 with num_segment=8.

However, when I use num_segment=12, I get the following error after the 1st epoch: RuntimeError: shape '[-1, 8, 4, 27, 27]' is invalid for input of size 34992

Any ideas?

swathikirans commented 4 years ago

Hi,

Please post the script you used for training and the line where the error occurs. Also, does it occur during validation or during training?

Fazlik995 commented 4 years ago

Script: python3 main.py something-v1 RGB --arch InceptionV3 --num_segments 12 --consensus_type avg --batch-size 16 --iter_size 2 --dropout 0.5 --lr 0.01 --warmup 10 --epochs 60 --eval-freq 5 --gd 20 --run_iter 1 -j 16 --npb --gsm

error:
Epoch: [0][5360/5377], lr: 0.00100 Time 2.036 (0.944) Data 1.384 (0.325) Loss 2.3372 (3.6284) Prec@1 6.250 (3.454) Prec@5 6.250 (12.217)
Traceback (most recent call last):
  File "main.py", line 385, in <module>
    main()
  File "main.py", line 165, in main
    train_prec1 = train(train_loader, model, criterion, optimizer, epoch, log_training, writer=writer)
  File "main.py", line 220, in train
    output = model(input_var)
  File "/home/fazlik/python36/local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/fazlik/python36/local/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/fazlik/python36/local/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/fazlik/python36/local/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in parallel_apply
    raise output
  File "/home/fazlik/python36/local/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 59, in _worker
    output = module(*input, **kwargs)
  File "/home/fazlik/python36/local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/CVIP/GSM/models.py", line 194, in forward
    base_out = self.base_model(input.view((-1, sample_len) + input.size()[-2:]))
  File "/home/fazlik/python36/local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/CVIP/GSM/model_zoo/bninception/pytorch_load.py", line 121, in forward
    data_dict[op[2]] = getattr(self, op[0])(data_dict[op[-1]])
  File "/home/fazlik/python36/local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/CVIP/GSM/gsm.py", line 243, in forward
    x = self.cam1(x)
  File "/home/fazlik/python36/local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/CVIP/GSM/gsm.py", line 70, in forward
    reshape_bottleneck = bottleneck.view((-1, self.n_segment) + bottleneck.size()[1:]) # n, t, c, h, w
RuntimeError: shape '[-1, 8, 4, 27, 27]' is invalid for input of size 34992

I got the error after the 1st epoch.

Any ideas?
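
For reference, the numbers in the error message can be checked by hand. The sketch below is only arithmetic on the values printed in the traceback; the helper name frames_in_tensor is made up for illustration and is not part of the repo. It suggests the failing tensor held 12 frames, which cannot be split evenly into groups of 8.

```python
def frames_in_tensor(input_size, channels, height, width):
    """Return how many (c, h, w) frames fit in a flat tensor of input_size elements."""
    per_frame = channels * height * width
    return input_size // per_frame, input_size % per_frame


# Values copied from the error message:
# shape '[-1, 8, 4, 27, 27]' is invalid for input of size 34992
frames, leftover = frames_in_tensor(34992, channels=4, height=27, width=27)

print(frames)       # 12 frames of shape (4, 27, 27)
print(leftover)     # 0, so the input is a whole number of frames
print(frames % 8)   # 4, so a view with 8 segments per clip cannot work
print(frames % 12)  # 0, so a view with 12 segments per clip would work
```

If that reading is right, the reshape at gsm.py line 70 was still using n_segment=8 while the input carried 12 frames. This is an inference from the message, not something confirmed in the thread. A frame count that happens to be a multiple of 8 (24, for instance) would pass the view without raising, which might explain why the failure only shows up on one batch near the end of the epoch.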

swathikirans commented 4 years ago

Hi,

It seems the error is triggered somewhere that is not part of the original implementation shared in this repo, so I am unable to help in this case. I would suggest adding "drop_last=True" to the data_loader definition.
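
As a rough illustration of that suggestion: drop_last is a standard argument of torch.utils.data.DataLoader that discards the final incomplete batch. The dataset below is a hypothetical stand-in with 33 dummy samples, not the data loader defined in this repo's main.py.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in dataset: 33 samples, so batch_size=16 leaves a remainder of 1.
dataset = TensorDataset(torch.zeros(33, 3), torch.zeros(33, dtype=torch.long))

keep_last = DataLoader(dataset, batch_size=16, shuffle=True)
drop_last = DataLoader(dataset, batch_size=16, shuffle=True, drop_last=True)

keep_sizes = [inputs.shape[0] for inputs, _ in keep_last]
drop_sizes = [inputs.shape[0] for inputs, _ in drop_last]

print(keep_sizes)  # [16, 16, 1]: the last batch holds a single sample
print(drop_sizes)  # [16, 16]: the incomplete batch is skipped
```

With drop_last=True every batch has the full batch size, at the cost of skipping the leftover samples in each epoch. It avoids an odd-sized final batch but does not change how the model reshapes its input.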

swathikirans / GSM

Training issue on num_segment=12 #17