Closed: sainatarajan closed this issue 4 years ago
Something you may want to try is deleting variables as soon as they are done being used, to free memory earlier. Unfortunately, we didn't try running this on low-memory GPUs, so the memory optimizations are likely sub-optimal. In a future version (when I have more time), I can try to optimize the memory usage further.
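For anyone trying this, here is a minimal sketch of the pattern inside a generic PyTorch training step (variable names are illustrative, not taken from this repo's `train.py`): drop references to the big tensors once the backward pass is done so the caching allocator can reuse their memory on the next iteration.

```python
import torch

def train_step(net, criterion, optimizer, inputs, targets):
    # Forward pass; `outputs` and `loss` keep the whole activation graph
    # alive for as long as they are referenced.
    outputs = net(inputs)
    loss = criterion(outputs, targets)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    loss_value = loss.item()  # copy the scalar out before freeing the graph

    # Release the large tensors early so their memory can be reused.
    del outputs, loss
    torch.cuda.empty_cache()  # optional: hand cached blocks back to the driver

    return loss_value
```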
Thank you for your reply. The model looks very complicated to play around with, and it would be difficult to revert any changes I make, so if you could help me do this it would be of great help. However, I have only 50 images for training, and I think Google Colab should be able to handle this small load with 12 GB of GPU memory. I even tried a 64x64 resolution and it still failed.
Try reducing the crop size. The number of images has nothing to do with the model's memory usage; fewer images just means less time per epoch. Try using a 16 GB GPU; it works on a 16 GB GPU.
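For intuition on why the crop size matters more than the dataset size: activation memory in a fully convolutional network grows roughly with the number of pixels in the crop, so halving the crop side cuts activation memory by about 4x. A rough back-of-the-envelope check (illustrative ratios only, not measured on GSCNN):

```python
# Activation memory scales roughly with crop area (crop_size ** 2).
for crop in (720, 512, 360, 256):
    ratio = (crop / 720) ** 2
    print(f"crop {crop:4d}: ~{ratio:.2f}x the activation memory of a 720 crop")
```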
@shubhaminnani Thank you. I set the crop size to 360; the training went further but then stopped with the error below. Can you tell me why? Here is the stack trace:
Traceback (most recent call last):
File "train.py", line 383, in <module>
main()
File "train.py", line 154, in main
train(train_loader, net, criterion, optim, epoch, writer)
File "train.py", line 233, in train
main_loss = net(inputs, gts=mask)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
return self.module(*inputs[0], **kwargs[0])
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/content/gdrive/My Drive/gscnn/network/gscnn.py", line 327, in forward
return self.criterion((seg_out, edge_out), gts)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/content/gdrive/My Drive/gscnn/loss.py", line 161, in forward
return self.nll_loss(F.log_softmax(inputs), targets)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 1314, in log_softmax
dim = _get_softmax_dim('log_softmax', input.dim(), _stacklevel)
AttributeError: 'tuple' object has no attribute 'dim'
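The traceback suggests that the `forward` in `loss.py` receives `inputs` as the tuple `(seg_out, edge_out)` returned by the network and passes it straight into `F.log_softmax`, which expects a tensor. Below is a hedged sketch of the failing pattern and one possible workaround; the class and attribute names mirror the trace but may not match the repo's actual loss classes.

```python
import torch.nn as nn
import torch.nn.functional as F

class CrossEntropyLoss2d(nn.Module):
    """Sketch of a 2D cross-entropy loss like the one called in loss.py."""
    def __init__(self, weight=None, ignore_index=255):
        super().__init__()
        self.nll_loss = nn.NLLLoss(weight=weight, ignore_index=ignore_index)

    def forward(self, inputs, targets):
        # If the joint (seg_out, edge_out) tuple is passed in, keep only the
        # segmentation logits; calling F.log_softmax on a tuple raises
        # AttributeError: 'tuple' object has no attribute 'dim'.
        if isinstance(inputs, tuple):
            inputs = inputs[0]
        return self.nll_loss(F.log_softmax(inputs, dim=1), targets)
```

Another possibility is that the wrong criterion is being constructed for this configuration (a plain segmentation loss instead of the joint segmentation+edge loss), in which case the fix belongs in the loss-selection logic rather than here.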
Hi, thanks for the repo. However, I am not able to run training on the Cityscapes dataset. I have around 50 images for training and about 10 each for validation and testing. I have reduced the image resolution to 128x128, and it still gives a CUDA out-of-memory error. I am running this on Google Colab, which has 12 GB of GPU memory. Can you tell me what I should do to be able to run this model? Are there any settings that have to be tweaked? @shubhaminnani @tovacinni @varunjampani @davidjesusacu @ShreyasSkandanS